Sample records for identified genomic regions

  1. Augmenting Chinese hamster genome assembly by identifying regions of high confidence.

    PubMed

    Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou

    2016-09-01

    Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, the mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of Chinese hamster genome by de novo assembly from shotgun sequencing reads and by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78 % of the assembled genome. Further, a genome-wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome-scale collinearity was complemented with EST-based synteny, which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Whole-genome resequencing of 292 pigeonpea accessions identifies genomic regions associated with domestication and agronomic traits.

    PubMed

    Varshney, Rajeev K; Saxena, Rachit K; Upadhyaya, Hari D; Khan, Aamir W; Yu, Yue; Kim, Changhoon; Rathore, Abhishek; Kim, Dongseon; Kim, Jihun; An, Shaun; Kumar, Vinay; Anuradha, Ghanta; Yamini, Kalinati Narasimhan; Zhang, Wei; Muniswamy, Sonnappa; Kim, Jong-So; Penmetsa, R Varma; von Wettberg, Eric; Datta, Swapan K

    2017-07-01

    Pigeonpea (Cajanus cajan), a tropical grain legume with low input requirements, is expected to continue to have an important role in supplying food and nutritional security in developing countries in Asia, Africa and the tropical Americas. From whole-genome resequencing of 292 Cajanus accessions encompassing breeding lines, landraces and wild species, we characterize genome-wide variation. On the basis of a scan for selective sweeps, we find several genomic regions that were likely targets of domestication and breeding. Using genome-wide association analysis, we identify associations between several candidate genes and agronomically important traits. Candidate genes for these traits in pigeonpea have sequence similarity to genes functionally characterized in other plants for flowering time control, seed development and pod dehiscence. Our findings will allow acceleration of genetic gains for key traits to improve yield and sustainability in pigeonpea.

  3. Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences.

    PubMed

    Colonna, Vincenza; Ayub, Qasim; Chen, Yuan; Pagani, Luca; Luisi, Pierre; Pybus, Marc; Garrison, Erik; Xue, Yali; Tyler-Smith, Chris; Abecasis, Goncalo R; Auton, Adam; Brooks, Lisa D; DePristo, Mark A; Durbin, Richard M; Handsaker, Robert E; Kang, Hyun Min; Marth, Gabor T; McVean, Gil A

    2014-06-30

    Population differentiation has proved to be effective for identifying loci under geographically localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing high frequency differences. Here, we extend this dataset to include additional variants, survey sites with low levels of differentiation, and evaluate the extent to which highly differentiated sites are likely to result from selective or other processes. We demonstrate that while sites with low differentiation represent sampling effects rather than balancing selection, sites showing extremely high population differentiation are enriched for positive selection events and that one half may be the result of classic selective sweeps. Among these, we rediscover known examples, where we actually identify the established functional SNP, and discover novel examples including the genes ABCA12, CALD1 and ZNF804, which we speculate may be linked to adaptations in skin, calcium metabolism and defense, respectively. We identify known and many novel candidate regions for geographically restricted positive selection, and suggest several directions for further research.

  4. A genome-wide association study identifies a genomic region for the polycerate phenotype in sheep (Ovis aries).

    PubMed

    Ren, Xue; Yang, Guang-Li; Peng, Wei-Feng; Zhao, Yong-Xin; Zhang, Min; Chen, Ze-Hui; Wu, Fu-An; Kantanen, Juha; Shen, Min; Li, Meng-Hua

    2016-02-17

    Horns are a cranial appendage found exclusively in Bovidae, and play important roles in accessing resources and mates. In sheep (Ovis aries), horns vary from polled to six-horned, and humans have been selecting polled animals in farming and breeding. Here, we conducted a genome-wide association study on 24 two-horned versus 22 four-horned phenotypes in a native Chinese breed of Sishui Fur sheep. Together with linkage disequilibrium (LD) analyses and haplotype-based association tests, we identified a genomic region comprising 132.0-133.1 Mb on chromosome 2 that contained the top 10 SNPs (including 4 significant SNPs) and 5 most significant haplotypes associated with the polycerate phenotype. In humans and mice, this genomic region contains the HOXD gene cluster and adjacent functional genes EVX2 and KIAA1715, which have a close association with the formation of limbs and genital buds. Our results provide new insights into the genetic basis underlying variable numbers of horns and represent a new resource for use in sheep genetics and breeding.

  5. A Genome-Wide Association Study Identifies Multiple Regions Associated with Head Size in Catfish

    PubMed Central

    Geng, Xin; Liu, Shikai; Yao, Jun; Bao, Lisui; Zhang, Jiaren; Li, Chao; Wang, Ruijia; Sha, Jin; Zeng, Peng; Zhi, Degui; Liu, Zhanjiang

    2016-01-01

    Skull morphology is fundamental to evolution and the biological adaptation of species to their environments. With aquaculture fish species, head size is also important for economic reasons because it has a direct impact on fillet yield. However, little is known about the underlying genetic basis of head size. Catfish is the primary aquaculture species in the United States. In this study, we performed a genome-wide association study using the catfish 250K SNP array with backcross hybrid catfish to map the QTL for head size (head length, head width, and head depth). One significantly associated region on linkage group (LG) 7 was identified for head length. In addition, LGs 7, 9, and 16 contain suggestively associated regions for head length. For head width, significantly associated regions were found on LG9, and additional suggestively associated regions were identified on LGs 5 and 7. No region was found associated with head depth. Head size genetic loci were mapped in catfish to genomic regions with candidate genes involved in bone development. Comparative analysis indicated that homologs of several candidate genes are also involved in skull morphology in various other species ranging from amphibian to mammalian species, suggesting possible evolutionary conservation of those genes in the control of skull morphologies. PMID:27558670

  6. Application of selection mapping to identify genomic regions associated with dairy production in sheep.

    PubMed

    Gutiérrez-Gil, Beatriz; Arranz, Juan Jose; Pong-Wong, Ricardo; García-Gámez, Elsa; Kijas, James; Wiener, Pamela

    2014-01-01

    In Europe, especially in Mediterranean areas, the sheep has been traditionally exploited as a dual purpose species, with income from both meat and milk. Modernization of husbandry methods and the establishment of breeding schemes focused on milk production have led to the development of "dairy breeds." This study investigated selective sweeps specifically related to dairy production in sheep by searching for regions commonly identified in different European dairy breeds. With this aim, genotypes from 44,545 SNP markers covering the sheep autosomes were analysed in both European dairy and non-dairy sheep breeds using two approaches: (i) identification of genomic regions showing extreme genetic differentiation between each dairy breed and a closely related non-dairy breed, and (ii) identification of regions with reduced variation (heterozygosity) in the dairy breeds using two methods. Regions detected in at least two breeds (breed pairs) by the two approaches (genetic differentiation and at least one of the heterozygosity-based analyses) were labeled as core candidate convergence regions and further investigated for candidate genes. Following this approach six regions were detected. For some of them, strong candidate genes have been proposed (e.g. ABCG2, SPP1), whereas some other genes designated as candidates based on their association with sheep and cattle dairy traits (e.g. LALBA, DGAT1A) were not associated with a detectable sweep signal. Few of the identified regions were coincident with QTL previously reported in sheep, although many of them corresponded to orthologous regions in cattle where QTL for dairy traits have been identified. Due to the limited number of QTL studies reported in sheep compared with cattle, the results illustrate the potential value of selection mapping to identify genomic regions associated with dairy traits in sheep.
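
The reduced-variation scan described in approach (ii) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes expected heterozygosity 2p(1-p) per biallelic SNP averaged over fixed windows of SNPs, and the function names, window size, and threshold are hypothetical choices made for the example.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def window_means(values, size, step):
    """Mean of each window of `size` consecutive values, advancing by `step`."""
    return [
        sum(values[start:start + size]) / size
        for start in range(0, len(values) - size + 1, step)
    ]


def low_diversity_windows(freqs, size, step, threshold):
    """Indices of windows whose mean heterozygosity falls below `threshold`.

    `freqs` holds allele frequencies of SNPs ordered along one chromosome.
    The threshold is an assumption of this sketch; a real scan would derive
    it from the genome-wide distribution.
    """
    het = [expected_heterozygosity(p) for p in freqs]
    means = window_means(het, size, step)
    return [i for i, m in enumerate(means) if m < threshold]
```

Windows flagged this way in a dairy breed would then be compared with the differentiation-based results, since the abstract requires support from both approaches.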

  7. Application of Selection Mapping to Identify Genomic Regions Associated with Dairy Production in Sheep

    PubMed Central

    Gutiérrez-Gil, Beatriz; Arranz, Juan Jose; Pong-Wong, Ricardo; García-Gámez, Elsa; Kijas, James; Wiener, Pamela

    2014-01-01

    In Europe, especially in Mediterranean areas, the sheep has been traditionally exploited as a dual purpose species, with income from both meat and milk. Modernization of husbandry methods and the establishment of breeding schemes focused on milk production have led to the development of “dairy breeds.” This study investigated selective sweeps specifically related to dairy production in sheep by searching for regions commonly identified in different European dairy breeds. With this aim, genotypes from 44,545 SNP markers covering the sheep autosomes were analysed in both European dairy and non-dairy sheep breeds using two approaches: (i) identification of genomic regions showing extreme genetic differentiation between each dairy breed and a closely related non-dairy breed, and (ii) identification of regions with reduced variation (heterozygosity) in the dairy breeds using two methods. Regions detected in at least two breeds (breed pairs) by the two approaches (genetic differentiation and at least one of the heterozygosity-based analyses) were labeled as core candidate convergence regions and further investigated for candidate genes. Following this approach six regions were detected. For some of them, strong candidate genes have been proposed (e.g. ABCG2, SPP1), whereas some other genes designated as candidates based on their association with sheep and cattle dairy traits (e.g. LALBA, DGAT1A) were not associated with a detectable sweep signal. Few of the identified regions were coincident with QTL previously reported in sheep, although many of them corresponded to orthologous regions in cattle where QTL for dairy traits have been identified. Due to the limited number of QTL studies reported in sheep compared with cattle, the results illustrate the potential value of selection mapping to identify genomic regions associated with dairy traits in sheep. PMID:24788864

  8. Genomic scan of selective sweeps in thin and fat tail sheep breeds for identifying of candidate regions associated with fat deposition

    PubMed Central

    2012-01-01

    Background: Identification of genomic regions that have been targets of selection for phenotypic traits is one of the most important and challenging areas of research in animal genetics. However, currently there are relatively few genomic regions identified that have been subject to positive selection. In this study, a genome-wide scan using ~50,000 Single Nucleotide Polymorphisms (SNPs) was performed in an attempt to identify genomic regions associated with fat deposition in fat-tail breeds. This trait and its modification are very important in those countries grazing these breeds. Results: Two independent experiments using either Iranian or Ovine HapMap genotyping data contrasted thin and fat tail breeds. Population differentiation using FST in Iranian thin and fat tail breeds revealed seven genomic regions. Almost all of these regions overlapped with QTLs that had previously been identified as affecting fat and carcass yield traits in beef and dairy cattle. Study of selection sweep signatures using FST in thin and fat tail breeds sampled from the Ovine HapMap project confirmed three of these regions located on Chromosomes 5, 7 and X. We found increased homozygosity in these regions in favour of fat tail breeds on chromosome 5 and X and in favour of thin tail breeds on chromosome 7. Conclusions: In this study, we were able to identify three novel regions associated with fat deposition in thin and fat tail sheep breeds. Two of these were associated with an increase of homozygosity in the fat tail breeds which would be consistent with selection for mutations affecting fat tail size several thousand years after domestication. PMID:22364287
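
The FST contrast between thin and fat tail breeds can be illustrated per SNP. The abstract does not state which FST estimator was used, so the sketch below assumes the basic Wright definition, (H_T - H_S) / H_T, computed from the allele frequencies of two populations; the function name and the weighting by sample size are assumptions of this example.

```python
def fst_two_populations(p1, p2, n1=1, n2=1):
    """Wright's FST for one biallelic SNP from two allele frequencies.

    p1, p2: frequency of the same allele in population 1 and 2.
    n1, n2: relative sample sizes used as weights (equal by default).
    Returns 0.0 when the SNP is monomorphic across both populations.
    """
    w1 = n1 / (n1 + n2)
    w2 = 1.0 - w1
    p_bar = w1 * p1 + w2 * p2
    # Expected heterozygosity in the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Weighted mean of within-population expected heterozygosity.
    h_within = w1 * 2.0 * p1 * (1.0 - p1) + w2 * 2.0 * p2 * (1.0 - p2)
    if h_total == 0.0:
        return 0.0
    return (h_total - h_within) / h_total
```

For example, frequencies of 0.2 and 0.8 in equally sized samples give an FST of about 0.36, while a fixed difference (0.0 versus 1.0) gives 1.0. A genome scan would average such values over neighbouring SNPs and look for outlier windows.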

  9. Whole genome association study identifies regions of the bovine genome and biological pathways involved in carcass trait performance in Holstein-Friesian cattle.

    PubMed

    Doran, Anthony G; Berry, Donagh P; Creevey, Christopher J

    2014-10-01

    Four traits related to carcass performance have been identified as economically important in beef production: carcass weight, carcass fat, carcass conformation of progeny and cull cow carcass weight. Although Holstein-Friesian cattle are primarily utilized for milk production, they are also an important source of meat for beef production and export. Because of this, there is great interest in understanding the underlying genomic structure influencing these traits. Several genome-wide association studies have identified regions of the bovine genome associated with growth or carcass traits; however, little is known about the mechanisms or underlying biological pathways involved. This study aims to detect regions of the bovine genome associated with carcass performance traits (employing a panel of 54,001 SNPs) using measures of genetic merit (as predicted transmitting abilities) for 5,705 Irish Holstein-Friesian animals. Candidate genes and biological pathways were then identified for each trait under investigation. Following adjustment for false discovery (q-value < 0.05), 479 quantitative trait loci (QTL) were associated with at least one of the four carcass traits using a single SNP regression approach. Using a Bayesian approach, 46 QTL were associated (posterior probability > 0.5) with at least one of the four traits. In total, 557 unique bovine genes, which mapped to 426 human orthologs, were within 500 kb of QTL found associated with a trait using the Bayesian approach. Using this information, 24 significantly over-represented pathways were identified across all traits. The most significantly over-represented biological pathway was the peroxisome proliferator-activated receptor (PPAR) signaling pathway. A large number of genomic regions putatively associated with bovine carcass traits were detected using two different statistical approaches. Notably, several significant associations were detected in close proximity to genes with a known role in animal growth
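
The false discovery adjustment applied to the single-SNP regression results can be illustrated with the Benjamini-Hochberg procedure. This is a stand-in chosen for the sketch: the abstract reports q-values but does not name the method used to obtain them, and the function name below is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each adjusted value is the smallest false discovery rate at which the
    corresponding test would be declared significant.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

SNPs whose adjusted value falls below 0.05 would be retained, mirroring the threshold quoted in the abstract.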

  10. Genome-Wide Association Identifies SLC2A9 and NLN Gene Regions as Associated with Entropion in Domestic Sheep

    PubMed Central

    Mousel, Michelle R.; Reynolds, James O.; White, Stephen N.

    2015-01-01

    Entropion is an inward rolling of the eyelid allowing contact between the eyelashes and cornea that may lead to blindness if not corrected. Although many mammalian species, including humans and dogs, are afflicted by congenital entropion, no specific genes or gene regions related to development of entropion have been reported in any mammalian species to date. Entropion in domestic sheep is known to have a genetic component; therefore, we used domestic sheep as a model system to identify genomic regions containing genes associated with entropion. A genome-wide association was conducted with congenital entropion in 998 Columbia, Polypay, and Rambouillet sheep genotyped with 50,000 SNP markers. Prevalence of entropion was 6.01%, with all breeds represented. Logistic regression was performed in PLINK with additive allelic, recessive, dominant, and genotypic inheritance models. Two genome-wide significant (empirical P<0.05) SNP were identified, specifically markers in SLC2A9 (empirical P = 0.007; genotypic model) and near NLN (empirical P = 0.026; dominance model). Six additional genome-wide suggestive SNP (nominal P<1x10^-5) were identified including markers in or near PIK3CB (P = 2.22x10^-6; additive model), KCNB1 (P = 2.93x10^-6; dominance model), ZC3H12C (P = 3.25x10^-6; genotypic model), JPH1 (P = 4.68x10^-6; genotypic model), and MYO3B (P = 5.74x10^-6; recessive model). This is the first report of specific gene regions associated with congenital entropion in any mammalian species, to our knowledge. Further, none of these genes have previously been associated with any eyelid traits. These results represent the first genome-wide analysis of gene regions associated with entropion and provide target regions for the development of sheep genetic markers for marker-assisted selection. PMID:26098909

  11. Genome-Wide Association Identifies SLC2A9 and NLN Gene Regions as Associated with Entropion in Domestic Sheep.

    PubMed

    Mousel, Michelle R; Reynolds, James O; White, Stephen N

    2015-01-01

    Entropion is an inward rolling of the eyelid allowing contact between the eyelashes and cornea that may lead to blindness if not corrected. Although many mammalian species, including humans and dogs, are afflicted by congenital entropion, no specific genes or gene regions related to development of entropion have been reported in any mammalian species to date. Entropion in domestic sheep is known to have a genetic component; therefore, we used domestic sheep as a model system to identify genomic regions containing genes associated with entropion. A genome-wide association was conducted with congenital entropion in 998 Columbia, Polypay, and Rambouillet sheep genotyped with 50,000 SNP markers. Prevalence of entropion was 6.01%, with all breeds represented. Logistic regression was performed in PLINK with additive allelic, recessive, dominant, and genotypic inheritance models. Two genome-wide significant (empirical P<0.05) SNP were identified, specifically markers in SLC2A9 (empirical P = 0.007; genotypic model) and near NLN (empirical P = 0.026; dominance model). Six additional genome-wide suggestive SNP (nominal P<1x10(-5)) were identified including markers in or near PIK3CB (P = 2.22x10(-6); additive model), KCNB1 (P = 2.93x10(-6); dominance model), ZC3H12C (P = 3.25x10(-6); genotypic model), JPH1 (P = 4.68x10(-6); genotypic model), and MYO3B (P = 5.74x10(-6); recessive model). This is the first report of specific gene regions associated with congenital entropion in any mammalian species, to our knowledge. Further, none of these genes have previously been associated with any eyelid traits. These results represent the first genome-wide analysis of gene regions associated with entropion and provide target regions for the development of sheep genetic markers for marker-assisted selection.

  12. Pooled-DNA sequencing identifies genomic regions of selection in Nigerian isolates of Plasmodium falciparum.

    PubMed

    Oyebola, Kolapo M; Idowu, Emmanuel T; Olukosi, Yetunde A; Awolola, Taiwo S; Amambua-Ngwa, Alfred

    2017-06-29

    The burden of falciparum malaria is especially high in sub-Saharan Africa. Differences in pressure from host immunity and antimalarial drugs lead to adaptive changes responsible for high level of genetic variations within and between the parasite populations. Population-specific genetic studies to survey for genes under positive or balancing selection resulting from drug pressure or host immunity will allow for refinement of interventions. We performed a pooled sequencing (pool-seq) of the genomes of 100 Plasmodium falciparum isolates from Nigeria. We explored allele-frequency based neutrality test (Tajima's D) and integrated haplotype score (iHS) to identify genes under selection. Fourteen shared iHS regions that had at least 2 SNPs with a score > 2.5 were identified. These regions code for genes that were likely to have been under strong directional selection. Two of these genes were the chloroquine resistance transporter (CRT) on chromosome 7 and the multidrug resistance 1 (MDR1) on chromosome 5. There was a weak signature of selection in the dihydrofolate reductase (DHFR) gene on chromosome 4 and MDR5 genes on chromosome 13, with only 2 and 3 SNPs respectively identified within the iHS window. We observed strong selection pressure attributable to continued chloroquine and sulfadoxine-pyrimethamine use despite their official proscription for the treatment of uncomplicated malaria. There was also a major selective sweep on chromosome 6 which had 32 SNPs within the shared iHS region. Tajima's D of circumsporozoite protein (CSP), erythrocyte-binding antigen (EBA-175), merozoite surface proteins - MSP3 and MSP7, merozoite surface protein duffy binding-like (MSPDBL2) and serine repeat antigen (SERA-5) were 1.38, 1.29, 0.73, 0.84 and 0.21, respectively. We have demonstrated the use of pool-seq to understand genomic patterns of selection and variability in P. falciparum from Nigeria, which bears the highest burden of infections. This investigation identified known
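
The allele-frequency based neutrality test named in this abstract, Tajima's D, can be sketched with the standard formula. The study worked from pooled sequencing data; the example below instead assumes individual haplotypes coded as strings of "0" and "1", which is a simplification, and the function name is hypothetical.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotypes coded as '0'/'1' strings.

    Returns None when there are no segregating sites. Needs at least four
    haplotypes, because the variance term is zero for smaller samples.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("at least four haplotypes are required")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    seg_sites = 0
    pi = 0.0  # mean number of pairwise differences
    for pos in range(length):
        k = sum(1 for h in haplotypes if h[pos] == "1")
        if 0 < k < n:
            seg_sites += 1
            pi += 2.0 * k * (n - k) / (n * (n - 1))
    if seg_sites == 0:
        return None

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

An excess of rare variants gives a negative D, while an excess of intermediate-frequency variants gives a positive D, which is why positive values at antigen genes are commonly read as a hint of balancing selection.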

  13. Genome-wide methylation analysis identified sexually dimorphic methylated regions in hybrid tilapia

    PubMed Central

    Wan, Zi Yi; Xia, Jun Hong; Lin, Grace; Wang, Le; Lin, Valerie C. L.; Yue, Gen Hua

    2016-01-01

    Sexual dimorphism is an interesting biological phenomenon. Previous studies showed that DNA methylation might play a role in sexual dimorphism. However, the overall picture of the genome-wide methylation landscape in sexually dimorphic species remains unclear. We analyzed the DNA methylation landscape and transcriptome in hybrid tilapia (Oreochromis spp.) using whole genome bisulfite sequencing (WGBS) and RNA-sequencing (RNA-seq). We found 4,757 sexually dimorphic differentially methylated regions (DMRs), with significant clusters of DMRs located on chromosomal regions associated with sex determination. CpG methylation in promoter regions was negatively correlated with the gene expression level. MAPK/ERK pathway was upregulated in male tilapia. We also inferred active cis-regulatory regions (ACRs) in skeletal muscle tissues from WGBS datasets, revealing sexually dimorphic cis-regulatory regions. These results suggest that DNA methylation contributes to sex-specific phenotypes and serve as resources for further investigation to analyze the functions of these regions and their contributions towards sexual dimorphisms. PMID:27782217

  14. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data

    PubMed Central

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-01-01

    Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets. PMID:29618048

  15. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data.

    PubMed

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-03-01

    Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets.

  16. QTL-seq approach identified genomic regions and diagnostic markers for rust and late leaf spot resistance in groundnut (Arachis hypogaea L.).

    PubMed

    Pandey, Manish K; Khan, Aamir W; Singh, Vikas K; Vishwakarma, Manish K; Shasidhar, Yaduru; Kumar, Vinay; Garg, Vanika; Bhat, Ramesh S; Chitikineni, Annapurna; Janila, Pasupuleti; Guo, Baozhu; Varshney, Rajeev K

    2017-08-01

    Rust and late leaf spot (LLS) are the two major foliar fungal diseases in groundnut, and their co-occurrence leads to significant yield loss in addition to the deterioration of fodder quality. To identify candidate genomic regions controlling resistance to rust and LLS, a whole-genome resequencing (WGRS)-based approach referred to as 'QTL-seq' was deployed. A total of 231.67 Gb raw and 192.10 Gb of clean sequence data were generated through WGRS of resistant parent and the resistant and susceptible bulks for rust and LLS. Sequence analysis of bulks for rust and LLS with reference-guided resistant parent assembly identified 3136 single-nucleotide polymorphisms (SNPs) for rust and 66 SNPs for LLS with the read depth of ≥7 in the identified genomic region on pseudomolecule A03. Detailed analysis identified 30 nonsynonymous SNPs affecting 25 candidate genes for rust resistance, and 14 intronic and three synonymous SNPs affecting nine candidate genes for LLS resistance. Subsequently, allele-specific diagnostic markers were identified for three SNPs for rust resistance and one SNP for LLS resistance. Genotyping of one RIL population (TAG 24 × GPBD 4) with these four diagnostic markers revealed higher phenotypic variation for these two diseases. These results suggest the usefulness of the QTL-seq approach in precise and rapid identification of candidate genomic regions and development of diagnostic markers for breeding applications. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  17. Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

    PubMed

    Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

    2014-01-01

    Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence genome-wide, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3' terminus of the papain-like cysteine protease domain, were found to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed the identical performances between these regions and the full-length genome in genotyping, in which the HEV isolates involved could be divided into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.

  18. QTL-seq approach identified genomic regions and diagnostic markers for rust and late leaf spot resistance in groundnut (Arachis hypogaea L.)

    USDA-ARS's Scientific Manuscript database

    Rust and late leaf spot (LLS) are the two major foliar fungal diseases in groundnut, and their co-occurrence leads to yield loss up to 50–70% in addition to the deterioration of fodder quality. To identify candidate genomic regions controlling rust and LLS resistance, we deployed whole genome re-seq...

  19. Cracking the genomic piggy bank: identifying secrets of the pig genome.

    PubMed

    Mote, B E; Rothschild, M F

    2006-01-01

    Though researchers are uncovering valuable information about the pig genome at unprecedented speed, the porcine genome community is barely scratching the surface as to understanding interactions of the biological code. The pig genetic linkage map has nearly 5,000 loci comprised of genes, microsatellites, and amplified fragment length polymorphism markers. Likewise, the physical map is becoming denser with nearly 6,000 markers. The long awaited sequencing efforts are providing multidimensional benefits with sequence available for comparative genomics and identifying single nucleotide polymorphisms for use in linkage and trait association studies. Scientists are using exotic and commercial breeds for quantitative trait loci scans. Additionally, candidate gene studies continue to identify chromosomal regions or genes associated with economically important traits such as growth rate, leanness, feed intake, meat quality, litter size, and disease resistance. The commercial pig industry is actively incorporating these markers in marker-assisted selection along with traditional performance information to improve said traits. Researchers are utilizing novel tools including pig microarrays along with advanced bioinformatics to identify new candidate genes, understand gene function, and piece together gene networks involved in important biological processes. Advances in pig genomics and implications to the pork industry as well as human health are reviewed.

  20. Genomic suppression subtractive hybridization as a tool to identify differences in mycorrhizal fungal genomes.

    PubMed

    Murat, Claude; Zampieri, Elisa; Vallino, Marta; Daghino, Stefania; Perotto, Silvia; Bonfante, Paola

    2011-05-01

    Characterization of genomic variation among different microbial species, or different strains of the same species, is a field of significant interest with a wide range of potential applications. We have investigated the genomic variation in mycorrhizal fungal genomes through genomic suppressive subtractive hybridization. The comparison was between phylogenetically distant and close truffle species (Tuber spp.), and between isolates of the ericoid mycorrhizal fungus Oidiodendron maius featuring different degrees of metal tolerance. In the interspecies experiment, almost all the sequences that were identified in the Tuber melanosporum genome and absent in Tuber borchii and Tuber indicum corresponded to transposable elements. In the intraspecies comparison, some specific sequences corresponded to regions coding for enzymes, among them a glutathione synthetase known to be involved in metal tolerance. This approach is a quick and rather inexpensive tool to develop molecular markers for mycorrhizal fungi tracking and barcoding, to identify functional genes and to investigate the genome plasticity, adaptation and evolution. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  1. AnnotateGenomicRegions: a web application.

    PubMed

    Zammataro, Luca; DeMolfetta, Rita; Bucci, Gabriele; Ceol, Arnaud; Muller, Heiko

    2014-01-01

    Modern genomic technologies produce large amounts of data that can be mapped to specific regions in the genome. Among the first steps in interpreting the results is annotation of genomic regions with known features such as genes, promoters, CpG islands etc. Several tools have been published to perform this task. However, using these tools often requires a significant amount of bioinformatics skills and/or downloading and installing dedicated software. Here we present AnnotateGenomicRegions, a web application that accepts genomic regions as input and outputs a selection of overlapping and/or neighboring genome annotations. Supported organisms include human (hg18, hg19), mouse (mm8, mm9, mm10), zebrafish (danRer7), and Saccharomyces cerevisiae (sacCer2, sacCer3). AnnotateGenomicRegions is accessible online on a public server or can be installed locally. Some frequently used annotations and genomes are embedded in the application while custom annotations may be added by the user. The increasing spread of genomic technologies generates the need for a simple-to-use annotation tool for genomic regions that can be used by biologists and bioinformaticians alike. AnnotateGenomicRegions meets this demand. AnnotateGenomicRegions is an open-source web application that can be installed on any personal computer or institute server. AnnotateGenomicRegions is available at: http://cru.genomics.iit.it/AnnotateGenomicRegions.
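
    The core operation behind this kind of annotation, reporting which known features overlap each input region, can be sketched in a few lines. This is not the application's code: the `Interval` type, the function names, and the 0-based half-open coordinate convention are assumptions made for illustration.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive
    name: str

def overlaps(a, b):
    """True when two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def annotate(regions, features):
    """Map each region name to the names of the features it overlaps."""
    return {r.name: [f.name for f in features if overlaps(r, f)] for r in regions}

# Hypothetical input regions and annotation features.
regions = [Interval("chr1", 100, 200, "peak1"), Interval("chr1", 500, 600, "peak2")]
features = [
    Interval("chr1", 150, 300, "geneA"),
    Interval("chr2", 100, 200, "geneB"),
    Interval("chr1", 600, 700, "cpg1"),
]
result = annotate(regions, features)
```

    The quadratic scan is fine for a toy example; for genome-scale inputs one would normally sort the features or use an interval tree.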

  2. AnnotateGenomicRegions: a web application

    PubMed Central

    2014-01-01

    Background Modern genomic technologies produce large amounts of data that can be mapped to specific regions in the genome. Among the first steps in interpreting the results is annotation of genomic regions with known features such as genes, promoters, CpG islands etc. Several tools have been published to perform this task. However, using these tools often requires a significant amount of bioinformatics skills and/or downloading and installing dedicated software. Results Here we present AnnotateGenomicRegions, a web application that accepts genomic regions as input and outputs a selection of overlapping and/or neighboring genome annotations. Supported organisms include human (hg18, hg19), mouse (mm8, mm9, mm10), zebrafish (danRer7), and Saccharomyces cerevisiae (sacCer2, sacCer3). AnnotateGenomicRegions is accessible online on a public server or can be installed locally. Some frequently used annotations and genomes are embedded in the application while custom annotations may be added by the user. Conclusions The increasing spread of genomic technologies generates the need for a simple-to-use annotation tool for genomic regions that can be used by biologists and bioinformaticians alike. AnnotateGenomicRegions meets this demand. AnnotateGenomicRegions is an open-source web application that can be installed on any personal computer or institute server. AnnotateGenomicRegions is available at: http://cru.genomics.iit.it/AnnotateGenomicRegions. PMID:24564446

  3. Integrated pathway-based approach identifies association between genomic regions at CTCF and CACNB2 and schizophrenia.

    PubMed

    Juraeva, Dilafruz; Haenisch, Britta; Zapatka, Marc; Frank, Josef; Witt, Stephanie H; Mühleisen, Thomas W; Treutlein, Jens; Strohmaier, Jana; Meier, Sandra; Degenhardt, Franziska; Giegling, Ina; Ripke, Stephan; Leber, Markus; Lange, Christoph; Schulze, Thomas G; Mössner, Rainald; Nenadic, Igor; Sauer, Heinrich; Rujescu, Dan; Maier, Wolfgang; Børglum, Anders; Ophoff, Roel; Cichon, Sven; Nöthen, Markus M; Rietschel, Marcella; Mattheisen, Manuel; Brors, Benedikt

    2014-06-01

    In the present study, an integrated hierarchical approach was applied to: (1) identify pathways associated with susceptibility to schizophrenia; (2) detect genes that may be potentially affected in these pathways since they contain an associated polymorphism; and (3) annotate the functional consequences of such single-nucleotide polymorphisms (SNPs) in the affected genes or their regulatory regions. The Global Test was applied to detect schizophrenia-associated pathways using discovery and replication datasets comprising 5,040 and 5,082 individuals of European ancestry, respectively. Information concerning functional gene-sets was retrieved from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and the Molecular Signatures Database. Fourteen of the gene-sets or pathways identified in the discovery dataset were confirmed in the replication dataset. These include functional processes involved in transcriptional regulation and gene expression, synapse organization, cell adhesion, and apoptosis. For two genes, i.e. CTCF and CACNB2, evidence for association with schizophrenia was available (at the gene-level) in both the discovery study and published data from the Psychiatric Genomics Consortium schizophrenia study. Furthermore, these genes mapped to four of the 14 presently identified pathways. Several of the SNPs assigned to CTCF and CACNB2 have potential functional consequences, and a gene in close proximity to CACNB2, i.e. ARL5B, was identified as a potential gene of interest. Application of the present hierarchical approach thus allowed: (1) identification of novel biological gene-sets or pathways with potential involvement in the etiology of schizophrenia, as well as replication of these findings in an independent cohort; (2) detection of genes of interest for future follow-up studies; and (3) the highlighting of novel genes in previously reported candidate regions for schizophrenia.

  4. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    PubMed

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, based on supervised deep learning approaches, for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  5. Comparison of gene expression in segregating families identifies genes and genomic regions involved in a novel adaptation, zinc hyperaccumulation.

    PubMed

    Filatov, Victor; Dowdle, John; Smirnoff, Nicholas; Ford-Lloyd, Brian; Newbury, H John; Macnair, Mark R

    2006-09-01

    One of the challenges of comparative genomics is to identify specific genetic changes associated with the evolution of a novel adaptation or trait. We need to be able to disassociate the genes involved with a particular character from all the other genetic changes that take place as lineages diverge. Here we show that by comparing the transcriptional profile of segregating families with that of parent species differing in a novel trait, it is possible to narrow down substantially the list of potential target genes. In addition, by assuming synteny with a related model organism for which the complete genome sequence is available, it is possible to use the cosegregation of markers differing in transcription level to identify regions of the genome which probably contain quantitative trait loci (QTLs) for the character. This novel combination of genomics and classical genetics provides a very powerful tool to identify candidate genes. We use this methodology to investigate zinc hyperaccumulation in Arabidopsis halleri, the sister species to the model plant, Arabidopsis thaliana. We compare the transcriptional profile of A. halleri with that of its sister nonaccumulator species, Arabidopsis petraea, and between accumulator and nonaccumulator F(3)s derived from the cross between the two species. We identify eight genes which consistently show greater expression in accumulator phenotypes in both roots and shoots, including two metal transporter genes (NRAMP3 and ZIP6), and cytoplasmic aconitase, a gene involved in iron homeostasis in mammals. We also show that there appear to be two QTLs for zinc accumulation, on chromosomes 3 and 7.

  6. GEAR: genomic enrichment analysis of regional DNA copy number changes.

    PubMed

    Kim, Tae-Min; Jung, Yu-Chae; Rhyu, Mun-Gan; Jung, Myeong Ho; Chung, Yeun-Jun

    2008-02-01

    We developed an algorithm named GEAR (genomic enrichment analysis of regional DNA copy number changes) for functional interpretation of genome-wide DNA copy number changes identified by array-based comparative genomic hybridization. GEAR selects two types of chromosomal alterations with potential biological relevance, i.e. recurrent and phenotype-specific alterations. Then it performs functional enrichment analysis using a priori selected functional gene sets to identify primary and clinical genomic signatures. The genomic signatures identified by GEAR represent functionally coordinated genomic changes, which can provide clues on the underlying molecular mechanisms related to the phenotypes of interest. GEAR can help the identification of key molecular functions that are activated or repressed in the tumor genomes leading to the improved understanding on the tumor biology. GEAR software is available with online manual in the website, http://www.systemsbiology.co.kr/GEAR/.

  7. Efficiently Identifying Significant Associations in Genome-wide Association Studies

    PubMed Central

    Eskin, Eleazar

    2013-01-01

    Abstract Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome that harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, in eQTL studies, tens of thousands of gene expression levels are measured, and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the single nucleotide polymorphisms (SNPs). In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to the state-of-the-art testing approaches by a factor of 75. PMID:24033261
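
    The two-stage idea can be sketched as follows. This is a simplified illustration rather than the authors' procedure: the fixed proxy spacing, the relaxed proxy cutoff, and the function and parameter names are assumptions, and unlike the published method this sketch gives no guarantee that every significant association is found.

```python
def two_stage_scan(n_snps, test, proxy_step, proxy_cutoff, final_cutoff):
    """Test evenly spaced proxy SNPs first, then only the SNPs near promising proxies.

    `test(i)` returns the p-value for SNP index i (supplied by the caller).
    Returns the significant SNP indices and the number of tests performed.
    """
    tested = {}
    # Stage 1: test a sparse set of proxy SNPs across the genome.
    for i in range(0, n_snps, proxy_step):
        tested[i] = test(i)
    promising = [i for i, p in tested.items() if p <= proxy_cutoff]
    # Stage 2: test the remaining SNPs in the neighbourhood of each promising proxy.
    for i in promising:
        for j in range(max(0, i - proxy_step + 1), min(n_snps, i + proxy_step)):
            if j not in tested:
                tested[j] = test(j)
    hits = sorted(j for j, p in tested.items() if p <= final_cutoff)
    return hits, len(tested)

# Toy p-values (hypothetical): one strong signal at SNP 42 and a weaker,
# correlated signal at the nearby proxy SNP 40.
pvals = [0.5] * 100
pvals[40] = 1e-3
pvals[42] = 1e-9
hits, n_tests = two_stage_scan(100, lambda i: pvals[i], proxy_step=10,
                               proxy_cutoff=0.01, final_cutoff=1e-8)
```

    In this toy run the signal at SNP 42 is recovered with 28 tests instead of 100, which is the kind of saving the two-stage design aims for.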

  8. Enhancer scanning to locate regulatory regions in genomic loci

    PubMed Central

    Buckley, Melissa; Gjyshi, Anxhela; Mendoza-Fandiño, Gustavo; Baskin, Rebekah; Carvalho, Renato S.; Carvalho, Marcelo A.; Woods, Nicholas T.; Monteiro, Alvaro N.A.

    2016-01-01

    The present protocol provides a rapid, streamlined and scalable strategy to systematically scan genomic regions for the presence of transcriptional regulatory regions active in a specific cell type. It creates genomic tiles spanning a region of interest that are subsequently cloned by recombination into a luciferase reporter vector containing the Simian Virus 40 promoter. Tiling clones are transfected into specific cell types to test for the presence of transcriptional regulatory regions. The protocol includes testing of different SNP (single nucleotide polymorphism) alleles to determine their effect on regulatory activity. This procedure provides a systematic framework to identify candidate functional SNPs within a locus during functional analysis of genome-wide association studies. This protocol adapts and combines previous well-established molecular biology methods to provide a streamlined strategy, based on automated primer design and recombinational cloning to rapidly go from a genomic locus to a set of candidate functional SNPs in eight weeks. PMID:26658467
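
    The tiling step can be sketched as a simple coordinate calculation. The tile length, the overlap between neighbouring tiles, and the function name are illustrative assumptions; the abstract does not state the values used in the protocol or whether tiles overlap.

```python
def make_tiles(start, end, tile_len, overlap):
    """Split [start, end) into tiles of at most tile_len that overlap by `overlap` bases."""
    if not 0 <= overlap < tile_len:
        raise ValueError("overlap must be non-negative and smaller than tile_len")
    step = tile_len - overlap
    tiles = []
    pos = start
    while pos < end:
        tiles.append((pos, min(pos + tile_len, end)))
        if pos + tile_len >= end:
            break  # the last tile already reaches the end of the region
        pos += step
    return tiles

# Hypothetical 5 kb region scanned with 2 kb tiles overlapping by 500 bp.
tiles = make_tiles(0, 5000, 2000, 500)
```

    Each tuple would correspond to one fragment to amplify and clone; primer design for the tile boundaries is outside the scope of this sketch.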

  9. Genomic regions underlying susceptibility to bovine tuberculosis in Holstein-Friesian cattle.

    PubMed

    Raphaka, Kethusegile; Matika, Oswald; Sánchez-Molano, Enrique; Mrode, Raphael; Coffey, Mike Peter; Riggio, Valentina; Glass, Elizabeth Janet; Woolliams, John Arthur; Bishop, Stephen Christopher; Banos, Georgios

    2017-03-23

    The significant social and economic loss as a result of bovine tuberculosis (bTB) presents a continuous challenge to cattle industries in the UK and worldwide. However, host genetic variation in cattle susceptibility to bTB provides an opportunity to select for resistant animals and further understand the genetic mechanisms underlying disease dynamics. The present study identified genomic regions associated with susceptibility to bTB using genome-wide association (GWA), regional heritability mapping (RHM) and chromosome association approaches. Phenotypes comprised de-regressed estimated breeding values of 804 Holstein-Friesian sires and pertained to three bTB indicator traits: i) positive reactors to the skin test with positive post-mortem examination results (phenotype 1); ii) positive reactors to the skin test regardless of post-mortem examination results (phenotype 2) and iii) as in (ii) plus non-reactors and inconclusive reactors to the skin tests with positive post-mortem examination results (phenotype 3). Genotypes based on the 50 K SNP DNA array were available and a total of 34,874 SNPs remained per animal after quality control. The estimated polygenic heritability for susceptibility to bTB was 0.26, 0.37 and 0.34 for phenotypes 1, 2 and 3, respectively. GWA analysis identified a putative SNP on Bos taurus autosomes (BTA) 2 associated with phenotype 1, and another on BTA 23 associated with phenotype 2. Genomic regions encompassing these SNPs were found to harbour potentially relevant annotated genes. RHM confirmed the effect of these genomic regions and identified new regions on BTA 18 for phenotype 1 and BTA 3 for phenotypes 2 and 3. Heritabilities of the genomic regions ranged between 0.05 and 0.08 across the three phenotypes. Chromosome association analysis indicated a major role of BTA 23 on susceptibility to bTB. 
    Genomic regions and candidate genes identified in the present study provide an opportunity to further understand pathways critical to cattle

  10. Genome-wide association study identified a narrow chromosome 1 region associated with chicken growth traits.

    PubMed

    Xie, Liang; Luo, Chenglong; Zhang, Chengguang; Zhang, Rong; Tang, Jun; Nie, Qinghua; Ma, Li; Hu, Xiaoxiang; Li, Ning; Da, Yang; Zhang, Xiquan

    2012-01-01

    Chicken growth traits are important economic traits in broilers. A large number of studies are available on finding genetic factors affecting chicken growth. However, most of these studies identified chromosome regions containing putative quantitative trait loci and finding causal mutations is still a challenge. In this genome-wide association study (GWAS), we identified a narrow 1.5 Mb region (173.5-175 Mb) of chicken (Gallus gallus) chromosome (GGA) 1 to be strongly associated with chicken growth using 47,678 SNPs and 489 F2 chickens. The growth traits included aggregate body weight (BW) at 0-90 d of age measured weekly, biweekly average daily gains (ADG) derived from weekly body weight, and breast muscle weight (BMW), leg muscle weight (LMW) and wing weight (WW) at 90 d of age. Five SNPs in the 1.5 Mb KPNA3-FOXO1A region at GGA1 had the highest significant effects for all growth traits in this study, including a SNP at 8.9 Kb upstream of FOXO1A for BW at 22-48 d and 70 d, a SNP at 1.9 Kb downstream of FOXO1A for WW, a SNP at 20.9 Kb downstream of ENSGALG00000022732 for ADG at 29-42 d, a SNP in INTS6 for BW at 90 d, and a SNP in KPNA3 for BMW and LMW. The 1.5 Mb KPNA3-FOXO1A region contained two microRNA genes that could bind to messenger ribonucleic acid (mRNA) of IGF1, FOXO1A and KPNA3. It was further indicated that the 1.5 Mb GGA1 region had the strongest effects on chicken growth during 22-42 d.

  11. GANESH: software for customized annotation of genome regions.

    PubMed

    Huntley, Derek; Hummerich, Holger; Smedley, Damian; Kittivoravitkul, Sasivimol; McCarthy, Mark; Little, Peter; Sergot, Marek

    2003-09-01

    GANESH is a software package designed to support the genetic analysis of regions of human and other genomes. It provides a set of components that may be assembled to construct a self-updating database of DNA sequence, mapping data, and annotations of possible genome features. Once one or more remote sources of data for the target region have been identified, all sequences for that region are downloaded, assimilated, and subjected to a (configurable) set of standard database-searching and genome-analysis packages. The results are stored in compressed form in a relational database, and are updated automatically on a regular schedule so that they are always immediately available in their most up-to-date versions. A Java front-end, executed as a stand alone application or web applet, provides a graphical interface for navigating the database and for viewing the annotations. There are facilities for importing and exporting data in the format of the Distributed Annotation System (DAS), enabling a GANESH database to be used as a component of a DAS configuration. The system has been used to construct databases for about a dozen regions of human chromosomes and for three regions of mouse chromosomes.

  12. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

    PubMed

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A

    2013-07-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
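
    The k-mer features that such a classifier takes as input can be sketched with the standard library alone. This shows only the feature extraction, not the SVM training, and it is not the kmer-SVM implementation: the function names and the frequency normalisation are assumptions, and details such as the handling of reverse complements are left out.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def feature_vector(seq, k):
    """Normalised k-mer frequencies in a fixed order over the ACGT alphabet."""
    counts = kmer_counts(seq, k)
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in vocab) or 1  # avoid division by zero
    return [counts[m] / total for m in vocab]

# Toy sequence (hypothetical): five overlapping 2-mers, "AC" occurring twice.
counts = kmer_counts("ACGTAC", 2)
vec = feature_vector("ACGTAC", 2)
```

    Vectors built this way for bound and unbound regions could then be passed to any SVM library as positive and negative training examples.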

  13. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    PubMed Central

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147

  14. Genome-Wide Analysis in Brazilians Reveals Highly Differentiated Native American Genome Regions

    PubMed Central

    Havt, Alexandre; Nayak, Uma; Pinkerton, Relana; Farber, Emily; Concannon, Patrick; Lima, Aldo A.; Guerrant, Richard L.

    2017-01-01

    Despite its population, geographic size, and emerging economic importance, disproportionately little genome-scale research exists into genetic factors that predispose Brazilians to disease, or the population genetics of risk. After identification of suitable proxy populations and careful analysis of tri-continental admixture in 1,538 North-Eastern Brazilians to estimate individual ancestry and ancestral allele frequencies, we computed 400,000 genome-wide locus-specific branch length (LSBL) Fst statistics of Brazilian Amerindian ancestry compared to European and African; and a similar set of differentiation statistics for their Amerindian component compared with the closest Asian 1000 Genomes population (surprisingly, Bengalis in Bangladesh). After ranking SNPs by these statistics, we identified the top 10 highly differentiated SNPs in five genome regions in the LSBL tests of Brazilian Amerindian ancestry compared to European and African; and the top 10 SNPs in eight regions comparing their Amerindian component to the closest Asian 1000 Genomes population. We found SNPs within or proximal to the genes CIITA (rs6498115), SMC6 (rs1834619), and KLHL29 (rs2288697) were most differentiated in the Amerindian-specific branch, while SNPs in the genes ADAMTS9 (rs7631391), DOCK2 (rs77594147), SLC28A1 (rs28649017), ARHGAP5 (rs7151991), and CIITA (rs45601437) were most highly differentiated in the Asian comparison. These genes are known to influence immune function, metabolic and anthropometry traits, and embryonic development. These analyses have identified candidate genes for selection within Amerindian ancestry, and by comparison of the two analyses, those for which the differentiation may have arisen during the migration from Asia to the Americas. PMID:28100790
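
    For readers unfamiliar with the statistic, the locus-specific branch length for one population is conventionally computed from the three pairwise Fst values at a SNP. The sketch below uses that standard three-population definition; the function name and the example values are hypothetical, and the study's exact computation may differ in detail.

```python
def lsbl(fst_ab, fst_ac, fst_bc):
    """Branch length specific to population A at one locus.

    Standard three-population definition: (d_AB + d_AC - d_BC) / 2,
    where d_XY is the pairwise Fst between populations X and Y.
    """
    return (fst_ab + fst_ac - fst_bc) / 2

# Hypothetical Fst values for one SNP, with A = Amerindian ancestry,
# B = European, and C = African.
value = lsbl(0.30, 0.25, 0.10)
```

    A large value indicates that allele frequencies in population A have diverged from both comparison populations at that locus, which is why SNPs are ranked by this statistic.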

  15. Genomic regions associated with kyphosis in swine

    USDA-ARS's Scientific Manuscript database

    Background: A back curvature defect similar to kyphosis in humans has been observed in swine herds. The defect ranges from mild to severe curvature of the thoracic vertebrae in split carcasses and has an estimated heritability of 0.3. The objective of this study was to identify genomic regions that...

  16. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  17. Segment-Wise Genome-Wide Association Analysis Identifies a Candidate Region Associated with Schizophrenia in Three Independent Samples

    PubMed Central

    Rietschel, Marcella; Mattheisen, Manuel; Breuer, René; Schulze, Thomas G.; Nöthen, Markus M.; Levinson, Douglas; Shi, Jianxin; Gejman, Pablo V.; Cichon, Sven; Ophoff, Roel A.

    2012-01-01

    Recent studies suggest that variation in complex disorders (e.g., schizophrenia) is explained by a large number of genetic variants with small effect size (Odds Ratio∼1.05–1.1). The statistical power to detect these genetic variants in Genome Wide Association (GWA) studies with large numbers of cases and controls (∼15,000) is still low. As it will be difficult to further increase sample size, we decided to explore an alternative method for analyzing GWA data in a study of schizophrenia, dramatically reducing the number of statistical tests. The underlying hypothesis was that at least some of the genetic variants related to a common outcome are collocated in segments of chromosomes at a wider scale than single genes. Our approach was therefore to study the association between relatively large segments of DNA and disease status. An association test was performed for each SNP and the number of nominally significant tests in a segment was counted. We then performed a permutation-based binomial test to determine whether this region contained significantly more nominally significant SNPs than expected under the null hypothesis of no association, taking linkage into account. Genome Wide Association data of three independent schizophrenia case/control cohorts with European ancestry (Dutch, German, and US) using segments of DNA with variable length (2 to 32 Mbp) was analyzed. Using this approach we identified a region at chromosome 5q23.3-q31.3 (128–160 Mbp) that was significantly enriched with nominally associated SNPs in three independent case-control samples. We conclude that considering relatively wide segments of chromosomes may reveal reliable relationships between the genome and schizophrenia, suggesting novel methodological possibilities as well as raising theoretical questions. PMID:22723893
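
    The segment-level test can be sketched as counting nominally significant SNPs in a segment and asking how surprising that count is. This is a simplification under stated assumptions: it uses a plain binomial tail probability that treats SNPs as independent, whereas the study used a permutation-based test to take linkage into account. The function names and the toy p-values are hypothetical.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def segment_enrichment(pvalues, alpha=0.05):
    """Count SNPs with p <= alpha in a segment and return the binomial tail probability."""
    n = len(pvalues)
    k = sum(p <= alpha for p in pvalues)
    return k, binom_sf(k, n, alpha)

# Toy segment (hypothetical): 20 SNPs, 5 of them nominally significant.
segment = [0.01] * 5 + [0.5] * 15
k, p_enrich = segment_enrichment(segment)
```

    With independent SNPs, five nominal hits out of 20 would be expected by chance in well under 1% of segments; with real data, correlated SNPs inflate such counts, which is why a permutation scheme is needed.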

  18. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    PubMed

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.
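
    The idea of a core defined by conserved gene order can be illustrated with a much simpler pairwise sketch. It assumes each genome is an ordered list of unique orthologous-group (OG) identifiers and reports runs of OGs that are adjacent in both genomes, in either orientation; the function name and the `min_len` cutoff are hypothetical, and the multiple-genome alignment described in the abstract is not reproduced here.

```python
def conserved_blocks(order_a, order_b, min_len=3):
    """Return runs of OGs from genome A that are also contiguous in genome B."""
    index_b = {og: i for i, og in enumerate(order_b)}
    blocks, current = [], []

    def flush():
        # Keep only runs that are long enough to count as conserved.
        if len(current) >= min_len:
            blocks.append(list(current))

    for og in order_a:
        if og not in index_b:
            flush()
            current = []
        elif current and abs(index_b[og] - index_b[current[-1]]) == 1:
            current.append(og)  # still adjacent in genome B
        else:
            flush()
            current = [og]
    flush()
    return blocks
```

    For example, `conserved_blocks([1, 2, 3, 4, 5, 6, 7], [3, 2, 1, 9, 5, 6, 7, 4])` yields the inverted block `[1, 2, 3]` and the collinear block `[5, 6, 7]`.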

  19. Using genome wide association studies to identify common QTL regions in three different genetic backgrounds based on Iberian pig breed.

    PubMed

    Martínez-Montes, Ángel M; Fernández, Almudena; Muñoz, María; Noguera, Jose Luis; Folch, Josep M; Fernández, Ana I

    2018-01-01

    One of the major limitations for the application of QTL results in pig breeding and QTN identification has been the limited number of QTL effects validated in different animal material. The aim of the current work was to validate QTL regions through joint and specific genome wide association and haplotype analyses for growth, fatness and premier cut weights in three different genetic backgrounds, backcrosses based on Iberian pigs, which has a major role in the analysis due to its high productive relevance. The results revealed nine common QTL regions, three segregating in all three backcrosses on SSC1, 0-3 Mb, for body weight, on SSC2, 3-9 Mb, for loin bone-in weight, and on SSC7, 3 Mb, for shoulder weight, and six segregating in two of the three backcrosses, on SSC2, SSC4, SSC6 and SSC10 for backfat thickness, shoulder and ham weights. Besides, 18 QTL regions were specifically identified in one of the three backcrosses, five identified only in BC_LD, seven in BC_DU and six in BC_PI. Beyond identifying and validating QTL, candidate genes and gene variants within the most interesting regions have been explored using functional annotation, gene expression data and SNP identification from RNA-Seq data. The results allowed us to propose a promising list of candidate mutations, those identified in PDE10A, DHCR7, MFN2 and CCNY genes located within the common QTL regions and those identified near ssc-mir-103-1 considered PANK3 regulators to be further analysed.

  20. Using genome wide association studies to identify common QTL regions in three different genetic backgrounds based on Iberian pig breed

    PubMed Central

    Martínez-Montes, Ángel M.; Fernández, Almudena; Muñoz, María; Noguera, Jose Luis; Folch, Josep M.

    2018-01-01

    One of the major limitations for the application of QTL results in pig breeding and QTN identification has been the limited number of QTL effects validated in different animal material. The aim of the current work was to validate QTL regions through joint and specific genome wide association and haplotype analyses for growth, fatness and premier cut weights in three different genetic backgrounds, backcrosses based on Iberian pigs, which has a major role in the analysis due to its high productive relevance. The results revealed nine common QTL regions, three segregating in all three backcrosses on SSC1, 0–3 Mb, for body weight, on SSC2, 3–9 Mb, for loin bone-in weight, and on SSC7, 3 Mb, for shoulder weight, and six segregating in two of the three backcrosses, on SSC2, SSC4, SSC6 and SSC10 for backfat thickness, shoulder and ham weights. Besides, 18 QTL regions were specifically identified in one of the three backcrosses, five identified only in BC_LD, seven in BC_DU and six in BC_PI. Beyond identifying and validating QTL, candidate genes and gene variants within the most interesting regions have been explored using functional annotation, gene expression data and SNP identification from RNA-Seq data. The results allowed us to propose a promising list of candidate mutations, those identified in PDE10A, DHCR7, MFN2 and CCNY genes located within the common QTL regions and those identified near ssc-mir-103-1 considered PANK3 regulators to be further analysed. PMID:29522525

  1. Differentially Methylated Region-Representational Difference Analysis (DMR-RDA): A Powerful Method to Identify DMRs in Uncharacterized Genomes.

    PubMed

    Sasheva, Pavlina; Grossniklaus, Ueli

    2017-01-01

    Over the last years, it has become increasingly clear that environmental influences can affect the epigenomic landscape and that some epigenetic variants can have heritable, phenotypic effects. While there are a variety of methods to perform genome-wide analyses of DNA methylation in model organisms, this is still a challenging task for non-model organisms without a reference genome. Differentially methylated region-representational difference analysis (DMR-RDA) is a sensitive and powerful PCR-based technique that isolates DNA fragments that are differentially methylated between two otherwise identical genomes. The technique does not require special equipment and is independent of prior knowledge about the genome. It is even applicable to genomes that have high complexity and a large size, being the method of choice for the analysis of plant non-model systems.

  2. New Sequence Variants in HLA Class II/III Region Associated with Susceptibility to Knee Osteoarthritis Identified by Genome-Wide Association Study

    PubMed Central

    Nakajima, Masahiro; Takahashi, Atsushi; Kou, Ikuyo; Rodriguez-Fontenla, Cristina; Gomez-Reino, Juan J.; Furuichi, Tatsuya; Dai, Jin; Sudo, Akihiro; Uchida, Atsumasa; Fukui, Naoshi; Kubo, Michiaki; Kamatani, Naoyuki; Tsunoda, Tatsuhiko; Malizos, Konstantinos N.; Tsezou, Aspasia; Gonzalez, Antonio; Nakamura, Yusuke; Ikegawa, Shiro

    2010-01-01

    Osteoarthritis (OA) is a common disease that has a definite genetic component. Only a few OA susceptibility genes that have definite functional evidence and replication of association have been reported, however. Through a genome-wide association study and a replication using a total of ∼4,800 Japanese subjects, we identified two single nucleotide polymorphisms (SNPs) (rs7775228 and rs10947262) associated with susceptibility to knee OA. The two SNPs were in a region containing HLA class II/III genes and their association reached genome-wide significance (combined P = 2.43×10⁻⁸ for rs7775228 and 6.73×10⁻⁸ for rs10947262). Our results suggest that an immunologic mechanism is implicated in the etiology of OA. PMID:20305777

  3. Attenuation of monkeypox virus by deletion of genomic regions

    USGS Publications Warehouse

    Lopera, Juan G.; Falendysz, Elizabeth A.; Rocke, Tonie E.; Osorio, Jorge E.

    2015-01-01

    Monkeypox virus (MPXV) is an emerging pathogen from Africa that causes disease similar to smallpox. Two clades with different geographic distributions and virulence have been described. Here, we utilized bioinformatic tools to identify genomic regions in MPXV containing multiple virulence genes and explored their roles in pathogenicity; two selected regions were then deleted singularly or in combination. In vitro and in vivo studies indicated that these regions play a significant role in MPXV replication, tissue spread, and mortality in mice. Interestingly, while deletion of either region led to decreased virulence in mice, one region had no effect on in vitro replication. Deletion of both regions simultaneously also reduced cell culture replication and significantly increased the attenuation in vivo over either single deletion. Attenuated MPXV with genomic deletions present a safe and efficacious tool in the study of MPX pathogenesis and in the identification of genetic factors associated with virulence.

  4. Attenuation of monkeypox virus by deletion of genomic regions.

    PubMed

    Lopera, Juan G; Falendysz, Elizabeth A; Rocke, Tonie E; Osorio, Jorge E

    2015-01-15

    Monkeypox virus (MPXV) is an emerging pathogen from Africa that causes disease similar to smallpox. Two clades with different geographic distributions and virulence have been described. Here, we utilized bioinformatic tools to identify genomic regions in MPXV containing multiple virulence genes and explored their roles in pathogenicity; two selected regions were then deleted singularly or in combination. In vitro and in vivo studies indicated that these regions play a significant role in MPXV replication, tissue spread, and mortality in mice. Interestingly, while deletion of either region led to decreased virulence in mice, one region had no effect on in vitro replication. Deletion of both regions simultaneously also reduced cell culture replication and significantly increased the attenuation in vivo over either single deletion. Attenuated MPXV with genomic deletions present a safe and efficacious tool in the study of MPX pathogenesis and in the identification of genetic factors associated with virulence. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. 4C-ker: A Method to Reproducibly Identify Genome-Wide Interactions Captured by 4C-Seq Experiments.

    PubMed

    Raviram, Ramya; Rocha, Pedro P; Müller, Christian L; Miraldi, Emily R; Badri, Sana; Fu, Yi; Swanzey, Emily; Proudhon, Charlotte; Snetkova, Valentina; Bonneau, Richard; Skok, Jane A

    2016-03-01

    4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.

  6. 4C-ker: A Method to Reproducibly Identify Genome-Wide Interactions Captured by 4C-Seq Experiments

    PubMed Central

    Raviram, Ramya; Rocha, Pedro P.; Müller, Christian L.; Miraldi, Emily R.; Badri, Sana; Fu, Yi; Swanzey, Emily; Proudhon, Charlotte; Snetkova, Valentina

    2016-01-01

    4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or “bait”) that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes. PMID:26938081

  7. Genome-wide association and regional heritability mapping to identify loci underlying variation in nematode resistance and body weight in Scottish Blackface lambs.

    PubMed

    Riggio, V; Matika, O; Pong-Wong, R; Stear, M J; Bishop, S C

    2013-05-01

    The genetic architecture underlying nematode resistance and body weight in Blackface lambs was evaluated comparing genome-wide association (GWA) and regional heritability mapping (RHM) approaches. The traits analysed were faecal egg count (FEC) and immunoglobulin A activity against third-stage larvae from Teladorsagia circumcincta, as indicators of nematode resistance, and body weight in a population of 752 Scottish Blackface lambs, genotyped with the 50k single-nucleotide polymorphism (SNP) chip. FEC for both Nematodirus and Strongyles nematodes (excluding Nematodirus), as well as body weight were collected at approximately 16, 20 and 24 weeks of age. In addition, a weighted average animal effect was estimated for both FEC and body weight traits. After quality control, 44 388 SNPs were available for the GWA analysis and 42 841 for the RHM, which utilises only mapped SNPs. The same fixed effects were used in both analyses: sex, year, management group, litter size and age of dam, with day of birth as covariate. Some genomic regions of interest for both nematode resistance and body weight traits were identified, using both GWA and RHM approaches. For both methods, strong evidence for association was found on chromosome 14 for Nematodirus average animal effect, chromosome 6 for Strongyles FEC at 16 weeks and chromosome 6 for body weight at 16 weeks. Across the entire data set, RHM identified more regions reaching the suggestive level than GWA, suggesting that RHM is capable of capturing some of the variation not detected by GWA analyses.

  8. Regulation of Sex Determination in Mice by a Non-coding Genomic Region

    PubMed Central

    Arboleda, Valerie A.; Fleming, Alice; Barseghyan, Hayk; Délot, Emmanuèle; Sinsheimer, Janet S.; Vilain, Eric

    2014-01-01

    To identify novel genomic regions that regulate sex determination, we utilized the powerful C57BL/6J-YPOS (B6-YPOS) model of XY sex reversal where mice with autosomes from the B6 strain and a Y chromosome from a wild-derived strain, Mus domesticus poschiavinus (YPOS), show complete sex reversal. In B6-YPOS, the presence of a 55-Mb congenic region on chromosome 11 protects from sex reversal in a dose-dependent manner. Using mouse genetic backcross designs and high-density SNP arrays, we narrowed the congenic region to a 1.62-Mb genomic region on chromosome 11 that confers 80% protection from B6-YPOS sex reversal when one copy is present and complete protection when two copies are present. It was previously believed that the protective congenic region originated from the 129S1/SviMJ (129) strain. However, genomic analysis revealed that this region is not derived from 129 and most likely is derived from the semi-inbred strain POSA. We show that the small 1.62-Mb congenic region that protects against B6-YPOS sex reversal is located within the Sox9 promoter and promotes the expression of Sox9, thereby driving testis development within the B6-YPOS background. Through 30 years of backcrossing, this congenic region was maintained, as it promoted male sex determination and fertility despite the female-promoting B6-YPOS genetic background. Our findings demonstrate that long-range enhancer regions are critical to developmental processes and can be used to identify the complex interplay between genome variants, epigenetics, and developmental gene regulation. PMID:24793290

  9. Identification of Genomic Regions Associated with Phenotypic Variation between Dog Breeds using Selection Mapping

    PubMed Central

    Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H.; Hansen, Mark S. T.; Lawley, Cindy T.; Karlsson, Elinor K.; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Åke; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T.

    2011-01-01

    The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease. PMID:22022279

  10. Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping.

    PubMed

    Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H; Hansen, Mark S T; Lawley, Cindy T; Karlsson, Elinor K; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Ake; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T

    2011-10-01

    The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
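
    One of the signals mentioned in this abstract, extended blocks of homozygosity on a megabase scale, can be illustrated with a short sketch. It assumes genotypes for one individual coded as 0, 1 or 2 copies of an allele (1 meaning heterozygous) at sorted positions on one chromosome; the function name and the strict no-heterozygote rule are assumptions, and practical tools usually tolerate occasional heterozygous or missing calls.

```python
def homozygous_runs(positions, genotypes, min_bp=1_000_000):
    """Return (start, end) spans of consecutive homozygous SNPs >= min_bp long."""
    runs, start, prev = [], None, None
    for pos, g in zip(positions, genotypes):
        if g != 1:  # homozygous call extends (or opens) the current run
            if start is None:
                start = pos
            prev = pos
        else:  # heterozygous call closes the current run
            if start is not None and prev - start >= min_bp:
                runs.append((start, prev))
            start = None
    if start is not None and prev - start >= min_bp:
        runs.append((start, prev))
    return runs
```

    With positions `[0, 500_000, 1_200_000, 1_300_000, 1_400_000, 3_000_000]` and genotypes `[0, 0, 2, 1, 2, 2]`, the heterozygous call at 1.3 Mb splits the chromosome into two runs, 0 to 1.2 Mb and 1.4 to 3.0 Mb.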

  11. DMRfinder: efficiently identifying differentially methylated regions from MethylC-seq data.

    PubMed

    Gaspar, John M; Hart, Ronald P

    2017-11-29

    DNA methylation is an epigenetic modification that is studied at a single-base resolution with bisulfite treatment followed by high-throughput sequencing. After alignment of the sequence reads to a reference genome, methylation counts are analyzed to determine genomic regions that are differentially methylated between two or more biological conditions. Even though a variety of software packages is available for different aspects of the bioinformatics analysis, they often produce results that are biased or require excessive computational requirements. DMRfinder is a novel computational pipeline that identifies differentially methylated regions efficiently. Following alignment, DMRfinder extracts methylation counts and performs a modified single-linkage clustering of methylation sites into genomic regions. It then compares methylation levels using beta-binomial hierarchical modeling and Wald tests. Among its innovative attributes are the analyses of novel methylation sites and methylation linkage, as well as the simultaneous statistical analysis of multiple sample groups. To demonstrate its efficiency, DMRfinder is benchmarked against other computational approaches using a large published dataset. Contrasting two replicates of the same sample yielded minimal genomic regions with DMRfinder, whereas two alternative software packages reported a substantial number of false positives. Further analyses of biological samples revealed fundamental differences between DMRfinder and another software package, despite the fact that they utilize the same underlying statistical basis. For each step, DMRfinder completed the analysis in a fraction of the time required by other software. Among the computational approaches for identifying differentially methylated regions from high-throughput bisulfite sequencing datasets, DMRfinder is the first that integrates all the post-alignment steps in a single package. 
    Compared to other software, DMRfinder is extremely efficient and unbiased in
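
    The clustering step mentioned in this abstract, grouping methylation sites into genomic regions by single linkage, can be sketched in a few lines. This is plain single-linkage clustering on one chromosome, not DMRfinder's modified variant, whose details the abstract does not give; the function name and the `max_gap` and `min_sites` defaults are hypothetical.

```python
def cluster_sites(positions, max_gap=100, min_sites=3):
    """Group sites whose neighbours lie within max_gap bp.

    Returns (start, end, n_sites) for every cluster with >= min_sites sites.
    """
    regions, current = [], []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                regions.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        regions.append((current[0], current[-1], len(current)))
    return regions
```

    For instance, `cluster_sites([10, 50, 120, 500, 560, 2000])` keeps only the region spanning positions 10 to 120, because the other groups contain fewer than three sites.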

  12. A Rapid Method of Genomic Array Analysis of Scaffold/Matrix Attachment Regions (S/MARs) Identifies a 2.5-Mb Region of Enhanced Scaffold/Matrix Attachment at a Human Neocentromere

    PubMed Central

    Sumer, Huseyin; Craig, Jeffrey M.; Sibson, Mandy; Choo, K.H. Andy

    2003-01-01

    Human neocentromeres are fully functional centromeres that arise at previously noncentromeric regions of the genome. We have tested a rapid procedure of genomic array analysis of chromosome scaffold/matrix attachment regions (S/MARs), involving the isolation of S/MAR DNA and hybridization of this DNA to a genomic BAC/PAC array. Using this procedure, we have defined a 2.5-Mb domain of S/MAR-enriched chromatin that fully encompasses a previously mapped centromere protein-A (CENP-A)-associated domain at a human neocentromere. We have independently verified this procedure using a previously established fluorescence in situ hybridization method on salt-treated metaphase chromosomes. In silico sequence analysis of the S/MAR-enriched and surrounding regions has revealed no outstanding sequence-related predisposition. This study defines the S/MAR-enriched domain of a higher eukaryotic centromere and provides a method that has broad application for the mapping of S/MAR attachment sites over large genomic regions or throughout a genome. PMID:12840048

  13. regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests.

    PubMed

    Gel, Bernat; Díez-Villanueva, Anna; Serra, Eduard; Buschbeck, Marcus; Peinado, Miguel A; Malinverni, Roberto

    2016-01-15

    Statistically assessing the relation between a set of genomic regions and other genomic features is a common challenging task in genomic and epigenomic analyses. Randomization based approaches implicitly take into account the complexity of the genome without the need of assuming an underlying statistical model. regioneR is an R package that implements a permutation test framework specifically designed to work with genomic regions. In addition to the predefined randomization and evaluation strategies, regioneR is fully customizable allowing the use of custom strategies to adapt it to specific questions. Finally, it also implements a novel function to evaluate the local specificity of the detected association. regioneR is an R package released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor (http://www.bioconductor.org/packages/regioneR). Contact: rmalinverni@carrerasresearch.org. © The Author 2015. Published by Oxford University Press.
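
    The general shape of a permutation test on genomic regions can be sketched as below. This is a Python stand-in for the idea only and does not use or mirror the regioneR API: the function names are hypothetical, the genome is reduced to a single linear coordinate range, regions are half-open `(start, end)` pairs, and the randomization simply relocates each region uniformly at random.

```python
import random


def count_overlaps(regions_a, regions_b):
    """Number of regions in A that overlap at least one region in B."""
    return sum(
        any(a0 < b1 and b0 < a1 for b0, b1 in regions_b) for a0, a1 in regions_a
    )


def randomize(regions, genome_len, rng):
    """Move every region to a random position, keeping its width."""
    out = []
    for start, end in regions:
        width = end - start
        new_start = rng.randrange(0, genome_len - width + 1)
        out.append((new_start, new_start + width))
    return out


def perm_test(regions_a, regions_b, genome_len, n_perm=1000, seed=0):
    """Return (observed overlaps, empirical one-sided p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(regions_a, regions_b)
    null = [
        count_overlaps(randomize(regions_a, genome_len, rng), regions_b)
        for _ in range(n_perm)
    ]
    p = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    return observed, p
```

    Swapping `randomize` or `count_overlaps` for another strategy corresponds to the customizable randomization and evaluation functions that the abstract describes.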

  14. Five endometrial cancer risk loci identified through genome-wide association analysis.

    PubMed

    Cheng, Timothy H T; Thompson, Deborah J; O'Mara, Tracy A; Painter, Jodie N; Glubb, Dylan M; Flach, Susanne; Lewis, Annabelle; French, Juliet D; Freeman-Mills, Luke; Church, David; Gorman, Maggie; Martin, Lynn; Hodgson, Shirley; Webb, Penelope M; Attia, John; Holliday, Elizabeth G; McEvoy, Mark; Scott, Rodney J; Henders, Anjali K; Martin, Nicholas G; Montgomery, Grant W; Nyholt, Dale R; Ahmed, Shahana; Healey, Catherine S; Shah, Mitul; Dennis, Joe; Fasching, Peter A; Beckmann, Matthias W; Hein, Alexander; Ekici, Arif B; Hall, Per; Czene, Kamila; Darabi, Hatef; Li, Jingmei; Dörk, Thilo; Dürst, Matthias; Hillemanns, Peter; Runnebaum, Ingo; Amant, Frederic; Schrauwen, Stefanie; Zhao, Hui; Lambrechts, Diether; Depreeuw, Jeroen; Dowdy, Sean C; Goode, Ellen L; Fridley, Brooke L; Winham, Stacey J; Njølstad, Tormund S; Salvesen, Helga B; Trovik, Jone; Werner, Henrica M J; Ashton, Katie; Otton, Geoffrey; Proietto, Tony; Liu, Tao; Mints, Miriam; Tham, Emma; CHIBCHA Consortium; Li, Mulin Jun; Yip, Shun H; Wang, Junwen; Bolla, Manjeet K; Michailidou, Kyriaki; Wang, Qin; Tyrer, Jonathan P; Dunlop, Malcolm; Houlston, Richard; Palles, Claire; Hopper, John L; Peto, Julian; Swerdlow, Anthony J; Burwinkel, Barbara; Brenner, Hermann; Meindl, Alfons; Brauch, Hiltrud; Lindblom, Annika; Chang-Claude, Jenny; Couch, Fergus J; Giles, Graham G; Kristensen, Vessela N; Cox, Angela; Cunningham, Julie M; Pharoah, Paul D P; Dunning, Alison M; Edwards, Stacey L; Easton, Douglas F; Tomlinson, Ian; Spurdle, Amanda B

    2016-06-01

    We conducted a meta-analysis of three endometrial cancer genome-wide association studies (GWAS) and two follow-up phases totaling 7,737 endometrial cancer cases and 37,144 controls of European ancestry. Genome-wide imputation and meta-analysis identified five new risk loci of genome-wide significance at likely regulatory regions on chromosomes 13q22.1 (rs11841589, near KLF5), 6q22.31 (rs13328298, in LOC643623 and near HEY2 and NCOA7), 8q24.21 (rs4733613, telomeric to MYC), 15q15.1 (rs937213, in EIF2AK4, near BMF) and 14q32.33 (rs2498796, in AKT1, near SIVA1). We also found a second independent 8q24.21 signal (rs17232730). Functional studies of the 13q22.1 locus showed that rs9600103 (pairwise r² = 0.98 with rs11841589) is located in a region of active chromatin that interacts with the KLF5 promoter region. The rs9600103[T] allele that is protective in endometrial cancer suppressed gene expression in vitro, suggesting that regulation of the expression of KLF5, a gene linked to uterine development, is implicated in tumorigenesis. These findings provide enhanced insight into the genetic and biological basis of endometrial cancer.

  15. Comparative Genomics of Campylobacter iguaniorum to Unravel Genetic Regions Associated with Reptilian Hosts

    PubMed Central

    Gilbert, Maarten J.; Miller, William G.; Yee, Emma; Kik, Marja; Zomer, Aldert L.; Wagenaar, Jaap A.; Duim, Birgitta

    2016-01-01

    Campylobacter iguaniorum is most closely related to the species C. fetus, C. hyointestinalis, and C. lanienae. Reptiles, chelonians and lizards in particular, appear to be a primary reservoir of this Campylobacter species. Here we report the genome comparison of C. iguaniorum strain 1485E, isolated from a bearded dragon (Pogona vitticeps), and strain 2463D, isolated from a green iguana (Iguana iguana), with the genomes of closely related taxa, in particular with reptile-associated C. fetus subsp. testudinum. In contrast to C. fetus, C. iguaniorum is lacking an S-layer encoding region. Furthermore, a defined lipooligosaccharide biosynthesis locus, encoding multiple glycosyltransferases and bounded by waa genes, is absent from C. iguaniorum. Instead, multiple predicted glycosylation regions were identified in C. iguaniorum. One of these regions is > 50 kb with deviant G + C content, suggesting acquisition via lateral transfer. These similar, but non-homologous glycosylation regions were located at the same position on the genome in both strains. Multiple genes encoding respiratory enzymes not identified to date within the C. fetus clade were present. C. iguaniorum shared highest homology with C. hyointestinalis and C. fetus. As in reptile-associated C. fetus subsp. testudinum, a putative tricarballylate catabolism locus was identified. However, despite colonizing a shared host, no recent recombination between both taxa was detected. This genomic study provides a better understanding of host adaptation, virulence, phylogeny, and evolution of C. iguaniorum and related Campylobacter taxa. PMID:27604878

  16. SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand.

    PubMed

    Tang, Haibao; Bomhoff, Matthew D; Briones, Evan; Zhang, Liangsheng; Schnable, James C; Lyons, Eric

    2015-11-11

    The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true-negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind is capable of identifying conserved syntenic regions in any set of target genomes. SynFind is capable of reporting per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic data from over 15,000 organisms from all domains of life as well as supporting multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at http://genomevolution.org/CoGe/SynFind.pl. A video tutorial of SynFind using Phytophthora as an example is available at http://www.youtube.com/watch?v=2Agczny9Nyc. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. Comparative Genomic Analyses of the Human NPHP1 Locus Reveal Complex Genomic Architecture and Its Regional Evolution in Primates

    PubMed Central

    Yuan, Bo; Liu, Pengfei; Gupta, Aditya; Beck, Christine R.; Tejomurtula, Anusha; Campbell, Ian M.; Gambin, Tomasz; Simmons, Alexandra D.; Withers, Marjorie A.; Harris, R. Alan; Rogers, Jeffrey; Schwartz, David C.; Lupski, James R.

    2015-01-01

    Many loci in the human genome harbor complex genomic structures that can result in susceptibility to genomic rearrangements leading to various genomic disorders. Nephronophthisis 1 (NPHP1, MIM# 256100) is an autosomal recessive disorder that can be caused by defects of NPHP1; the gene maps within the human 2q13 region where low copy repeats (LCRs) are abundant. Loss of function of NPHP1 is responsible for approximately 85% of the NPHP1 cases—about 80% of such individuals carry a large recurrent homozygous NPHP1 deletion that occurs via nonallelic homologous recombination (NAHR) between two flanking directly oriented ~45 kb LCRs. Published data revealed a non-pathogenic inversion polymorphism involving the NPHP1 gene flanked by two inverted ~358 kb LCRs. Using optical mapping and array-comparative genomic hybridization, we identified three potential novel structural variant (SV) haplotypes at the NPHP1 locus that may protect a haploid genome from the NPHP1 deletion. Inter-species comparative genomic analyses among primate genomes revealed massive genomic changes during evolution. The aggregated data suggest that dynamic genomic rearrangements occurred historically within the NPHP1 locus and generated SV haplotypes observed in the human population today, which may confer differential susceptibility to genomic instability and the NPHP1 deletion within a personal genome. Our study documents diverse SV haplotypes at a complex LCR-laden human genomic region. Comparative analyses provide a model for how this complex region arose during primate evolution, and studies among humans suggest that intra-species polymorphism may potentially modulate an individual’s susceptibility to acquiring disease-associated alleles. PMID:26641089

  18. Genome-wide association and regional heritability mapping to identify loci underlying variation in nematode resistance and body weight in Scottish Blackface lambs

    PubMed Central

    Riggio, V; Matika, O; Pong-Wong, R; Stear, M J; Bishop, S C

    2013-01-01

    The genetic architecture underlying nematode resistance and body weight in Blackface lambs was evaluated comparing genome-wide association (GWA) and regional heritability mapping (RHM) approaches. The traits analysed were faecal egg count (FEC) and immunoglobulin A activity against third-stage larvae from Teladorsagia circumcincta, as indicators of nematode resistance, and body weight in a population of 752 Scottish Blackface lambs, genotyped with the 50k single-nucleotide polymorphism (SNP) chip. FEC for both Nematodirus and Strongyles nematodes (excluding Nematodirus), as well as body weight were collected at approximately 16, 20 and 24 weeks of age. In addition, a weighted average animal effect was estimated for both FEC and body weight traits. After quality control, 44 388 SNPs were available for the GWA analysis and 42 841 for the RHM, which utilises only mapped SNPs. The same fixed effects were used in both analyses: sex, year, management group, litter size and age of dam, with day of birth as covariate. Some genomic regions of interest for both nematode resistance and body weight traits were identified, using both GWA and RHM approaches. For both methods, strong evidence for association was found on chromosome 14 for Nematodirus average animal effect, chromosome 6 for Strongyles FEC at 16 weeks and chromosome 6 for body weight at 16 weeks. Across the entire data set, RHM identified more regions reaching the suggestive level than GWA, suggesting that RHM is capable of capturing some of the variation not detected by GWA analyses. PMID:23512009

  19. A Genome-Wide Association Study Identifies Genomic Regions for Virulence in the Non-Model Organism Heterobasidion annosum s.s

    PubMed Central

    Dalman, Kerstin; Himmelstrand, Kajsa; Olson, Åke; Lind, Mårten; Brandström-Durling, Mikael; Stenlid, Jan

    2013-01-01

    The dense single nucleotide polymorphisms (SNP) panels needed for genome wide association (GWA) studies have hitherto been expensive to establish and use on non-model organisms. To overcome this, we used a next generation sequencing approach to both establish SNPs and to determine genotypes. We conducted a GWA study on a fungal species, analysing the virulence of Heterobasidion annosum s.s., a necrotrophic pathogen, on its hosts Picea abies and Pinus sylvestris. From a set of 33,018 single nucleotide polymorphisms (SNP) in 23 haploid isolates, twelve SNP markers distributed on seven contigs were associated with virulence (P<0.0001). Four of the contigs harbour known virulence genes from other fungal pathogens and the remaining three harbour novel candidate genes. Two contigs link closely to virulence regions recognized previously by QTL mapping in the congeneric hybrid H. irregulare × H. occidentale. Our study demonstrates the efficiency of GWA studies for dissecting important complex traits of small populations of non-model haploid organisms with small genomes. PMID:23341945

  20. Genome-wide association studies to identify rice salt-tolerance markers.

    PubMed

    Patishtan, Juan; Hartley, Tom N; Fonseca de Carvalho, Raquel; Maathuis, Frans J M

    2018-05-01

    Salinity is an ever increasing menace that affects agriculture worldwide. Crops such as rice are salt sensitive, but its degree of susceptibility varies widely between cultivars pointing to extensive genetic diversity that can be exploited to identify genes and proteins that are relevant in the response of rice to salt stress. We used a diversity panel of 306 rice accessions and collected phenotypic data after short (6 h), medium (7 d) and long (30 d) salinity treatment (50 mM NaCl). A genome-wide association study (GWAS) was subsequently performed, which identified around 1200 candidate genes from many functional categories, but this was treatment period dependent. Further analysis showed the presence of cation transporters and transcription factors with a known role in salinity tolerance and those that hitherto were not known to be involved in salt stress. Localization analysis of single nucleotide polymorphisms (SNPs) showed the presence of several hundred non-synonymous SNPs (nsSNPs) in coding regions and earmarked specific genomic regions with increased numbers of nsSNPs. It points to components of the ubiquitination pathway as important sources of genetic diversity that could underpin phenotypic variation in stress tolerance. © 2017 John Wiley & Sons Ltd.

  1. Identifying artificial selection signals in the chicken genome.

    PubMed

    Ma, Yunlong; Gu, Lantao; Yang, Liubin; Sun, Chenghao; Xie, Shengsong; Fang, Chengchi; Gong, Yangzhang; Li, Shijun

    2018-01-01

    Identifying the signals of artificial selection can contribute to further shaping economically important traits. Here, a chicken 600k SNP-array was employed to detect the signals of artificial selection using 331 individuals from 9 breeds, including Jingfen (JF), Jinghong (JH), Araucanas (AR), White Leghorn (WL), Pekin-Bantam (PB), Shamo (SH), Gallus-Gallus-Spadiceus (GA), Rheinlander (RH) and Vorwerkhuhn (VO). Per the population genetic structure, 9 breeds were combined into 5 breed-pools, and a 'two-step' strategy was used to reveal the signals of artificial selection. GA, which has little artificial selection, was defined as the reference population, and a total of 204, 155, 305 and 323 potential artificial selection signals were identified in AR_VO, PB, RH_WL and JH_JF, respectively. We also found signals derived from standing and de-novo genetic variations have contributed to adaptive evolution during artificial selection. Further enrichment analysis suggests that the genomic regions of artificial selection signals harbour genes, including THSR, PTHLH and PMCH, responsible for economic traits, such as fertility, growth and immunization. Overall, this study found a series of genes that contribute to the improvement of chicken breeds and revealed the genetic mechanisms of adaptive evolution, which can be used as fundamental information in future chicken functional genomics study.

  2. Characterisation of the subtelomeric regions of Giardia lamblia genome isolate WBC6.

    PubMed

    Prabhu, Anjali; Morrison, Hilary G; Martinez, Charles R; Adam, Rodney D

    2007-04-01

    Giardia trophozoites are polyploid and have five chromosomes. The chromosome homologues demonstrate considerable size heterogeneity due to variation in the subtelomeric regions. We used clones from the genome project with telomeric sequence at one end to identify six subtelomeric regions in addition to previously identified subtelomeric regions, to study the telomeric arrangement of the chromosomes. The subtelomeric regions included two retroposons, one retroposon pseudogene, and two vsp genes, in addition to the previously identified subtelomeric regions that include ribosomal DNA repeats. The presence of vsp genes in a subtelomeric region suggests that telomeric rearrangements may contribute to the generation of vsp diversity. These studies of the subtelomeric regions of Giardia may contribute to our understanding of the factors that maintain stability, while allowing diversity in chromosome structure.

  3. Comparative Genomics of Campylobacter iguaniorum to Unravel Genetic Regions Associated with Reptilian Hosts.

    PubMed

    Gilbert, Maarten J; Miller, William G; Yee, Emma; Kik, Marja; Zomer, Aldert L; Wagenaar, Jaap A; Duim, Birgitta

    2016-10-05

    Campylobacter iguaniorum is most closely related to the species C. fetus, C. hyointestinalis, and C. lanienae. Reptiles, chelonians and lizards in particular, appear to be a primary reservoir of this Campylobacter species. Here we report the genome comparison of C. iguaniorum strain 1485E, isolated from a bearded dragon (Pogona vitticeps), and strain 2463D, isolated from a green iguana (Iguana iguana), with the genomes of closely related taxa, in particular with reptile-associated C. fetus subsp. testudinum. In contrast to C. fetus, C. iguaniorum is lacking an S-layer encoding region. Furthermore, a defined lipooligosaccharide biosynthesis locus, encoding multiple glycosyltransferases and bounded by waa genes, is absent from C. iguaniorum. Instead, multiple predicted glycosylation regions were identified in C. iguaniorum. One of these regions is > 50 kb with deviant G + C content, suggesting acquisition via lateral transfer. These similar, but non-homologous glycosylation regions were located at the same position on the genome in both strains. Multiple genes encoding respiratory enzymes not identified to date within the C. fetus clade were present. C. iguaniorum shared highest homology with C. hyointestinalis and C. fetus. As in reptile-associated C. fetus subsp. testudinum, a putative tricarballylate catabolism locus was identified. However, despite colonizing a shared host, no recent recombination between both taxa was detected. This genomic study provides a better understanding of host adaptation, virulence, phylogeny, and evolution of C. iguaniorum and related Campylobacter taxa. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  4. Genetic analysis of multi-environmental spring wheat trials identifies genomic regions for locus-specific trade-offs for grain weight and grain number.

    PubMed

    Sukumaran, Sivakumar; Lopes, Marta; Dreisigacker, Susanne; Reynolds, Matthew

    2018-04-01

    GWAS on multi-environment data identified genomic regions associated with trade-offs for grain weight and grain number. Grain yield (GY) can be dissected into its components thousand grain weight (TGW) and grain number (GN), but little has been achieved in assessing the trade-off between them in spring wheat. In the present study, the Wheat Association Mapping Initiative (WAMI) panel of 287 elite spring bread wheat lines was phenotyped for GY, GN, and TGW in ten environments across different wheat growing regions in Mexico, South Asia, and North Africa. The panel genotyped with the 90 K Illumina Infinium SNP array resulted in 26,814 SNPs for genome-wide association study (GWAS). Statistical analysis of the multi-environmental data for GY, GN, and TGW observed repeatability estimates of 0.76, 0.62, and 0.95, respectively. GWAS on BLUPs of combined environment analysis identified 38 loci associated with the traits. Among them, four loci were associated with multiple traits: 6A (85 cM), 5A (98 cM), 3B (99 cM), and 2B (96 cM). The study identified two loci that showed positive association between GY and TGW, with allelic substitution effects of 4% (GY) and 1.7% (TGW) for 6A locus and 0.2% (GY) and 7.2% (TGW) for 2B locus. The locus in chromosome 6A (79-85 cM) harbored a gene TaGW2-6A. We also identified that a combination of markers associated with GY, TGW, and GN together explained higher variation for GY (32%), than the markers associated with GY alone (27%). The marker-trait associations from the present study can be used for marker-assisted selection (MAS) and to discover the underlying genes for these traits in spring wheat.

  5. Identification of genomic regions associated with resistance to clinical mastitis in US Holstein cattle

    USDA-ARS's Scientific Manuscript database

    The objective of this research was to identify genomic regions associated with clinical mastitis (MAST) in US Holsteins using producer-reported data. Genome-wide association studies (GWAS) were performed on deregressed PTA using GEMMA v. 0.94. Genotypes included 60,671 SNP for all predictor bulls (n...

  6. Genome-wide association study identifies 74 loci associated with educational attainment

    PubMed Central

    Okbay, Aysu; Beauchamp, Jonathan P.; Fontana, Mark A.; Lee, James J.; Pers, Tune H.; Rietveld, Cornelius A.; Turley, Patrick; Chen, Guo-Bo; Emilsson, Valur; Meddens, S. Fleur W.; Oskarsson, Sven; Pickrell, Joseph K.; Thom, Kevin; Timshel, Pascal; de Vlaming, Ronald; Abdellaoui, Abdel; Ahluwalia, Tarunveer S.; Bacelis, Jonas; Baumbach, Clemens; Bjornsdottir, Gyda; Brandsma, Johannes H.; Concas, Maria Pina; Derringer, Jaime; Furlotte, Nicholas A.; Galesloot, Tessel E.; Girotto, Giorgia; Gupta, Richa; Hall, Leanne M.; Harris, Sarah E.; Hofer, Edith; Horikoshi, Momoko; Huffman, Jennifer E.; Kaasik, Kadri; Kalafati, Ioanna P.; Karlsson, Robert; Kong, Augustine; Lahti, Jari; van der Lee, Sven J.; de Leeuw, Christiaan; Lind, Penelope A.; Lindgren, Karl-Oskar; Liu, Tian; Mangino, Massimo; Marten, Jonathan; Mihailov, Evelin; Miller, Michael B.; van der Most, Peter J.; Oldmeadow, Christopher; Payton, Antony; Pervjakova, Natalia; Peyrot, Wouter J.; Qian, Yong; Raitakari, Olli; Rueedi, Rico; Salvi, Erika; Schmidt, Börge; Schraut, Katharina E.; Shi, Jianxin; Smith, Albert V.; Poot, Raymond A.; Pourcain, Beate; Teumer, Alexander; Thorleifsson, Gudmar; Verweij, Niek; Vuckovic, Dragana; Wellmann, Juergen; Westra, Harm-Jan; Yang, Jingyun; Zhao, Wei; Zhu, Zhihong; Alizadeh, Behrooz Z.; Amin, Najaf; Bakshi, Andrew; Baumeister, Sebastian E.; Biino, Ginevra; Bønnelykke, Klaus; Boyle, Patricia A.; Campbell, Harry; Cappuccio, Francesco P.; Davies, Gail; De Neve, Jan-Emmanuel; Deloukas, Panos; Demuth, Ilja; Ding, Jun; Eibich, Peter; Eisele, Lewin; Eklund, Niina; Evans, David M.; Faul, Jessica D.; Feitosa, Mary F.; Forstner, Andreas J.; Gandin, Ilaria; Gunnarsson, Bjarni; Halldórsson, Bjarni V.; Harris, Tamara B.; Heath, Andrew C.; Hocking, Lynne J.; Holliday, Elizabeth G.; Homuth, Georg; Horan, Michael A.; Hottenga, Jouke-Jan; de Jager, Philip L.; Joshi, Peter K.; Jugessur, Astanand; Kaakinen, Marika A.; Kähönen, Mika; Kanoni, Stavroula; Keltigangas-Järvinen, Liisa; Kiemeney, 
Lambertus A.L.M.; Kolcic, Ivana; Koskinen, Seppo; Kraja, Aldi T.; Kroh, Martin; Kutalik, Zoltan; Latvala, Antti; Launer, Lenore J.; Lebreton, Maël P.; Levinson, Douglas F.; Lichtenstein, Paul; Lichtner, Peter; Liewald, David C.M.; Loukola, Anu; Madden, Pamela A.; Mägi, Reedik; Mäki-Opas, Tomi; Marioni, Riccardo E.; Marques-Vidal, Pedro; Meddens, Gerardus A.; McMahon, George; Meisinger, Christa; Meitinger, Thomas; Milaneschi, Yusplitri; Milani, Lili; Montgomery, Grant W.; Myhre, Ronny; Nelson, Christopher P.; Nyholt, Dale R.; Ollier, William E.R.; Palotie, Aarno; Paternoster, Lavinia; Pedersen, Nancy L.; Petrovic, Katja E.; Porteous, David J.; Räikkönen, Katri; Ring, Susan M.; Robino, Antonietta; Rostapshova, Olga; Rudan, Igor; Rustichini, Aldo; Salomaa, Veikko; Sanders, Alan R.; Sarin, Antti-Pekka; Schmidt, Helena; Scott, Rodney J.; Smith, Blair H.; Smith, Jennifer A.; Staessen, Jan A.; Steinhagen-Thiessen, Elisabeth; Strauch, Konstantin; Terracciano, Antonio; Tobin, Martin D.; Ulivi, Sheila; Vaccargiu, Simona; Quaye, Lydia; van Rooij, Frank J.A.; Venturini, Cristina; Vinkhuyzen, Anna A.E.; Völker, Uwe; Völzke, Henry; Vonk, Judith M.; Vozzi, Diego; Waage, Johannes; Ware, Erin B.; Willemsen, Gonneke; Attia, John R.; Bennett, David A.; Berger, Klaus; Bertram, Lars; Bisgaard, Hans; Boomsma, Dorret I.; Borecki, Ingrid B.; Bultmann, Ute; Chabris, Christopher F.; Cucca, Francesco; Cusi, Daniele; Deary, Ian J.; Dedoussis, George V.; van Duijn, Cornelia M.; Eriksson, Johan G.; Franke, Barbara; Franke, Lude; Gasparini, Paolo; Gejman, Pablo V.; Gieger, Christian; Grabe, Hans-Jörgen; Gratten, Jacob; Groenen, Patrick J.F.; Gudnason, Vilmundur; van der Harst, Pim; Hayward, Caroline; Hinds, David A.; Hoffmann, Wolfgang; Hyppönen, Elina; Iacono, William G.; Jacobsson, Bo; Järvelin, Marjo-Riitta; Jöckel, Karl-Heinz; Kaprio, Jaakko; Kardia, Sharon L.R.; Lehtimäki, Terho; Lehrer, Steven F.; Magnusson, Patrik K.E.; Martin, Nicholas G.; McGue, Matt; Metspalu, Andres; Pendleton, Neil; 
Penninx, Brenda W.J.H.; Perola, Markus; Pirastu, Nicola; Pirastu, Mario; Polasek, Ozren; Posthuma, Danielle; Power, Christine; Province, Michael A.; Samani, Nilesh J.; Schlessinger, David; Schmidt, Reinhold; Sørensen, Thorkild I.A.; Spector, Tim D.; Stefansson, Kari; Thorsteinsdottir, Unnur; Thurik, A. Roy; Timpson, Nicholas J.; Tiemeier, Henning; Tung, Joyce Y.; Uitterlinden, André G.; Vitart, Veronique; Vollenweider, Peter; Weir, David R.; Wilson, James F.; Wright, Alan F.; Conley, Dalton C.; Krueger, Robert F.; Smith, George Davey; Hofman, Albert; Laibson, David I.; Medland, Sarah E.; Meyer, Michelle N.; Yang, Jian; Johannesson, Magnus; Visscher, Peter M.; Esko, Tõnu; Koellinger, Philipp D.; Cesarini, David; Benjamin, Daniel J.

    2016-01-01

    Educational attainment (EA) is strongly influenced by social and other environmental factors, but genetic factors are also estimated to account for at least 20% of the variation across individuals. We report the results of a genome-wide association study (GWAS) for EA that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication in an independent sample of 111,349 individuals from the UK Biobank. We now identify 74 genome-wide significant loci associated with number of years of schooling completed. Single-nucleotide polymorphisms (SNPs) associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioral phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because EA is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric disease. PMID:27225129

  7. Genome-wide association study identifies 74 loci associated with educational attainment.

    PubMed

    Okbay, Aysu; Beauchamp, Jonathan P; Fontana, Mark Alan; Lee, James J; Pers, Tune H; Rietveld, Cornelius A; Turley, Patrick; Chen, Guo-Bo; Emilsson, Valur; Meddens, S Fleur W; Oskarsson, Sven; Pickrell, Joseph K; Thom, Kevin; Timshel, Pascal; de Vlaming, Ronald; Abdellaoui, Abdel; Ahluwalia, Tarunveer S; Bacelis, Jonas; Baumbach, Clemens; Bjornsdottir, Gyda; Brandsma, Johannes H; Pina Concas, Maria; Derringer, Jaime; Furlotte, Nicholas A; Galesloot, Tessel E; Girotto, Giorgia; Gupta, Richa; Hall, Leanne M; Harris, Sarah E; Hofer, Edith; Horikoshi, Momoko; Huffman, Jennifer E; Kaasik, Kadri; Kalafati, Ioanna P; Karlsson, Robert; Kong, Augustine; Lahti, Jari; van der Lee, Sven J; deLeeuw, Christiaan; Lind, Penelope A; Lindgren, Karl-Oskar; Liu, Tian; Mangino, Massimo; Marten, Jonathan; Mihailov, Evelin; Miller, Michael B; van der Most, Peter J; Oldmeadow, Christopher; Payton, Antony; Pervjakova, Natalia; Peyrot, Wouter J; Qian, Yong; Raitakari, Olli; Rueedi, Rico; Salvi, Erika; Schmidt, Börge; Schraut, Katharina E; Shi, Jianxin; Smith, Albert V; Poot, Raymond A; St Pourcain, Beate; Teumer, Alexander; Thorleifsson, Gudmar; Verweij, Niek; Vuckovic, Dragana; Wellmann, Juergen; Westra, Harm-Jan; Yang, Jingyun; Zhao, Wei; Zhu, Zhihong; Alizadeh, Behrooz Z; Amin, Najaf; Bakshi, Andrew; Baumeister, Sebastian E; Biino, Ginevra; Bønnelykke, Klaus; Boyle, Patricia A; Campbell, Harry; Cappuccio, Francesco P; Davies, Gail; De Neve, Jan-Emmanuel; Deloukas, Panos; Demuth, Ilja; Ding, Jun; Eibich, Peter; Eisele, Lewin; Eklund, Niina; Evans, David M; Faul, Jessica D; Feitosa, Mary F; Forstner, Andreas J; Gandin, Ilaria; Gunnarsson, Bjarni; Halldórsson, Bjarni V; Harris, Tamara B; Heath, Andrew C; Hocking, Lynne J; Holliday, Elizabeth G; Homuth, Georg; Horan, Michael A; Hottenga, Jouke-Jan; de Jager, Philip L; Joshi, Peter K; Jugessur, Astanand; Kaakinen, Marika A; Kähönen, Mika; Kanoni, Stavroula; Keltigangas-Järvinen, Liisa; Kiemeney, Lambertus A L M; Kolcic, Ivana; Koskinen, 
Seppo; Kraja, Aldi T; Kroh, Martin; Kutalik, Zoltan; Latvala, Antti; Launer, Lenore J; Lebreton, Maël P; Levinson, Douglas F; Lichtenstein, Paul; Lichtner, Peter; Liewald, David C M; Loukola, Anu; Madden, Pamela A; Mägi, Reedik; Mäki-Opas, Tomi; Marioni, Riccardo E; Marques-Vidal, Pedro; Meddens, Gerardus A; McMahon, George; Meisinger, Christa; Meitinger, Thomas; Milaneschi, Yusplitri; Milani, Lili; Montgomery, Grant W; Myhre, Ronny; Nelson, Christopher P; Nyholt, Dale R; Ollier, William E R; Palotie, Aarno; Paternoster, Lavinia; Pedersen, Nancy L; Petrovic, Katja E; Porteous, David J; Räikkönen, Katri; Ring, Susan M; Robino, Antonietta; Rostapshova, Olga; Rudan, Igor; Rustichini, Aldo; Salomaa, Veikko; Sanders, Alan R; Sarin, Antti-Pekka; Schmidt, Helena; Scott, Rodney J; Smith, Blair H; Smith, Jennifer A; Staessen, Jan A; Steinhagen-Thiessen, Elisabeth; Strauch, Konstantin; Terracciano, Antonio; Tobin, Martin D; Ulivi, Sheila; Vaccargiu, Simona; Quaye, Lydia; van Rooij, Frank J A; Venturini, Cristina; Vinkhuyzen, Anna A E; Völker, Uwe; Völzke, Henry; Vonk, Judith M; Vozzi, Diego; Waage, Johannes; Ware, Erin B; Willemsen, Gonneke; Attia, John R; Bennett, David A; Berger, Klaus; Bertram, Lars; Bisgaard, Hans; Boomsma, Dorret I; Borecki, Ingrid B; Bültmann, Ute; Chabris, Christopher F; Cucca, Francesco; Cusi, Daniele; Deary, Ian J; Dedoussis, George V; van Duijn, Cornelia M; Eriksson, Johan G; Franke, Barbara; Franke, Lude; Gasparini, Paolo; Gejman, Pablo V; Gieger, Christian; Grabe, Hans-Jörgen; Gratten, Jacob; Groenen, Patrick J F; Gudnason, Vilmundur; van der Harst, Pim; Hayward, Caroline; Hinds, David A; Hoffmann, Wolfgang; Hyppönen, Elina; Iacono, William G; Jacobsson, Bo; Järvelin, Marjo-Riitta; Jöckel, Karl-Heinz; Kaprio, Jaakko; Kardia, Sharon L R; Lehtimäki, Terho; Lehrer, Steven F; Magnusson, Patrik K E; Martin, Nicholas G; McGue, Matt; Metspalu, Andres; Pendleton, Neil; Penninx, Brenda W J H; Perola, Markus; Pirastu, Nicola; Pirastu, Mario; Polasek, 
Ozren; Posthuma, Danielle; Power, Christine; Province, Michael A; Samani, Nilesh J; Schlessinger, David; Schmidt, Reinhold; Sørensen, Thorkild I A; Spector, Tim D; Stefansson, Kari; Thorsteinsdottir, Unnur; Thurik, A Roy; Timpson, Nicholas J; Tiemeier, Henning; Tung, Joyce Y; Uitterlinden, André G; Vitart, Veronique; Vollenweider, Peter; Weir, David R; Wilson, James F; Wright, Alan F; Conley, Dalton C; Krueger, Robert F; Davey Smith, George; Hofman, Albert; Laibson, David I; Medland, Sarah E; Meyer, Michelle N; Yang, Jian; Johannesson, Magnus; Visscher, Peter M; Esko, Tõnu; Koellinger, Philipp D; Cesarini, David; Benjamin, Daniel J

    2016-05-26

    Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.

  8. Genome-wide association study for Crohn's disease in the Quebec Founder Population identifies multiple validated disease loci.

    PubMed

    Raelson, John V; Little, Randall D; Ruether, Andreas; Fournier, Hélène; Paquin, Bruno; Van Eerdewegh, Paul; Bradley, W E C; Croteau, Pascal; Nguyen-Huu, Quynh; Segal, Jonathan; Debrus, Sophie; Allard, René; Rosenstiel, Philip; Franke, Andre; Jacobs, Gunnar; Nikolaus, Susanna; Vidal, Jean-Michel; Szego, Peter; Laplante, Nathalie; Clark, Hilary F; Paulussen, René J; Hooper, John W; Keith, Tim P; Belouchi, Abdelmajid; Schreiber, Stefan

    2007-09-11

    Genome-wide association (GWA) studies offer a powerful unbiased method for the identification of multiple susceptibility genes for complex diseases. Here we report the results of a GWA study for Crohn's disease (CD) using family trios from the Quebec Founder Population (QFP). Haplotype-based association analyses identified multiple regions associated with the disease that met the criteria for genome-wide significance, with many containing a gene whose function appears relevant to CD. A proportion of these were replicated in two independent German Caucasian samples, including the established CD loci NOD2 and IBD5. The recently described IL23R locus was also identified and replicated. For this region, multiple individuals with all major haplotypes in the QFP were sequenced and extensive fine mapping performed to identify risk and protective alleles. Several additional loci, including a region on 3p21 containing several plausible candidate genes, a region near JAKMIP1 on 4p16.1, and two larger regions on chromosome 17 were replicated. Together with previously published loci, the spectrum of CD genes identified to date involves biochemical networks that affect epithelial defense mechanisms, innate and adaptive immune response, and the repair or remodeling of tissue.

  9. Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

    PubMed Central

    Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

    2000-01-01

    The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
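
    The abstract above describes searching a pairwise alignment for evolutionarily conserved regions (ECRs) that exceed a defined threshold of sequence identity, without giving the window size or threshold used. As a rough sketch of that general idea only, the following scans two aligned sequences with a sliding window and merges the windows that pass. The function name `conserved_regions`, the default window length, and the default threshold are illustrative assumptions, not the parameters of the study.

```python
def conserved_regions(aln_a, aln_b, window=10, min_identity=0.8):
    """Return merged (start, end) alignment intervals whose windows meet min_identity.

    aln_a and aln_b are two rows of a pairwise alignment (equal length,
    '-' for gaps). Intervals are half-open and in alignment coordinates.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    regions = []
    for start in range(len(aln_a) - window + 1):
        end = start + window
        # Count identical, non-gap columns in this window.
        matches = sum(
            1 for x, y in zip(aln_a[start:end], aln_b[start:end])
            if x == y and x != "-"
        )
        if matches / window < min_identity:
            continue
        if regions and start <= regions[-1][1]:
            # Overlaps or touches the previous passing window: extend it.
            regions[-1][1] = end
        else:
            regions.append([start, end])
    return [(s, e) for s, e in regions]


if __name__ == "__main__":
    mouse = "ACGTACGTAC" + "GGGGGGGGGG"
    human = "ACGTACGTAC" + "TTTTTTTTTT"
    print(conserved_regions(mouse, human, window=10, min_identity=1.0))
```

    Real ECR tools work on genome-scale alignments and usually add a minimum region length; this sketch leaves that out for brevity.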

  10. New Regions of the Human Genome Linked to Skin Color Variation in Some African Populations

    Cancer.gov

    In the first study of its kind, an international team of genomics researchers has identified new regions of the human genome that are associated with skin color variation in some African populations, opening new avenues for research on skin diseases and cancer in all populations.

  11. TCGA study identifies genomic features of cervical cancer

    Cancer.gov

    Investigators with The Cancer Genome Atlas (TCGA) Research Network have identified novel genomic and molecular characteristics of cervical cancer that will aid in subclassification of the disease and may help target therapies that are most appropriate for each patient.

  12. Immunogenetic mechanisms leading to thyroid autoimmunity: recent advances in identifying susceptibility genes and regions.

    PubMed

    Brand, Oliver J; Gough, Stephen C L

    2011-12-01

    The autoimmune thyroid diseases (AITD) include Graves' disease (GD) and Hashimoto's thyroiditis (HT), which are characterised by a breakdown in immune tolerance to thyroid antigens. Unravelling the genetic architecture of AITD is vital to a better understanding of AITD pathogenesis, which is required to advance therapeutic options in both disease management and prevention. The early whole-genome linkage and candidate gene association studies provided the first evidence that the HLA region and CTLA-4 represented AITD risk loci. Recent improvements in high-throughput genotyping technologies, the collection of larger disease cohorts and the cataloguing of genome-scale variation have facilitated genome-wide association studies and more thorough screening of candidate gene regions. This has allowed identification of many novel AITD risk genes and more detailed association mapping. The growing number of confirmed AITD susceptibility loci implicates a number of putative disease mechanisms, most of which are tightly linked with aspects of immune system function. The unprecedented advances in genetic study will allow future studies to identify further novel disease risk genes and to identify aetiological variants within specific gene regions, which will undoubtedly lead to a better understanding of AITD pathophysiology.

  13. Immunogenetic Mechanisms Leading to Thyroid Autoimmunity: Recent Advances in Identifying Susceptibility Genes and Regions

    PubMed Central

    Brand, Oliver J; Gough, Stephen C.L

    2011-01-01

    The autoimmune thyroid diseases (AITD) include Graves’ disease (GD) and Hashimoto’s thyroiditis (HT), which are characterised by a breakdown in immune tolerance to thyroid antigens. Unravelling the genetic architecture of AITD is vital to a better understanding of AITD pathogenesis, which is required to advance therapeutic options in both disease management and prevention. The early whole-genome linkage and candidate gene association studies provided the first evidence that the HLA region and CTLA-4 represented AITD risk loci. Recent improvements in high-throughput genotyping technologies, the collection of larger disease cohorts and the cataloguing of genome-scale variation have facilitated genome-wide association studies and more thorough screening of candidate gene regions. This has allowed identification of many novel AITD risk genes and more detailed association mapping. The growing number of confirmed AITD susceptibility loci implicates a number of putative disease mechanisms, most of which are tightly linked with aspects of immune system function. The unprecedented advances in genetic study will allow future studies to identify further novel disease risk genes and to identify aetiological variants within specific gene regions, which will undoubtedly lead to a better understanding of AITD pathophysiology. PMID:22654554

  14. Comparative Genome Analysis of Ciprofloxacin-Resistant Pseudomonas aeruginosa Reveals Genes Within Newly Identified High Variability Regions Associated With Drug Resistance Development

    PubMed Central

    Su, Hsun-Cheng; Khatun, Jainab; Kanavy, Dona M.

    2013-01-01

    The alarming rise of ciprofloxacin-resistant Pseudomonas aeruginosa has been reported in several clinical studies. Though the mutation of resistance genes and their role in drug resistance has been researched, the process by which the bacterium acquires high-level resistance is still not well understood. How does the genomic evolution of P. aeruginosa affect resistance development? Could the exposure of antibiotics to the bacteria enrich genomic variants that lead to the development of resistance, and if so, how are these variants distributed through the genome? To answer these questions, we performed 454 pyrosequencing and a whole genome analysis both before and after exposure to ciprofloxacin. The comparative sequence data revealed 93 unique resistance strain variation sites, which included a mutation in the DNA gyrase subunit A gene. We generated variation-distribution maps comparing the wild and resistant types, and isolated 19 candidates from three discrete resistance-associated high variability regions that had available transposon mutants, to perform a ciprofloxacin exposure assay. Of these region candidates with transposon disruptions, 79% (15/19) showed a reduction in the ability to gain high-level resistance, suggesting that genes within these high variability regions might enrich for certain functions associated with resistance development. PMID:23808957

  15. GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research

    PubMed Central

    Zhang, Hao; van Diepeningen, Anne D.; van der Lee, Theo A. J.; Waalwijk, Cees; de Hoog, G. Sybren

    2016-01-01

    GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled, although they contain interesting information from phylogenetic or epidemiologic perspectives, but also single copy regions can be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data, based on homology, like sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated such as: exhaustive assembly (rDNA region and mitochondrial genome), extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), as well as extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/). PMID

  16. GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research.

    PubMed

    Brankovics, Balázs; Zhang, Hao; van Diepeningen, Anne D; van der Lee, Theo A J; Waalwijk, Cees; de Hoog, G Sybren

    2016-06-01

    GRAbB (Genomic Region Assembly by Baiting) is a new program that is dedicated to assemble specific genomic regions from NGS data. This approach is especially useful when dealing with multi copy regions, such as mitochondrial genome and the rDNA repeat region, parts of the genome that are often neglected or poorly assembled, although they contain interesting information from phylogenetic or epidemiologic perspectives, but also single copy regions can be assembled. The program is capable of targeting multiple regions within a single run. Furthermore, GRAbB can be used to extract specific loci from NGS data, based on homology, like sequences that are used for barcoding. To make the assembly specific, a known part of the region, such as the sequence of a PCR amplicon or a homologous sequence from a related species must be specified. By assembling only the region of interest, the assembly process is computationally much less demanding and may lead to assemblies of better quality. In this study the different applications and functionalities of the program are demonstrated such as: exhaustive assembly (rDNA region and mitochondrial genome), extracting homologous regions or genes (IGS, RPB1, RPB2 and TEF1a), as well as extracting multiple regions within a single run. The program is also compared with MITObim, which is meant for the exhaustive assembly of a single target based on a similar query sequence. GRAbB is shown to be more efficient than MITObim in terms of speed, memory and disk usage. The other functionalities (handling multiple targets simultaneously and extracting homologous regions) of the new program are not matched by other programs. The program is available with explanatory documentation at https://github.com/b-brankovics/grabb. GRAbB has been tested on Ubuntu (12.04 and 14.04), Fedora (23), CentOS (7.1.1503) and Mac OS X (10.7). Furthermore, GRAbB is available as a docker repository: brankovics/grabb (https://hub.docker.com/r/brankovics/grabb/).

  17. Novel genomes and genome constitutions identified by GISH and 5S rDNA and knotted1 genomic sequences in the genus Setaria.

    PubMed

    Zhao, Meicheng; Zhi, Hui; Doust, Andrew N; Li, Wei; Wang, Yongfang; Li, Haiquan; Jia, Guanqing; Wang, Yongqiang; Zhang, Ning; Diao, Xianmin

    2013-04-11

    The Setaria genus is increasingly of interest to researchers, as its two species, S. viridis and S. italica, are being developed as models for understanding C4 photosynthesis and plant functional genomics. The genome constitution of Setaria species has been studied in the diploid species S. viridis, S. adhaerans and S. grisebachii, where three genomes A, B and C were identified respectively. Two allotetraploid species, S. verticillata and S. faberi, were found to have AABB genomes, and one autotetraploid species, S. queenslandica, with an AAAA genome, has also been identified. The genomes and genome constitutions of most other species remain unknown, even though it was thought there are approximately 125 species in the genus distributed world-wide. GISH was performed to detect the genome constitutions of Eurasia species of S. glauca, S. plicata, and S. arenaria, with the known A, B and C genomes as probes. No or very poor hybridization signal was detected indicating that their genomes are different from those already described. GISH was also performed reciprocally between S. glauca, S. plicata, and S. arenaria genomes, but no hybridization signals between each other were found. The two sets of chromosomes of S. lachnea both hybridized strong signals with only the known C genome of S. grisebachii. Chromosomes of Qing 9, an accession formerly considered as S. viridis, hybridized strong signal only to B genome of S. adherans. Phylogenetic trees constructed with 5S rDNA and knotted1 markers, clearly classify the samples in this study into six clusters, matching the GISH results, and suggesting that the F genome of S. arenaria is basal in the genus. Three novel genomes in the Setaria genus were identified and designated as genome D (S. glauca), E (S. plicata) and F (S. arenaria) respectively. The genome constitution of tetraploid S. lachnea is putatively CCC'C'. Qing 9 is a B genome species indigenous to China and is hypothesized to be a newly identified species. The

  18. Novel genomes and genome constitutions identified by GISH and 5S rDNA and knotted1 genomic sequences in the genus Setaria

    PubMed Central

    2013-01-01

    Background The Setaria genus is increasingly of interest to researchers, as its two species, S. viridis and S. italica, are being developed as models for understanding C4 photosynthesis and plant functional genomics. The genome constitution of Setaria species has been studied in the diploid species S. viridis, S. adhaerans and S. grisebachii, where three genomes A, B and C were identified respectively. Two allotetraploid species, S. verticillata and S. faberi, were found to have AABB genomes, and one autotetraploid species, S. queenslandica, with an AAAA genome, has also been identified. The genomes and genome constitutions of most other species remain unknown, even though it was thought there are approximately 125 species in the genus distributed world-wide. Results GISH was performed to detect the genome constitutions of Eurasia species of S. glauca, S. plicata, and S. arenaria, with the known A, B and C genomes as probes. No or very poor hybridization signal was detected indicating that their genomes are different from those already described. GISH was also performed reciprocally between S. glauca, S. plicata, and S. arenaria genomes, but no hybridization signals between each other were found. The two sets of chromosomes of S. lachnea both hybridized strong signals with only the known C genome of S. grisebachii. Chromosomes of Qing 9, an accession formerly considered as S. viridis, hybridized strong signal only to B genome of S. adherans. Phylogenetic trees constructed with 5S rDNA and knotted1 markers, clearly classify the samples in this study into six clusters, matching the GISH results, and suggesting that the F genome of S. arenaria is basal in the genus. Conclusions Three novel genomes in the Setaria genus were identified and designated as genome D (S. glauca), E (S. plicata) and F (S. arenaria) respectively. The genome constitution of tetraploid S. lachnea is putatively CCC’C’. Qing 9 is a B genome species indigenous to China and is hypothesized to be

  19. Comparative genomics identifies distinct lineages of S. Enteritidis from Queensland, Australia.

    PubMed

    Graham, Rikki M A; Hiley, Lester; Rathnayake, Irani U; Jennison, Amy V

    2018-01-01

    Salmonella enterica is a major cause of gastroenteritis and foodborne illness in Australia, where notification rates in the state of Queensland are the highest in the country. S. Enteritidis is among the five most common serotypes reported in Queensland and it is a priority for epidemiological surveillance due to concerns regarding its emergence in Australia. Using whole genome sequencing, we have analysed the genomic epidemiology of 217 S. Enteritidis isolates from Queensland, and observed that they fall into three distinct clades, which we have differentiated as Clades A, B and C. Phage types and MLST sequence types differed between the clades, and comparative genomic analysis has shown that each has a unique profile of prophage and genomic islands. Several of the phage regions present in the S. Enteritidis reference strain P125109 were absent in Clades A and C, and these clades also differed in the presence of pathogenicity islands, containing complete SPI-6 and SPI-19 regions, while P125109 does not. Antimicrobial resistance markers were found in 39 isolates, all but one of which belonged to Clade B. Phylogenetic analysis of the Queensland isolates in the context of 170 international strains showed that Queensland Clade B isolates group together with the previously identified global clade, while the other two clades are distinct and appear largely restricted to Australia. Locally sourced environmental isolates included in this analysis all belonged to Clades A and C, which is consistent with the theory that these clades are a source of locally acquired infection, while Clade B isolates are mostly travel related.

  20. Identifiability, genomics and U.K. data protection law.

    PubMed

    Curren, Liam; Boddington, Paula; Gowans, Heather; Hawkins, Naomi; Kanellopoulou, Nadja; Kaye, Jane; Melham, Karen

    2010-09-01

    Analyses of individuals' genomes--their entire DNA sequence--have increased knowledge about the links between genetics and disease. Anticipated advances in 'next generation' DNA-sequencing techniques will see the routine research use of whole genomes, rather than distinct parts, within the next few years. The scientific benefits of genomic research are, however, accompanied by legal and ethical concerns. Despite the assumption that genetic research data can and will be rendered anonymous, participants' identities can sometimes be elucidated, which could cause data protection legislation to apply. We undertake a timely reappraisal of these laws--particularly new penalties--and identifiability in genomic research.

  1. Genome-Wide and Gene-Based Meta-Analyses Identify Novel Loci Influencing Blood Pressure Response to Hydrochlorothiazide.

    PubMed

    Salvi, Erika; Wang, Zhiying; Rizzi, Federica; Gong, Yan; McDonough, Caitrin W; Padmanabhan, Sandosh; Hiltunen, Timo P; Lanzani, Chiara; Zaninello, Roberta; Chittani, Martina; Bailey, Kent R; Sarin, Antti-Pekka; Barcella, Matteo; Melander, Olle; Chapman, Arlene B; Manunta, Paolo; Kontula, Kimmo K; Glorioso, Nicola; Cusi, Daniele; Dominiczak, Anna F; Johnson, Julie A; Barlassina, Cristina; Boerwinkle, Eric; Cooper-DeHoff, Rhonda M; Turner, Stephen T

    2017-01-01

    This study aimed to identify novel loci influencing the antihypertensive response to hydrochlorothiazide monotherapy. A genome-wide meta-analysis of blood pressure (BP) response to hydrochlorothiazide was performed in 1739 white hypertensives from 6 clinical trials within the International Consortium for Antihypertensive Pharmacogenomics Studies, making it the largest study to date of its kind. No signals reached genome-wide significance (P<5×10⁻⁸), and the suggestive regions (P<10⁻⁵) were cross-validated in 2 black cohorts treated with hydrochlorothiazide. In addition, a gene-based analysis was performed on candidate genes with previous evidence of involvement in diuretic response, in BP regulation, or in hypertension susceptibility. Using the genome-wide meta-analysis approach, with validation in blacks, we identified 2 suggestive regulatory regions linked to gap junction protein α1 gene (GJA1) and forkhead box A1 gene (FOXA1), relevant for cardiovascular and kidney function. With the gene-based approach, we identified hydroxy-delta-5-steroid dehydrogenase, 3β- and steroid δ-isomerase 1 gene (HSD3B1) as significantly associated with BP response (P<2.28×10⁻⁴). HSD3B1 encodes the 3β-hydroxysteroid dehydrogenase enzyme and plays a crucial role in the biosynthesis of aldosterone and endogenous ouabain. By amassing all of the available pharmacogenomic studies of BP response to hydrochlorothiazide, and using 2 different analytic approaches, we identified 3 novel loci influencing BP response to hydrochlorothiazide. The gene-based analysis, never before applied to pharmacogenomics of antihypertensive drugs to our knowledge, provided a powerful strategy to identify a locus of interest, which was not identified in the genome-wide meta-analysis because of high allelic heterogeneity. These data pave the way for future investigations on new pathways and drug targets to enhance the current understanding of personalized antihypertensive treatment. © 2016

  2. Telomere maintenance through recruitment of internal genomic regions.

    PubMed

    Seo, Beomseok; Kim, Chuna; Hills, Mark; Sung, Sanghyun; Kim, Hyesook; Kim, Eunkyeong; Lim, Daisy S; Oh, Hyun-Seok; Choi, Rachael Mi Jung; Chun, Jongsik; Shim, Jaegal; Lee, Junho

    2015-09-18

    Cells surviving crisis are often tumorigenic and their telomeres are commonly maintained through the reactivation of telomerase. However, surviving cells occasionally activate a recombination-based mechanism called alternative lengthening of telomeres (ALT). Here we establish stably maintained survivors in telomerase-deleted Caenorhabditis elegans that escape from sterility by activating ALT. ALT survivors trans-duplicate an internal genomic region, which is already cis-duplicated to chromosome ends, across the telomeres of all chromosomes. These 'Template for ALT' (TALT) regions consist of a block of genomic DNA flanked by telomere-like sequences, and differ between two genetic backgrounds. We establish a model in which an ancestral duplication of a donor TALT region to a proximal telomere region forms a genomic reservoir ready to be incorporated into telomeres on ALT activation.

  3. Genome-wide association analysis identifies 13 new risk loci for schizophrenia.

    PubMed

    Ripke, Stephan; O'Dushlaine, Colm; Chambert, Kimberly; Moran, Jennifer L; Kähler, Anna K; Akterin, Susanne; Bergen, Sarah E; Collins, Ann L; Crowley, James J; Fromer, Menachem; Kim, Yunjung; Lee, Sang Hong; Magnusson, Patrik K E; Sanchez, Nick; Stahl, Eli A; Williams, Stephanie; Wray, Naomi R; Xia, Kai; Bettella, Francesco; Borglum, Anders D; Bulik-Sullivan, Brendan K; Cormican, Paul; Craddock, Nick; de Leeuw, Christiaan; Durmishi, Naser; Gill, Michael; Golimbet, Vera; Hamshere, Marian L; Holmans, Peter; Hougaard, David M; Kendler, Kenneth S; Lin, Kuang; Morris, Derek W; Mors, Ole; Mortensen, Preben B; Neale, Benjamin M; O'Neill, Francis A; Owen, Michael J; Milovancevic, Milica Pejovic; Posthuma, Danielle; Powell, John; Richards, Alexander L; Riley, Brien P; Ruderfer, Douglas; Rujescu, Dan; Sigurdsson, Engilbert; Silagadze, Teimuraz; Smit, August B; Stefansson, Hreinn; Steinberg, Stacy; Suvisaari, Jaana; Tosato, Sarah; Verhage, Matthijs; Walters, James T; Levinson, Douglas F; Gejman, Pablo V; Kendler, Kenneth S; Laurent, Claudine; Mowry, Bryan J; O'Donovan, Michael C; Owen, Michael J; Pulver, Ann E; Riley, Brien P; Schwab, Sibylle G; Wildenauer, Dieter B; Dudbridge, Frank; Holmans, Peter; Shi, Jianxin; Albus, Margot; Alexander, Madeline; Campion, Dominique; Cohen, David; Dikeos, Dimitris; Duan, Jubao; Eichhammer, Peter; Godard, Stephanie; Hansen, Mark; Lerer, F Bernard; Liang, Kung-Yee; Maier, Wolfgang; Mallet, Jacques; Nertney, Deborah A; Nestadt, Gerald; Norton, Nadine; O'Neill, Francis A; Papadimitriou, George N; Ribble, Robert; Sanders, Alan R; Silverman, Jeremy M; Walsh, Dermot; Williams, Nigel M; Wormley, Brandon; Arranz, Maria J; Bakker, Steven; Bender, Stephan; Bramon, Elvira; Collier, David; Crespo-Facorro, Benedicto; Hall, Jeremy; Iyegbe, Conrad; Jablensky, Assen; Kahn, Rene S; Kalaydjieva, Luba; Lawrie, Stephen; Lewis, Cathryn M; Lin, Kuang; Linszen, Don H; Mata, Ignacio; McIntosh, Andrew; Murray, Robin M; Ophoff, Roel A; Powell, John; Rujescu, Dan; 
Van Os, Jim; Walshe, Muriel; Weisbrod, Matthias; Wiersma, Durk; Donnelly, Peter; Barroso, Ines; Blackwell, Jenefer M; Bramon, Elvira; Brown, Matthew A; Casas, Juan P; Corvin, Aiden P; Deloukas, Panos; Duncanson, Audrey; Jankowski, Janusz; Markus, Hugh S; Mathew, Christopher G; Palmer, Colin N A; Plomin, Robert; Rautanen, Anna; Sawcer, Stephen J; Trembath, Richard C; Viswanathan, Ananth C; Wood, Nicholas W; Spencer, Chris C A; Band, Gavin; Bellenguez, Céline; Freeman, Colin; Hellenthal, Garrett; Giannoulatou, Eleni; Pirinen, Matti; Pearson, Richard D; Strange, Amy; Su, Zhan; Vukcevic, Damjan; Donnelly, Peter; Langford, Cordelia; Hunt, Sarah E; Edkins, Sarah; Gwilliam, Rhian; Blackburn, Hannah; Bumpstead, Suzannah J; Dronov, Serge; Gillman, Matthew; Gray, Emma; Hammond, Naomi; Jayakumar, Alagurevathi; McCann, Owen T; Liddle, Jennifer; Potter, Simon C; Ravindrarajah, Radhi; Ricketts, Michelle; Tashakkori-Ghanbaria, Avazeh; Waller, Matthew J; Weston, Paul; Widaa, Sara; Whittaker, Pamela; Barroso, Ines; Deloukas, Panos; Mathew, Christopher G; Blackwell, Jenefer M; Brown, Matthew A; Corvin, Aiden P; McCarthy, Mark I; Spencer, Chris C A; Bramon, Elvira; Corvin, Aiden P; O'Donovan, Michael C; Stefansson, Kari; Scolnick, Edward; Purcell, Shaun; McCarroll, Steven A; Sklar, Pamela; Hultman, Christina M; Sullivan, Patrick F

    2013-10-01

    Schizophrenia is an idiopathic mental disorder with a heritable component and a substantial public health impact. We conducted a multi-stage genome-wide association study (GWAS) for schizophrenia beginning with a Swedish national sample (5,001 cases and 6,243 controls) followed by meta-analysis with previous schizophrenia GWAS (8,832 cases and 12,067 controls) and finally by replication of SNPs in 168 genomic regions in independent samples (7,413 cases, 19,762 controls and 581 parent-offspring trios). We identified 22 loci associated at genome-wide significance; 13 of these are new, and 1 was previously implicated in bipolar disorder. Examination of candidate genes at these loci suggests the involvement of neuronal calcium signaling. We estimate that 8,300 independent, mostly common SNPs (95% credible interval of 6,300-10,200 SNPs) contribute to risk for schizophrenia and that these collectively account for at least 32% of the variance in liability. Common genetic variation has an important role in the etiology of schizophrenia, and larger studies will allow more detailed understanding of this disorder.

  4. Comparative genomics of Lupinus angustifolius gene-rich regions: BAC library exploration, genetic mapping and cytogenetics

    PubMed Central

    2013-01-01

    Background The narrow-leafed lupin, Lupinus angustifolius L., is a grain legume species with a relatively compact genome. The species has 2n = 40 chromosomes and its genome size is 960 Mbp/1C. During the last decade, L. angustifolius genomic studies have achieved several milestones, such as molecular-marker development, linkage maps, and bacterial artificial chromosome (BAC) libraries. Here, these resources were integratively used to identify and sequence two gene-rich regions (GRRs) of the genome. Results The genome was screened with a probe representing the sequence of a microsatellite fragment length polymorphism (MFLP) marker linked to Phomopsis stem blight resistance. BAC clones selected by hybridization were subjected to restriction fingerprinting and contig assembly, and 232 BAC-ends were sequenced and annotated. BAC fluorescence in situ hybridization (BAC-FISH) identified eight single-locus clones. Based on physical mapping, cytogenetic localization, and BAC-end annotation, five clones were chosen for sequencing. Within the sequences of clones that hybridized in FISH to a single-locus, two large GRRs were identified. The GRRs showed strong and conserved synteny to Glycine max duplicated genome regions, illustrated by both identical gene order and parallel orientation. In contrast, in the clones with dispersed FISH signals, more than one-third of sequences were transposable elements. Sequenced, single-locus clones were used to develop 12 genetic markers, increasing the number of L. angustifolius chromosomes linked to appropriate linkage groups by five pairs. Conclusions In general, probes originating from MFLP sequences can assist genome screening and gene discovery. However, such probes are not useful for positional cloning, because they tend to hybridize to numerous loci. GRRs identified in L. angustifolius contained a low number of interspersed repeats and had a high level of synteny to the genome of the model legume G. max. Our results showed that

  5. Genome-wide association study of body weight in Australian Merino sheep reveals an orthologous region on OAR6 to human and bovine genomic regions affecting height and weight.

    PubMed

    Al-Mamun, Hawlader A; Kwan, Paul; Clark, Samuel A; Ferdosi, Mohammad H; Tellam, Ross; Gondro, Cedric

    2015-08-14

    Body weight (BW) is an important trait for meat production in sheep. Although over the past few years, numerous quantitative trait loci (QTL) have been detected for production traits in cattle, few QTL studies have been reported for sheep, with even fewer on meat production traits. Our objective was to perform a genome-wide association study (GWAS) with the medium-density Illumina Ovine SNP50 BeadChip to identify genomic regions and corresponding haplotypes associated with BW in Australian Merino sheep. A total of 1781 Australian Merino sheep were genotyped using the medium-density Illumina Ovine SNP50 BeadChip. Among the 53 862 single nucleotide polymorphisms (SNPs) on this array, 48 640 were used to perform a GWAS using a linear mixed model approach. Genotypes were phased with hsphase; to estimate SNP haplotype effects, linkage disequilibrium blocks were identified in the detected QTL region. Thirty-nine SNPs were associated with BW at a Bonferroni-corrected genome-wide significance threshold of 1 %. One region on sheep (Ovis aries) chromosome 6 (OAR6) between 36.15 and 38.56 Mb, included 13 significant SNPs that were associated with BW; the most significant SNP was OAR6_41936490.1 (P = 2.37 × 10⁻¹⁶) at 37.69 Mb with an allele substitution effect of 2.12 kg, which corresponds to 0.248 phenotypic standard deviations for BW. The region that surrounds this association signal on OAR6 contains three genes: leucine aminopeptidase 3 (LAP3), which is involved in the processing of the oxytocin precursor; NCAPG non-SMC condensin I complex, subunit G (NCAPG), which is associated with foetal growth and carcass size in cattle; and ligand dependent nuclear receptor corepressor-like (LCORL), which is associated with height in humans and cattle. The GWAS analysis detected 39 SNPs associated with BW in sheep and a major QTL region was identified on OAR6. In several other mammalian species, regions that are syntenic with this region have been found to be associated with body
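
    The figures quoted in this abstract can be cross-checked with a short calculation. This is a sketch under one stated assumption: that the Bonferroni correction divides the 1 % genome-wide level by the 48 640 SNPs tested (the abstract does not give the divisor). The variable names are illustrative.

```python
# Figures taken from the abstract.
n_snps_tested = 48_640        # SNPs used in the GWAS
genome_wide_alpha = 0.01      # 1 % genome-wide significance level
top_snp_p = 2.37e-16          # P value of OAR6_41936490.1
effect_kg = 2.12              # allele substitution effect
effect_in_sd = 0.248          # same effect in phenotypic standard deviations

# Assumption: Bonferroni correction over the number of SNPs tested.
per_snp_threshold = genome_wide_alpha / n_snps_tested

# Phenotypic standard deviation implied by the two effect-size figures.
implied_sd_kg = effect_kg / effect_in_sd

print(f"per-SNP threshold: {per_snp_threshold:.3e}")
print(f"top SNP passes: {top_snp_p < per_snp_threshold}")
print(f"implied phenotypic SD: {implied_sd_kg:.2f} kg")
```

    Under that assumption the per-SNP threshold is about 2.06 × 10⁻⁷, and the two effect-size figures imply a phenotypic standard deviation of roughly 8.5 kg for BW.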

  6. Whole genome analysis using Bayesian models to identify candidate genes for immune response to vaccination

    USDA-ARS's Scientific Manuscript database

    This study identified genome regions associated with variation in immune response to vaccination against bovine viral diarrhea virus type 2 (BVDV 2) in American Angus calves. Calves were born in the spring or fall of 2006-2008 (n = 620). Two doses of modified live vaccine were administered three wee...

  7. Pan-genome analysis of human gastric pathogen H. pylori: comparative genomics and pathogenomics approaches to identify regions associated with pathogenicity and prediction of potential core therapeutic targets.

    PubMed

    Ali, Amjad; Naz, Anam; Soares, Siomar C; Bakhtiar, Marriam; Tiwari, Sandeep; Hassan, Syed S; Hanan, Fazal; Ramos, Rommel; Pereira, Ulisses; Barh, Debmalya; Figueiredo, Henrique César Pereira; Ussery, David W; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2015-01-01

    Helicobacter pylori is a human gastric pathogen implicated as the major cause of peptic ulcer and second leading cause of gastric cancer (~70%) around the world. Conversely, an increased resistance to antibiotics and hindrances in the development of vaccines against H. pylori are observed. Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by a pan-genome approach; the predicted conserved gene families (1,193) constitute ~77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. A total of 28 nonhost homolog proteins were characterized as universal therapeutic targets against H. pylori based on their functional annotation and protein-protein interaction. Finally, pathogenomics and genome plasticity analysis revealed 3 highly conserved and 2 highly variable putative pathogenicity islands in all of the H. pylori genomes analyzed.
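
    The core-genome percentages above imply approximate sizes for the average genome and the pan-genome. The following back-calculation is a rough sketch: both percentages are approximate in the abstract, so the derived counts are estimates rather than reported results, and the variable names are illustrative.

```python
core_families = 1193           # conserved gene families (from the abstract)
share_of_avg_genome = 0.77     # "~77% of the average H. pylori genome"
share_of_pan_genome = 0.45     # "45% of the global gene repertoire"

# Back-calculate the totals implied by the two shares.
implied_avg_genome = core_families / share_of_avg_genome
implied_pan_genome = core_families / share_of_pan_genome

print(f"implied average genome: ~{implied_avg_genome:.0f} gene families")
print(f"implied pan-genome: ~{implied_pan_genome:.0f} gene families")
```

    On these figures the average genome would carry on the order of 1,550 gene families and the 39-genome repertoire on the order of 2,650.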

  8. A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

    PubMed

    Keel, B N; Nonneman, D J; Rohrer, G A

    2017-08-01

    Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
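
    The "over 99%" claim can be checked directly from the counts given in the abstract. The short calculation below uses only those reported figures; the variable names are illustrative, and "approximately 139 000" is treated as exactly 139 000 for the purpose of the check.

```python
total_snps = 22_342_915      # high-confidence SNPs reported in the abstract
functional_snps = 139_000    # "approximately 139 000" loss-of-function, nonsynonymous or regulatory SNPs
chip_snps = 62_163           # SNPs on the PorcineSNP60 BeadChip
chip_fraction_found = 0.79   # share of chip SNPs recovered, as reported

functional_fraction = functional_snps / total_snps
ignorable_fraction = 1.0 - functional_fraction
chip_snps_found = round(chip_snps * chip_fraction_found)

print(f"functional:            {functional_fraction:.2%}")
print(f"potentially ignorable: {ignorable_fraction:.2%}")
print(f"chip SNPs recovered (approx.): {chip_snps_found}")
```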

  9. A Cryptosporidium parvum genomic region encoding hemolytic activity.

    PubMed Central

    Steele, M I; Kuhls, T L; Nida, K; Meka, C S; Halabi, I M; Mosier, D A; Elliott, W; Crawford, D L; Greenfield, R A

    1995-01-01

    Successful parasitization by Cryptosporidium parvum requires multiple disruptions in both host and protozoan cell membranes as cryptosporidial sporozoites invade intestinal epithelial cells and subsequently develop into asexual and sexual life stages. To identify cryptosporidial proteins which may play a role in these membrane alterations, hemolytic activity was used as a marker to screen a C. parvum genomic expression library. A stable hemolytic clone (H4) containing a 5.5-kb cryptosporidial genomic fragment was identified. The hemolytic activity encoded on H4 was mapped to a 1-kb region that contained a complete 690-bp open reading frame (hemA) ending in a common stop codon. A 21-kDa plasmid-encoded recombinant protein was expressed in maxicells containing H4. Subclones of H4 which contained only a portion of hemA did not induce hemolysis on blood agar or promote expression of the recombinant protein in maxicells. Reverse transcriptase-mediated PCR analysis of total RNA isolated from excysted sporozoites and the intestines of infected adult mice with severe combined immunodeficiency demonstrated that hemA is actively transcribed during the cryptosporidial life cycle. PMID:7558289

  10. Identification of genomic regions contributing to etoposide-induced cytotoxicity

    PubMed Central

    Bleibel, Wasim K.; Duan, Shiwei; Huang, R. Stephanie; Kistner, Emily O.; Shukla, Sunita J.; Wu, Xiaolin; Badner, Judith A.

    2009-01-01

    Etoposide is routinely used in combination-based chemotherapy for testicular cancer and small-cell lung cancer; however, myelosuppression, therapy-related leukemia and neurotoxicity limit its utility. To determine the genetic contribution to cellular sensitivity to etoposide, we evaluated cell growth inhibition in Centre d'Etude du Polymorphisme Humain lymphoblastoid cell lines from 24 multi-generational pedigrees (321 samples) following treatment with 0.02–2.5 µM etoposide for 72 h. Heritability analysis showed that genetic variation contributes significantly to the cytotoxic phenotypes (h² = 0.17–0.25, P = 4.9 × 10⁻⁵ to 7.3 × 10⁻³). Whole genome linkage scans uncovered 8 regions with peak LOD scores ranging from 1.57 to 2.55, with the most significant signals being found on chromosome 5 (LOD = 2.55) and chromosome 6 (LOD = 2.52). Linkage-directed association was performed on a subset of HapMap samples within the pedigrees to find 22 SNPs significantly associated with etoposide cytotoxicity at one or more treatment concentrations. UVRAG, a DNA repair gene, SEMA5A, SLC7A6 and PRMT7 are implicated from these unbiased studies. Our findings suggest that susceptibility to etoposide-induced cytotoxicity is heritable and using an integrated genomics approach we identified both genomic regions and SNPs associated with the cytotoxic phenotypes. PMID:19089452

  11. Identification of genomic regions contributing to etoposide-induced cytotoxicity.

    PubMed

    Bleibel, Wasim K; Duan, Shiwei; Huang, R Stephanie; Kistner, Emily O; Shukla, Sunita J; Wu, Xiaolin; Badner, Judith A; Dolan, M Eileen

    2009-03-01

    Etoposide is routinely used in combination-based chemotherapy for testicular cancer and small-cell lung cancer; however, myelosuppression, therapy-related leukemia and neurotoxicity limit its utility. To determine the genetic contribution to cellular sensitivity to etoposide, we evaluated cell growth inhibition in Centre d'Etude du Polymorphisme Humain lymphoblastoid cell lines from 24 multi-generational pedigrees (321 samples) following treatment with 0.02–2.5 µM etoposide for 72 h. Heritability analysis showed that genetic variation contributes significantly to the cytotoxic phenotypes (h² = 0.17–0.25, P = 4.9 × 10⁻⁵ to 7.3 × 10⁻³). Whole genome linkage scans uncovered 8 regions with peak LOD scores ranging from 1.57 to 2.55, with the most significant signals being found on chromosome 5 (LOD = 2.55) and chromosome 6 (LOD = 2.52). Linkage-directed association was performed on a subset of HapMap samples within the pedigrees to find 22 SNPs significantly associated with etoposide cytotoxicity at one or more treatment concentrations. UVRAG, a DNA repair gene, SEMA5A, SLC7A6 and PRMT7 are implicated from these unbiased studies. Our findings suggest that susceptibility to etoposide-induced cytotoxicity is heritable and using an integrated genomics approach we identified both genomic regions and SNPs associated with the cytotoxic phenotypes.

  12. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

    PubMed

    Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

    2018-03-01

    Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of BRCA1 (with approximately 80kb in two patients), and TP53 genes (with ∼1.6 kb in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes and the identification of such SV deletions may improve clinical testing.
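
    The "detected by at least three of these bioinformatics tools" criterion is a simple voting rule across callers. The sketch below shows one way to express it, under loud assumptions: calls are matched at the gene level only (a real pipeline would more likely compare breakpoint coordinates), and the per-tool call sets and gene names are hypothetical placeholders rather than results from the study.

```python
from collections import Counter

# Hypothetical per-tool results: the set of genes in which each tool called a
# deletion. Gene names are placeholders, not findings from the study.
calls_by_tool = {
    "Genome STRiP": {"GENE_A", "GENE_B"},
    "Delly": {"GENE_A", "GENE_C"},
    "Manta": {"GENE_A", "GENE_B", "GENE_C"},
    "BreakDancer": {"GENE_B"},
    "Pindel": {"GENE_D"},
}


def consensus_genes(calls, min_tools=3):
    """Keep genes whose deletion call is supported by at least `min_tools` tools."""
    support = Counter(gene for genes in calls.values() for gene in genes)
    return {gene: n for gene, n in support.items() if n >= min_tools}


consensus = consensus_genes(calls_by_tool)
print(sorted(consensus.items()))
```

    Raising `min_tools` trades sensitivity for fewer false positives; the threshold of three is the one stated in the abstract.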

  13. Genome-wide association analysis identifies a meningioma risk locus at 11p15.5.

    PubMed

    Claus, Elizabeth B; Cornish, Alex J; Broderick, Peter; Schildkraut, Joellen M; Dobbins, Sara E; Holroyd, Amy; Calvocoressi, Lisa; Lu, Lingeng; Hansen, Helen M; Smirnov, Ivan; Walsh, Kyle M; Schramm, Johannes; Hoffmann, Per; Nöthen, Markus M; Jöckel, Karl-Heinz; Swerdlow, Anthony; Larsen, Signe Benzon; Johansen, Christoffer; Simon, Matthias; Bondy, Melissa; Wrensch, Margaret; Houlston, Richard; Wiemels, Joseph L

    2018-05-12

    Meningiomas are adult brain tumors originating in the meningeal coverings of the brain and spinal cord, with significant heritable basis. Genome-wide association studies (GWAS) have previously identified only a single risk locus for meningioma, at 10p12.31. To identify a susceptibility locus for meningioma, we conducted a meta-analysis of two GWAS, imputed using a merged reference panel of 1000 Genomes and UK10K data, with validation in two independent sample series totaling 2,138 cases and 12,081 controls. We identified a new susceptibility locus for meningioma at 11p15.5 (rs2686876, odds ratio = 1.44, P = 9.86 × 10⁻⁹). A number of genes localize to the region of linkage disequilibrium encompassing rs2686876, including RIC8A, which plays a central role in the development of neural crest-derived structures, such as the meninges. This finding advances our understanding of the genetic basis of meningioma development and provides additional support for a polygenic model of meningioma.

  14. Identifying signatures of natural selection in Tibetan and Andean populations using dense genome scan data.

    PubMed

    Bigham, Abigail; Bauchet, Marc; Pinto, Dalila; Mao, Xianyun; Akey, Joshua M; Mei, Rui; Scherer, Stephen W; Julian, Colleen G; Wilson, Megan J; López Herráez, David; Brutsaert, Tom; Parra, Esteban J; Moore, Lorna G; Shriver, Mark D

    2010-09-09

    High-altitude hypoxia (reduced inspired oxygen tension due to decreased barometric pressure) exerts severe physiological stress on the human body. Two high-altitude regions where humans have lived for millennia are the Andean Altiplano and the Tibetan Plateau. Populations living in these regions exhibit unique circulatory, respiratory, and hematological adaptations to life at high altitude. Although these responses have been well characterized physiologically, their underlying genetic basis remains unknown. We performed a genome scan to identify genes showing evidence of adaptation to hypoxia. We looked across each chromosome to identify genomic regions with previously unknown function with respect to altitude phenotypes. In addition, groups of genes functioning in oxygen metabolism and sensing were examined to test the hypothesis that particular pathways have been involved in genetic adaptation to altitude. Applying four population genetic statistics commonly used for detecting signatures of natural selection, we identified selection-nominated candidate genes and gene regions in these two populations (Andeans and Tibetans) separately. The Tibetan and Andean patterns of genetic adaptation are largely distinct from one another, with both populations showing evidence of positive natural selection in different genes or gene regions. Interestingly, one gene previously known to be important in cellular oxygen sensing, EGLN1 (also known as PHD2), shows evidence of positive selection in both Tibetans and Andeans. However, the pattern of variation for this gene differs between the two populations. Our results indicate that several key HIF-regulatory and targeted genes are responsible for adaptation to high altitude in Andeans and Tibetans, and several different chromosomal regions are implicated in the putative response to selection. 
    These data suggest a genetic role in high-altitude adaptation and provide a basis for future genotype/phenotype association studies necessary

  15. Identifying Signatures of Natural Selection in Tibetan and Andean Populations Using Dense Genome Scan Data

    PubMed Central

    Bigham, Abigail; Bauchet, Marc; Pinto, Dalila; Mao, Xianyun; Akey, Joshua M.; Mei, Rui; Scherer, Stephen W.; Julian, Colleen G.; Wilson, Megan J.; López Herráez, David; Brutsaert, Tom; Parra, Esteban J.; Moore, Lorna G.; Shriver, Mark D.

    2010-01-01

    High-altitude hypoxia (reduced inspired oxygen tension due to decreased barometric pressure) exerts severe physiological stress on the human body. Two high-altitude regions where humans have lived for millennia are the Andean Altiplano and the Tibetan Plateau. Populations living in these regions exhibit unique circulatory, respiratory, and hematological adaptations to life at high altitude. Although these responses have been well characterized physiologically, their underlying genetic basis remains unknown. We performed a genome scan to identify genes showing evidence of adaptation to hypoxia. We looked across each chromosome to identify genomic regions with previously unknown function with respect to altitude phenotypes. In addition, groups of genes functioning in oxygen metabolism and sensing were examined to test the hypothesis that particular pathways have been involved in genetic adaptation to altitude. Applying four population genetic statistics commonly used for detecting signatures of natural selection, we identified selection-nominated candidate genes and gene regions in these two populations (Andeans and Tibetans) separately. The Tibetan and Andean patterns of genetic adaptation are largely distinct from one another, with both populations showing evidence of positive natural selection in different genes or gene regions. Interestingly, one gene previously known to be important in cellular oxygen sensing, EGLN1 (also known as PHD2), shows evidence of positive selection in both Tibetans and Andeans. However, the pattern of variation for this gene differs between the two populations. Our results indicate that several key HIF-regulatory and targeted genes are responsible for adaptation to high altitude in Andeans and Tibetans, and several different chromosomal regions are implicated in the putative response to selection. 
    These data suggest a genetic role in high-altitude adaptation and provide a basis for future genotype/phenotype association studies necessary

  16. In silico screening of the chicken genome for overlaps between genomic regions: microRNA genes, coding and non-coding transcriptional units, QTL, and genetic variations.

    PubMed

    Zorc, Minja; Kunej, Tanja

    2016-05-01

    MicroRNAs (miRNAs) are a class of non-coding RNAs involved in posttranscriptional regulation of target genes. Regulation requires complementarity between target mRNA and the mature miRNA seed region, responsible for their recognition and binding. It has been estimated that each miRNA targets approximately 200 genes, and genetic variability of miRNA genes has been reported to affect phenotypic variability and disease susceptibility in humans, livestock species, and model organisms. Polymorphisms in miRNA genes could therefore represent biomarkers for phenotypic traits in livestock animals. In our previous study, we collected polymorphisms within miRNA genes in chicken. In the present study, we identified miRNA-related genomic overlaps to prioritize genomic regions of interest for further functional studies and biomarker discovery. Overlapping genomic regions in chicken were analyzed using the following bioinformatics tools and databases: miRNA SNiPer, Ensembl, miRBase, NCBI Blast, and QTLdb. Out of 740 known pre-miRNA genes, 263 (35.5%) contain polymorphisms; among them, 35 contain more than three polymorphisms. The most polymorphic miRNA genes in chicken are gga-miR-6662, containing 23 single nucleotide polymorphisms (SNPs) within the pre-miRNA region, including five consecutive SNPs, and gga-miR-6688, containing ten polymorphisms including three consecutive polymorphisms. Several miRNA-related genomic hotspots have been revealed in the chicken genome; polymorphic miRNA genes are located within protein-coding and/or non-coding transcription units and quantitative trait loci (QTL) associated with production traits. The present study includes the first description of an exonic miRNA in a chicken genome, an overlap between the miRNA gene and the exon of the protein-coding gene (gga-miR-6578/HADHB), and the first report of a missense polymorphism located within a mature miRNA seed region. Identified miRNA-related genomic hotspots in chicken can serve researchers as a
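
    Screening for overlaps between genomic regions, as this abstract describes, rests on an interval-intersection test applied per chromosome. The sketch below is a minimal illustration assuming 0-based, half-open coordinates; the `Region` type, the feature names and all coordinates are hypothetical and are not taken from the study or from the databases it lists.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Region, b: Region) -> bool:
    """True when both regions are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def find_overlaps(queries, targets):
    """Return (query name, target name) for every overlapping pair (quadratic scan)."""
    return [(q.name, t.name) for q in queries for t in targets if overlaps(q, t)]


# Hypothetical coordinates, for illustration only.
mirnas = [
    Region("mir_x", "chr1", 1_000, 1_090),
    Region("mir_y", "chr2", 5_000, 5_080),
]
features = [
    Region("gene_exon_1", "chr1", 950, 1_050),
    Region("qtl_1", "chr2", 10_000, 20_000),
    Region("qtl_2", "chr1", 0, 2_000),
]

pairs = find_overlaps(mirnas, features)
print(pairs)
```

    The pairwise scan is adequate for a few hundred pre-miRNA genes; for genome-scale feature sets a sorted sweep or an interval tree would be the usual replacement.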

  17. Is mammalian chromosomal evolution driven by regions of genome fragility?

    PubMed Central

    Ruiz-Herrera, Aurora; Castresana, Jose; Robinson, Terence J

    2006-01-01

    Background: A fundamental question in comparative genomics concerns the identification of mechanisms that underpin chromosomal change. In an attempt to shed light on the dynamics of mammalian genome evolution, we analyzed the distribution of syntenic blocks, evolutionary breakpoint regions, and evolutionary breakpoints taken from public databases available for seven eutherian species (mouse, rat, cattle, dog, pig, cat, and horse) and the chicken, and examined these for correspondence with human fragile sites and tandem repeats. Results: Our results confirm previous investigations that showed the presence of chromosomal regions in the human genome that have been repeatedly used as illustrated by a high breakpoint accumulation in certain chromosomes and chromosomal bands. We show, however, that there is a striking correspondence between fragile site location, the positions of evolutionary breakpoints, and the distribution of tandem repeats throughout the human genome, which similarly reflect a non-uniform pattern of occurrence. Conclusion: These observations provide further evidence that certain chromosomal regions in the human genome have been repeatedly used in the evolutionary process. As a consequence, the genome is a composite of fragile regions prone to reorganization that have been conserved in different lineages, and genomic tracts that do not exhibit the same levels of evolutionary plasticity. PMID:17156441

  18. Comparative genomics of pathogenic lineages of Vibrio nigripulchritudo identifies virulence-associated traits

    PubMed Central

    Goudenège, David; Labreuche, Yannick; Krin, Evelyne; Ansquer, Dominique; Mangenot, Sophie; Calteau, Alexandra; Médigue, Claudine; Mazel, Didier; Polz, Martin F; Le Roux, Frédérique

    2013-01-01

    Vibrio nigripulchritudo is an emerging pathogen of farmed shrimp in New Caledonia and other regions in the Indo-Pacific. The molecular determinants of V. nigripulchritudo pathogenicity are unknown; however, molecular epidemiological studies have suggested that pathogenicity is linked to particular lineages. Here, we performed high-throughput sequencing-based comparative genome analysis of 16 V. nigripulchritudo strains to explore the genomic diversity and evolutionary history of pathogen-containing lineages and to identify pathogen-specific genetic elements. Our phylogenetic analysis revealed three pathogen-containing V. nigripulchritudo clades, including two clades previously identified from New Caledonia and one novel clade comprising putatively pathogenic isolates from septicemic shrimp in Madagascar. The similar genetic distance between the three clades indicates that they have diverged from an ancestral population roughly at the same time and recombination analysis indicates that these genomes have, in the past, shared a common gene pool and exchanged genes. As each contemporary lineage is comprised of nearly identical strains, comparative genomics allowed differentiation of genetic elements specific to shrimp pathogenesis of varying severity. Notably, only a large plasmid present in all highly pathogenic (HP) strains encodes a toxin. Although less/non-pathogenic strains contain related plasmids, these are differentiated by a putative toxin locus. Expression of this gene by a non-pathogenic V. nigripulchritudo strain resulted in production of toxic culture supernatant, normally an exclusive feature of HP strains. Thus, this protein, here termed 'nigritoxin', is implicated, to an extent that remains to be precisely determined, in the toxicity of V. nigripulchritudo. PMID:23739050

  19. Genome skimming identifies polymorphism in tern populations and species

    PubMed Central

    2012-01-01

    Background: Terns (Charadriiformes: Sterninae) are a lineage of cosmopolitan shorebirds with a disputed evolutionary history that comprises several species of conservation concern. As a non-model system in genetics, previous study has left most of the nuclear genome unexplored, and population-level studies are limited to only 15% of the world's species of terns and noddies. Screening of polymorphic nuclear sequence markers is needed to enhance genetic resolution because of supposed low mitochondrial mutation rate, documentation of nuclear insertion of hypervariable mitochondrial regions, and limited success of microsatellite enrichment in terns. Here, we investigated the phylogenetic and population genetic utility for terns and relatives of a variety of nuclear markers previously developed for other birds and spanning the nuclear genome. Markers displaying a variety of mutation rates from both the nuclear and mitochondrial genome were tested and prioritized according to optimal cross-species amplification and extent of genetic polymorphism between (1) the main tern clades and (2) individual Royal Terns (Thalasseus maxima) breeding on the US East Coast. Results: Results from this genome skimming effort yielded four new nuclear sequence-based markers for tern phylogenetics and 11 intra-specific polymorphic markers. Further, comparison between the two genomes indicated a phylogenetic conflict at the base of terns, involving the inclusion (mitochondrial) or exclusion (nuclear) of the Angel Tern (Gygis alba). Although limited mitochondrial variation was confirmed, both nuclear markers and a short tandem repeat in the mitochondrial control region indicated the presence of considerable genetic variation in Royal Terns at a regional scale. Conclusions: These data document the value of intronic markers to the study of terns and allies. We expect that these and additional markers attained through next-generation sequencing methods will accurately map the genetic origin and

  20. Genomic organization of the canine herpesvirus US region.

    PubMed

    Haanes, E J; Tomlinson, C C

    1998-02-01

    Canine herpesvirus (CHV) is an alpha-herpesvirus of limited pathogenicity in healthy adult dogs and infectivity of the virus appears to be largely limited to cells of canine origin. CHV's low virulence and species specificity make it an attractive candidate for a recombinant vaccine vector to protect dogs against a variety of pathogens. As part of the analysis of the CHV genome, the authors determined the complete nucleotide sequence of the CHV US region as well as portions of the flanking inverted repeats. Seven full open reading frames (ORFs) encoding proteins larger than 100 amino acids were identified within, or partially within the CHV US: cUS2, cUS3, cUS4, cUS6, cUS7, cUS8 and cUS9; which are homologs of the herpes simplex virus type-1 US2; protein kinase; gG, gD, gI, gE; and US9 genes, respectively. An eighth ORF was identified in the inverted repeat region, cIR6, a homolog of the equine herpesvirus type-1 IR6 gene. The authors identified and mapped most of the major transcripts for the predicted CHV US ORFs by Northern analysis.

  1. Complete genome sequence of a novel extrachromosomal virus-like element identified in planarian Girardia tigrina

    PubMed Central

    Rebrikov, Denis V; Bulina, Maria E; Bogdanova, Ekaterina A; Vagner, Loura L; Lukyanov, Sergey A

    2002-01-01

    Background: Freshwater planarians are widely used as models for investigation of pattern formation and studies on genetic variation in populations. Despite extensive information on the biology and genetics of planaria, the occurrence and distribution of viruses in these animals remains an unexplored area of research. Results: Using a combination of Suppression Subtractive Hybridization (SSH) and Mirror Orientation Selection (MOS), we compared the genomes of two strains of freshwater planarian, Girardia tigrina. The novel extrachromosomal DNA-containing virus-like element denoted PEVE (Planarian Extrachromosomal Virus-like Element) was identified in one planarian strain. The PEVE genome (about 7.5 kb) consists of two unique regions (Ul and Us) flanked by inverted repeats. Sequence analyses reveal that PEVE comprises two helicase-like sequences in the genome, of which the first is a homolog of a circoviral replication initiator protein (Rep), and the second is similar to the papillomavirus E1 helicase domain. PEVE genome exists in at least two variant forms with different arrangements of single-stranded and double-stranded DNA stretches that correspond to the Us and Ul regions. Using PCR analysis and whole-mount in situ hybridization, we characterized PEVE distribution and expression in the planarian body. Conclusions: PEVE is the first viral element identified in free-living flatworms. This element differs from all known viruses and viral elements, and comprises two potential helicases that are homologous to proteins from distant viral phyla. PEVE is unevenly distributed in the worm body, and is detected in specific parenchyma cells. PMID:12065025

  2. The genomic landscape at a late stage of stickleback speciation: High genomic divergence interspersed by small localized regions of introgression.

    PubMed

    Ravinet, Mark; Yoshida, Kohta; Shigenobu, Shuji; Toyoda, Atsushi; Fujiyama, Asao; Kitano, Jun

    2018-05-01

    Speciation is a continuous process and analysis of species pairs at different stages of divergence provides insight into how it unfolds. Previous genomic studies on young species pairs have revealed peaks of divergence and heterogeneous genomic differentiation. Yet less known is how localised peaks of differentiation progress to genome-wide divergence during the later stages of speciation in the presence of persistent gene flow. Spanning the speciation continuum, stickleback species pairs are ideal for investigating how genomic divergence builds up during speciation. However, attention has largely focused on young postglacial species pairs, with little knowledge of the genomic signatures of divergence and introgression in older stickleback systems. The Japanese stickleback species pair, composed of the Pacific Ocean three-spined stickleback (Gasterosteus aculeatus) and the Japan Sea stickleback (G. nipponicus), which co-occur in the Japanese islands, is at a late stage of speciation. Divergence likely started well before the end of the last glacial period and crosses between Japan Sea females and Pacific Ocean males result in hybrid male sterility. Here we use coalescent analyses and Approximate Bayesian Computation to show that the two species split approximately 0.68-1 million years ago but that they have continued to exchange genes at a low rate throughout divergence. Population genomic data revealed that, despite gene flow, a high level of genomic differentiation is maintained across the majority of the genome. However, we identified multiple, small regions of introgression, occurring mainly in areas of low recombination rate. Our results demonstrate that a high level of genome-wide divergence can establish in the face of persistent introgression and that gene flow can be localized to small genomic regions at the later stages of speciation with gene flow.

  3. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes.

    PubMed

    Parker, Brian J; Moltke, Ida; Roth, Adam; Washietl, Stefan; Wen, Jiayu; Kellis, Manolis; Breaker, Ronald; Pedersen, Jakob Skou

    2011-11-01

    Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.

  4. Scanning genomic areas under selection sweep and association mapping as tools to identify horticultural important genes in watermelon

    USDA-ARS's Scientific Manuscript database

    Watermelon (Citrullus lanatus var. lanatus) contains 88% water, sugars, and several important health-related compounds, including lycopene, citrulline, arginine, and glutathione. The current genetic diversity study uses microsatellites with known map positions to identify genomic regions that under...

  5. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    PubMed

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

    Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.

  6. Meta-genome-wide association studies identify a locus on chromosome 1 and multiple variants in the MHC region for serum C-peptide in type 1 diabetes.

    PubMed

    Roshandel, Delnaz; Gubitosi-Klug, Rose; Bull, Shelley B; Canty, Angelo J; Pezzolesi, Marcus G; King, George L; Keenan, Hillary A; Snell-Bergeon, Janet K; Maahs, David M; Klein, Ronald; Klein, Barbara E K; Orchard, Trevor J; Costacou, Tina; Weedon, Michael N; Oram, Richard A; Paterson, Andrew D

    2018-05-01

    The aim of this study was to identify genetic variants associated with beta cell function in type 1 diabetes, as measured by serum C-peptide levels, through meta-genome-wide association studies (meta-GWAS). We performed a meta-GWAS to combine the results from five studies in type 1 diabetes with cross-sectionally measured stimulated, fasting or random C-peptide levels, including 3479 European participants. The p values across studies were combined, taking into account sample size and direction of effect. We also performed separate meta-GWAS for stimulated (n = 1303), fasting (n = 2019) and random (n = 1497) C-peptide levels. In the meta-GWAS for stimulated/fasting/random C-peptide levels, a SNP on chromosome 1, rs559047 (Chr1:238753916, T>A, minor allele frequency [MAF] 0.24-0.26), was associated with C-peptide (p = 4.13 × 10^-8), meeting the genome-wide significance threshold (p < 5 × 10^-8). In the same meta-GWAS, a locus in the MHC region (rs9260151) was close to the genome-wide significance threshold (Chr6:29911030, C>T, MAF 0.07-0.10, p = 8.43 × 10^-8). In the stimulated C-peptide meta-GWAS, rs61211515 (Chr6:30100975, T/-, MAF 0.17-0.19) in the MHC region was associated with stimulated C-peptide (β [SE] = -0.39 [0.07], p = 9.72 × 10^-8). rs61211515 was also associated with the rate of stimulated C-peptide decline over time in a subset of individuals (n = 258) with annual repeated measures for up to 6 years (p = 0.02). In the meta-GWAS of random C-peptide, another MHC region, SNP rs3135002 (Chr6:32668439, C>A, MAF 0.02-0.06), was associated with C-peptide (p = 3.49 × 10^-8). Conditional analyses suggested that the three identified variants in the MHC region were independent of each other. rs9260151 and rs3135002 have been associated with type 1 diabetes, whereas rs559047 and rs61211515 have not been associated with a risk of developing type 1 diabetes. We identified a locus on
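
    The abstract above says that p values were combined across studies "taking into account sample size and direction of effect" but does not give the formula or the software used. One common scheme that fits this description is a sample-size-weighted Z-score combination; the sketch below assumes that scheme. The function name and the input layout are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def combine_p_values(studies):
    """Sample-size-weighted Z-score meta-analysis (illustrative sketch).

    studies: iterable of (two_sided_p, direction, n), where direction is
    +1 or -1 for the sign of the effect and n is the study sample size.
    Each p must lie strictly between 0 and 1... p == 1 is also accepted.
    Returns (z_meta, two_sided_p_meta).
    """
    nd = NormalDist()
    weighted_sum = 0.0
    weight_sq_sum = 0.0
    for p, direction, n in studies:
        # Convert the two-sided p value to a signed Z score.
        # Using inv_cdf(p / 2) keeps precision for very small p.
        z = -nd.inv_cdf(p / 2.0) * direction
        w = math.sqrt(n)  # weight each study by sqrt(sample size)
        weighted_sum += w * z
        weight_sq_sum += w * w
    z_meta = weighted_sum / math.sqrt(weight_sq_sum)
    p_meta = 2.0 * nd.cdf(-abs(z_meta))
    return z_meta, p_meta
```

    Under this assumed scheme, studies that agree in direction reinforce each other, while studies of equal size and equal p value but opposite direction cancel to a combined Z of zero.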

  7. Determining coding CpG islands by identifying regions significant for pattern statistics on Markov chains.

    PubMed

    Singer, Meromit; Engström, Alexander; Schönhuth, Alexander; Pachter, Lior

    2011-09-23

    Recent experimental and computational work confirms that CpGs can be unmethylated inside coding exons, thereby showing that codons may be subjected to both genomic and epigenomic constraint. It is therefore of interest to identify coding CpG islands (CCGIs) that are regions inside exons enriched for CpGs. The difficulty in identifying such islands is that coding exons exhibit sequence biases determined by codon usage and constraints that must be taken into account. We present a method for finding CCGIs that showcases a novel approach we have developed for identifying regions of interest that are significant (with respect to a Markov chain) for the counts of any pattern. Our method begins with the exact computation of tail probabilities for the number of CpGs in all regions contained in coding exons, and then applies a greedy algorithm for selecting islands from among the regions. We show that the greedy algorithm provably optimizes a biologically motivated criterion for selecting islands while controlling the false discovery rate. We applied this approach to the human genome (hg18) and annotated CpG islands in coding exons. The statistical criterion we apply to evaluating islands reduces the number of false positives in existing annotations, while our approach to defining islands reveals significant numbers of undiscovered CCGIs in coding exons. Many of these appear to be examples of functional epigenetic specialization in coding exons.

  8. Genome-wide association study in Chinese identifies novel loci for blood pressure and hypertension

    PubMed Central

    Lu, Xiangfeng; Wang, Laiyuan; Lin, Xu; Huang, Jianfeng; Charles Gu, C.; He, Meian; Shen, Hongbing; He, Jiang; Zhu, Jingwen; Li, Huaixing; Hixson, James E.; Wu, Tangchun; Dai, Juncheng; Lu, Ling; Shen, Chong; Chen, Shufeng; He, Lin; Mo, Zengnan; Hao, Yongchen; Mo, Xingbo; Yang, Xueli; Li, Jianxin; Cao, Jie; Chen, Jichun; Fan, Zhongjie; Li, Ying; Zhao, Liancheng; Li, Hongfan; Lu, Fanghong; Yao, Cailiang; Yu, Lin; Xu, Lihua; Mu, Jianjun; Wu, Xianping; Deng, Ying; Hu, Dongsheng; Zhang, Weidong; Ji, Xu; Guo, Dongshuang; Guo, Zhirong; Zhou, Zhengyuan; Yang, Zili; Wang, Renping; Yang, Jun; Zhou, Xiaoyang; Yan, Weili; Sun, Ningling; Gao, Pingjin; Gu, Dongfeng

    2015-01-01

    Hypertension is a common disorder and the leading risk factor for cardiovascular disease and premature deaths worldwide. Genome-wide association studies (GWASs) in the European population have identified multiple chromosomal regions associated with blood pressure, and the identified loci altogether explain only a small fraction of the variance for blood pressure. The differences in environmental exposures and genetic background between Chinese and European populations might suggest potential different pathways of blood pressure regulation. To identify novel genetic variants affecting blood pressure variation, we conducted a meta-analysis of GWASs of blood pressure and hypertension in 11 816 subjects followed by replication studies including 69 146 additional individuals. We identified genome-wide significant (P < 5.0 × 10^-8) associations with blood pressure, which included variants at three new loci (CACNA1D, CYP21A2, and MED13L) and a newly discovered variant near SLC4A7. We also replicated 14 previously reported loci, 8 (CASZ1, MOV10, FGF5, CYP17A1, SOX6, ATP2B1, ALDH2, and JAG1) at genome-wide significance, and 6 (FIGN, ULK4, GUCY1A3, HFE, TBX3-TBX5, and TBX3) at a suggestive level of P = 1.81 × 10^-3 to 5.16 × 10^-8. These findings provide new mechanistic insights into the regulation of blood pressure and potential targets for treatments. PMID:25249183

  9. Genomic and protein expression profiling identifies CDK6 as novel independent prognostic marker in medulloblastoma.

    PubMed

    Mendrzyk, Frank; Radlwimmer, Bernhard; Joos, Stefan; Kokocinski, Felix; Benner, Axel; Stange, Daniel E; Neben, Kai; Fiegler, Heike; Carter, Nigel P; Reifenberger, Guido; Korshunov, Andrey; Lichter, Peter

    2005-12-01

    Medulloblastoma is the most common malignant brain tumor in children. Despite multimodal aggressive treatment, nearly half of the patients die as a result of this tumor. Identification of molecular markers for prognosis and development of novel pathogenesis-based therapies depends crucially on a better understanding of medulloblastoma pathomechanisms. We performed genome-wide analysis of DNA copy number imbalances in 47 medulloblastomas using comparative genomic hybridization to large insert DNA microarrays (matrix-CGH). The expression of selected candidate genes identified by matrix-CGH was analyzed immunohistochemically on tissue microarrays representing medulloblastomas from 189 clinically well-documented patients. To identify novel prognostic markers, genomic findings and protein expression data were correlated to patient survival. Matrix-CGH analysis revealed frequent DNA copy number alterations of several novel candidate regions. Among these, gains at 17q23.2-qter (P < .01) and losses at 17p13.1 to 17p13.3 (P = .04) were significantly correlated to poor prognosis. Within 17q23.2-qter and 7q21.2, two of the most frequently gained chromosomal regions, confined amplicons were identified that contained the PPM1D and CDK6 genes, respectively. Immunohistochemistry revealed strong expression of PPM1D in 148 (88%) of 168 and CDK6 in 50 (30%) of 169 medulloblastomas. Overexpression of CDK6 correlated significantly with poor prognosis (P < .01) and represented an independent prognostic marker of overall survival on multivariate analysis (P = .02). We identified CDK6 as a novel molecular marker that can be determined by immunohistochemistry on routinely processed tissue specimens and may facilitate the prognostic assessment of medulloblastoma patients. Furthermore, increased protein-levels of PPM1D and CDK6 may link the TP53 and RB1 tumor suppressor pathways to medulloblastoma pathomechanisms.

  10. Characterization of genome-wide association-identified variants for atrial fibrillation in African Americans.

    PubMed

    Delaney, Jessica T; Jeff, Janina M; Brown, Nancy J; Pretorius, Mias; Okafor, Henry E; Darbar, Dawood; Roden, Dan M; Crawford, Dana C

    2012-01-01

    Despite a greater burden of risk factors, atrial fibrillation (AF) is less common among African Americans than European-descent populations. Genome-wide association studies (GWAS) for AF in European-descent populations have identified three predominant genomic regions associated with increased risk (1q21, 4q25, and 16q22). The contribution of these loci to AF risk in African Americans is unknown. We studied 73 African Americans with AF from the Vanderbilt-Meharry AF registry and 71 African American controls, with no history of AF including after cardiac surgery. Tests of association were performed for 148 SNPs across the three regions associated with AF, and 22 SNPs were significantly associated with AF (P < 0.05). The SNPs with the strongest associations in African Americans were both different from the index SNPs identified in European-descent populations and independent from the index European-descent population SNPs (r^2 < 0.40 in HapMap CEU): 1q21 rs4845396 (odds ratio [OR] 0.30, 95% confidence interval [CI] 0.13-0.67, P = 0.003), 4q25 rs4631108 (OR 3.43, 95% CI 1.59-7.42, P = 0.002), and 16q22 rs16971547 (OR 8.1, 95% CI 1.46-45.4, P = 0.016). Estimates of European ancestry were similar among cases (23.6%) and controls (23.8%). Accordingly, the probability of having two copies of the European derived chromosomes at each region did not differ between cases and controls. Variable European admixture at known AF loci does not explain decreased AF susceptibility in African Americans. These data support the role of 1q21, 4q25, and 16q22 variants in AF risk for African Americans, although the index SNPs differ from those identified in European-descent populations.

  11. A genome-wide association study identifies risk loci for spirometric measures among smokers of European and African ancestry.

    PubMed

    Lutz, Sharon M; Cho, Michael H; Young, Kendra; Hersh, Craig P; Castaldi, Peter J; McDonald, Merry-Lynn; Regan, Elizabeth; Mattheisen, Manuel; DeMeo, Dawn L; Parker, Margaret; Foreman, Marilyn; Make, Barry J; Jensen, Robert L; Casaburi, Richard; Lomas, David A; Bhatt, Surya P; Bakke, Per; Gulsvik, Amund; Crapo, James D; Beaty, Terri H; Laird, Nan M; Lange, Christoph; Hokanson, John E; Silverman, Edwin K

    2015-12-03

    Pulmonary function decline is a major contributor to morbidity and mortality among smokers. Post bronchodilator FEV1 and FEV1/FVC ratio are considered the standard assessment of airflow obstruction. We performed a genome-wide association study (GWAS) in 9919 current and former smokers in the COPDGene study (6659 non-Hispanic Whites [NHW] and 3260 African Americans [AA]) to identify associations with spirometric measures (post-bronchodilator FEV1 and FEV1/FVC). We also conducted meta-analysis of FEV1 and FEV1/FVC GWAS in the COPDGene, ECLIPSE, and GenKOLS cohorts (total n = 13,532). Among NHW in the COPDGene cohort, both measures of pulmonary function were significantly associated with SNPs at the 15q25 locus [containing CHRNA3/5, AGPHD1, IREB2, CHRNB4] (lowest p-value = 2.17 × 10^-11), and FEV1/FVC was associated with a genomic region on chromosome 4 [upstream of HHIP] (lowest p-value = 5.94 × 10^-10); both regions have been previously associated with COPD. For the meta-analysis, in addition to confirming associations to the regions near CHRNA3/5 and HHIP, genome-wide significant associations were identified for FEV1 on chromosome 1 [TGFB2] (p-value = 8.99 × 10^-9), 9 [DBH] (p-value = 9.69 × 10^-9) and 19 [CYP2A6/7] (p-value = 3.49 × 10^-8) and for FEV1/FVC on chromosome 1 [TGFB2] (p-value = 8.99 × 10^-9), 4 [FAM13A] (p-value = 3.88 × 10^-12), 11 [MMP3/12] (p-value = 3.29 × 10^-10) and 14 [RIN3] (p-value = 5.64 × 10^-9). In a large genome-wide association study of lung function in smokers, we found genome-wide significant associations at several previously described loci with lung function or COPD. We additionally identified a novel genome-wide significant locus with FEV1 on chromosome 9 [DBH] in a meta-analysis of three study populations.

  12. Genome-wide screening identifies a KCNIP1 copy number variant as a genetic predictor for atrial fibrillation

    PubMed Central

    Tsai, Chia-Ti; Hsieh, Chia-Shan; Chang, Sheng-Nan; Chuang, Eric Y.; Ueng, Kwo-Chang; Tsai, Chin-Feng; Lin, Tsung-Hsien; Wu, Cho-Kai; Lee, Jen-Kuang; Lin, Lian-Yu; Wang, Yi-Chih; Yu, Chih-Chieh; Lai, Ling-Ping; Tseng, Chuen-Den; Hwang, Juey-Jen; Chiang, Fu-Tien; Lin, Jiunn-Lee

    2016-01-01

    Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia. Previous genome-wide association studies had identified single-nucleotide polymorphisms in several genomic regions to be associated with AF. In the human genome, copy number variations (CNVs) are known to contribute to disease susceptibility. Using a genome-wide multistage approach to identify AF susceptibility CNVs, we here show that a common 4,470-bp diallelic CNV in the first intron of potassium interacting channel 1 gene (KCNIP1) is strongly associated with AF in Taiwanese populations (odds ratio = 2.27 for insertion allele; P = 6.23 × 10^-24). KCNIP1 insertion is associated with higher KCNIP1 mRNA expression. KCNIP1-encoded protein potassium interacting channel 1 (KCHIP1) is physically associated with potassium Kv channels and modulates atrial transient outward current in cardiac myocytes. Overexpression of KCNIP1 results in inducible AF in zebrafish. In conclusion, a common CNV in the KCNIP1 gene is a genetic predictor of AF risk, possibly pointing to a functional pathway. PMID:26831368

  13. Harnessing genomics to improve health in the Eastern Mediterranean Region – an executive course in genomics policy

    PubMed Central

    Acharya, Tara; Rab, Mohammed Abdur; Singer, Peter A; Daar, Abdallah S

    2005-01-01

    Background: While innovations in medicine, science and technology have resulted in improved health and quality of life for many people, the benefits of modern medicine continue to elude millions of people in many parts of the world. To assess the potential of genomics to address health needs in EMR, the World Health Organization's Eastern Mediterranean Regional Office and the University of Toronto Joint Centre for Bioethics jointly organized a Genomics and Public Health Policy Executive Course, held September 20th–23rd, 2003, in Muscat, Oman. The 4-day course was sponsored by WHO-EMRO with additional support from the Canadian Program in Genomics and Global Health. The overall objective of the course was to collectively explore how to best harness genomics to improve health in the region. This article presents the course findings and recommendations for genomics policy in EMR. Methods: The course brought together senior representatives from academia, biotechnology companies, regulatory bodies, media, voluntary, and legal organizations to engage in discussion. Topics covered included scientific advances in genomics, followed by innovations in business models, public sector perspectives, ethics, legal issues and national innovation systems. Results: A set of recommendations, summarized below, was formulated for the Regional Office, the Member States and for individuals. • Advocacy for genomics and biotechnology for political leadership; • Networking between member states to share information, expertise, training, and regional cooperation in biotechnology; coordination of national surveys for assessment of health biotechnology innovation systems, science capacity, government policies, legislation and regulations, intellectual property policies, private sector activity; • Creation in each member country of an effective National Body on genomics, biotechnology and health to: - formulate national biotechnology strategies - raise biotechnology awareness - encourage

  14. MVisAGe Identifies Concordant and Discordant Genomic Alterations of Driver Genes in Squamous Tumors.

    PubMed

    Walter, Vonn; Du, Ying; Danilova, Ludmila; Hayward, Michele C; Hayes, D Neil

    2018-06-15

    Integrated analyses of multiple genomic datatypes are now common in cancer profiling studies. Such data present opportunities for numerous computational experiments, yet analytic pipelines are limited. Tools such as the cBioPortal and Regulome Explorer, although useful, are not easy to access programmatically or to implement locally. Here, we introduce the MVisAGe R package, which allows users to quantify gene-level associations between two genomic datatypes to investigate the effect of genomic alterations (e.g., DNA copy number changes on gene expression). Visualizing Pearson/Spearman correlation coefficients according to the genomic positions of the underlying genes provides a powerful yet novel tool for conducting exploratory analyses. We demonstrate its utility by analyzing three publicly available cancer datasets. Our approach highlights canonical oncogenes in chr11q13 that displayed the strongest associations between expression and copy number, including CCND1 and CTTN, genes not identified by copy number analysis in the primary reports. We demonstrate highly concordant usage of shared oncogenes on chr3q, yet strikingly diverse oncogene usage on chr11q as a function of HPV infection status. Regions of chr19 that display remarkable associations between methylation and gene expression were identified, as were previously unreported miRNA-gene expression associations that may contribute to the epithelial-to-mesenchymal transition. Significance: This study presents an important bioinformatics tool that will enable integrated analyses of multiple genomic datatypes. Cancer Res; 78(12); 3375-85. ©2018 AACR.
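
    The core idea in the abstract above, a per-gene correlation between two datatypes ordered by genomic position, can be illustrated with a short sketch. MVisAGe itself is an R package; the Python below is not its interface, and every function name and input layout here is a hypothetical chosen for illustration. Only the Pearson coefficient is shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)


def gene_level_correlations(copy_number, expression, positions):
    """Correlate two datatypes gene by gene and order by genomic position.

    copy_number, expression: dict mapping gene -> per-sample values,
        with samples in the same order in both dicts (assumed layout).
    positions: dict mapping gene -> (chromosome, start).
    Returns a list of (chromosome, start, gene, r) tuples. Chromosome
    names are sorted as plain strings, so "chr11" sorts before "chr3".
    """
    rows = []
    for gene, cn_values in copy_number.items():
        if gene in expression and gene in positions:
            chrom, start = positions[gene]
            rows.append((chrom, start, gene, pearson(cn_values, expression[gene])))
    rows.sort(key=lambda row: row[:3])
    return rows
```

    Plotting the returned r values against position within each chromosome would give the kind of exploratory view the abstract describes.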

  15. Genome-wide association study identifies multiple susceptibility loci for pulmonary fibrosis.

    PubMed

    Fingerlin, Tasha E; Murphy, Elissa; Zhang, Weiming; Peljto, Anna L; Brown, Kevin K; Steele, Mark P; Loyd, James E; Cosgrove, Gregory P; Lynch, David; Groshong, Steve; Collard, Harold R; Wolters, Paul J; Bradford, Williamson Z; Kossen, Karl; Seiwert, Scott D; du Bois, Roland M; Garcia, Christine Kim; Devine, Megan S; Gudmundsson, Gunnar; Isaksson, Helgi J; Kaminski, Naftali; Zhang, Yingze; Gibson, Kevin F; Lancaster, Lisa H; Cogan, Joy D; Mason, Wendi R; Maher, Toby M; Molyneaux, Philip L; Wells, Athol U; Moffatt, Miriam F; Selman, Moises; Pardo, Annie; Kim, Dong Soon; Crapo, James D; Make, Barry J; Regan, Elizabeth A; Walek, Dinesha S; Daniel, Jerry J; Kamatani, Yoichiro; Zelenika, Diana; Smith, Keith; McKean, David; Pedersen, Brent S; Talbert, Janet; Kidd, Raven N; Markin, Cheryl R; Beckman, Kenneth B; Lathrop, Mark; Schwarz, Marvin I; Schwartz, David A

    2013-06-01

    We performed a genome-wide association study of non-Hispanic, white individuals with fibrotic idiopathic interstitial pneumonias (IIPs; n = 1,616) and controls (n = 4,683), with follow-up replication analyses in 876 cases and 1,890 controls. We confirmed association with TERT at 5p15, MUC5B at 11p15 and the 3q26 region near TERC, and we identified seven newly associated loci (Pmeta = 2.4 × 10^-8 to 1.1 × 10^-19), including FAM13A (4q22), DSP (6p24), OBFC1 (10q24), ATP11A (13q34), DPP9 (19p13) and chromosomal regions 7q22 and 15q14-15. Our results suggest that genes involved in host defense, cell-cell adhesion and DNA repair contribute to risk of fibrotic IIPs.

  16. Read clouds uncover variation in complex regions of the human genome

    PubMed Central

    Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim

    2015-01-01

    Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554

  17. Read clouds uncover variation in complex regions of the human genome.

    PubMed

    Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim

    2015-10-01

    Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.

  18. Transethnic genome-wide scan identifies novel Alzheimer's disease loci.

    PubMed

    Jun, Gyungah R; Chung, Jaeyoon; Mez, Jesse; Barber, Robert; Beecham, Gary W; Bennett, David A; Buxbaum, Joseph D; Byrd, Goldie S; Carrasquillo, Minerva M; Crane, Paul K; Cruchaga, Carlos; De Jager, Philip; Ertekin-Taner, Nilufer; Evans, Denis; Fallin, M Danielle; Foroud, Tatiana M; Friedland, Robert P; Goate, Alison M; Graff-Radford, Neill R; Hendrie, Hugh; Hall, Kathleen S; Hamilton-Nelson, Kara L; Inzelberg, Rivka; Kamboh, M Ilyas; Kauwe, John S K; Kukull, Walter A; Kunkle, Brian W; Kuwano, Ryozo; Larson, Eric B; Logue, Mark W; Manly, Jennifer J; Martin, Eden R; Montine, Thomas J; Mukherjee, Shubhabrata; Naj, Adam; Reiman, Eric M; Reitz, Christiane; Sherva, Richard; St George-Hyslop, Peter H; Thornton, Timothy; Younkin, Steven G; Vardarajan, Badri N; Wang, Li-San; Wendlund, Jens R; Winslow, Ashley R; Haines, Jonathan; Mayeux, Richard; Pericak-Vance, Margaret A; Schellenberg, Gerard; Lunetta, Kathryn L; Farrer, Lindsay A

    2017-07-01

    Genetic loci for Alzheimer's disease (AD) have been identified in whites of European ancestry, but the genetic architecture of AD among other populations is less understood. We conducted a transethnic genome-wide association study (GWAS) for late-onset AD in Stage 1 sample including whites of European Ancestry, African-Americans, Japanese, and Israeli-Arabs assembled by the Alzheimer's Disease Genetics Consortium. Suggestive results from Stage 1 from novel loci were followed up using summarized results in the International Genomics Alzheimer's Project GWAS dataset. Genome-wide significant (GWS) associations in single-nucleotide polymorphism (SNP)-based tests (P < 5 × 10^-8) were identified for SNPs in PFDN1/HBEGF, USP6NL/ECHDC3, and BZRAP1-AS1 and for the interaction of the (apolipoprotein E) APOE ε4 allele with NFIC SNP. We also obtained GWS evidence (P < 2.7 × 10^-6) for gene-based association in the total sample with a novel locus, TPBG (P = 1.8 × 10^-6). Our findings highlight the value of transethnic studies for identifying novel AD susceptibility loci. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  19. RGmatch: matching genomic regions to proximal genes in omics data integration.

    PubMed

    Furió-Tarí, Pedro; Conesa, Ana; Tarazona, Sonia

    2016-11-22

    The integrative analysis of multiple genomics data often requires that genome coordinates-based signals have to be associated with proximal genes. The relative location of a genomic region with respect to the gene (gene area) is important for functional data interpretation; hence algorithms that match regions to genes should be able to deliver insight into this information. In this work we review the tools that are publicly available for making region-to-gene associations. We also present a novel method, RGmatch, a flexible and easy-to-use Python tool that computes associations either at the gene, transcript, or exon level, applying a set of rules to annotate each region-gene association with the region location within the gene. RGmatch can be applied to any organism as long as genome annotation is available. Furthermore, we qualitatively and quantitatively compare RGmatch to other tools. RGmatch simplifies the association of a genomic region with its closest gene. At the same time, it is a powerful tool because the rules used to annotate these associations are very easy to modify according to the researcher's specific interests. Some important differences between RGmatch and other similar tools already in existence are RGmatch's flexibility, its wide range of user options, compatibility with any annotatable organism, and its comprehensive and user-friendly output.
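
    The region-to-gene association described above can be sketched in a few lines. This is not RGmatch's code, rule set, or command-line interface; it is a minimal illustration that assumes closed 1-based intervals, gene-level matching only, and a simple three-way location label. All names are hypothetical.

```python
def closest_gene(region, genes):
    """Find the gene nearest to a region and label the relative location.

    region: (chromosome, start, end)
    genes:  iterable of (name, chromosome, start, end, strand), where
            strand is "+" or "-" and start <= end.
    Returns (gene_name, distance_bp, location) or None if no gene lies
    on the same chromosome. location is "overlap", "upstream" or
    "downstream", taken relative to the gene's direction of transcription.
    """
    chrom, r_start, r_end = region
    best = None
    for name, g_chrom, g_start, g_end, strand in genes:
        if g_chrom != chrom:
            continue
        if r_end < g_start:
            # Region lies at lower coordinates than the gene.
            distance = g_start - r_end
            location = "upstream" if strand == "+" else "downstream"
        elif r_start > g_end:
            # Region lies at higher coordinates than the gene.
            distance = r_start - g_end
            location = "downstream" if strand == "+" else "upstream"
        else:
            distance = 0
            location = "overlap"
        if best is None or distance < best[1]:
            best = (name, distance, location)
    return best
```

    A fuller tool would, as the abstract notes, also resolve matches at the transcript or exon level and let the user adjust the annotation rules; the sketch leaves those out.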

  20. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    PubMed Central

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution [1,2]. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes [3,4]. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  1. Overview Article: Identifying transcriptional cis-regulatory modules in animal genomes

    PubMed Central

    Suryamohan, Kushal; Halfon, Marc S.

    2014-01-01

    Gene expression is regulated through the activity of transcription factors and chromatin modifying proteins acting on specific DNA sequences, referred to as cis-regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis-regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily-identifiable sequence characteristics, and for many years were not amenable to high-throughput discovery methods. However, the recent availability of complete genome sequences and the development of next-generation sequencing methods has led to an explosion of both computational and empirical methods for CRM discovery in model and non-model organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against transcription factors or histone post-translational modifications, identification of nucleosome-depleted “open” chromatin regions, or sequencing-based high-throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted transcription factor binding sites, and supervised machine-learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false-positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed. PMID:25704908

  2. TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data.

    PubMed

    Jorjani, Hadi; Zavolan, Mihaela

    2014-04-01

    Accurate identification of transcription start sites (TSSs) is an essential step in the analysis of transcription regulatory networks. In higher eukaryotes, the capped analysis of gene expression technology enabled comprehensive annotation of TSSs in genomes such as those of mice and humans. In bacteria, an equivalent approach, termed differential RNA sequencing (dRNA-seq), has recently been proposed, but the application of this approach to a large number of genomes is hindered by the paucity of computational analysis methods. With few exceptions, when the method has been used, annotation of TSSs has been largely done manually. In this work, we present a computational method called 'TSSer' that enables the automatic inference of TSSs from dRNA-seq data. The method rests on a probabilistic framework for identifying both genomic positions that are preferentially enriched in the dRNA-seq data as well as preferentially captured relative to neighboring genomic regions. Evaluating our approach for TSS calling on several publicly available datasets, we find that TSSer achieves high consistency with the curated lists of annotated TSSs, but identifies many additional TSSs. Therefore, TSSer can accelerate genome-wide identification of TSSs in bacterial genomes and can aid in further characterization of bacterial transcription regulatory networks. TSSer is freely available under GPL license at http://www.clipz.unibas.ch/TSSer/index.php

  3. Estimation of (co)variances for genomic regions of flexible sizes: application to complex infectious udder diseases in dairy cattle

    PubMed Central

    2012-01-01

    Background: Multi-trait genomic models in a Bayesian context can be used to estimate genomic (co)variances, either for a complete genome or for genomic regions (e.g. per chromosome) for the purpose of multi-trait genomic selection or to gain further insight into the genomic architecture of related traits such as mammary disease traits in dairy cattle. Methods: Data on progeny means of six traits related to mastitis resistance in dairy cattle (general mastitis resistance and five pathogen-specific mastitis resistance traits) were analyzed using a bivariate Bayesian SNP-based genomic model with a common prior distribution for the marker allele substitution effects and estimation of the hyperparameters in this prior distribution from the progeny means data. From the Markov chain Monte Carlo samples of the allele substitution effects, genomic (co)variances were calculated on a whole-genome level, per chromosome, and in regions of 100 SNP on a chromosome. Results: Genomic proportions of the total variance differed between traits. Genomic correlations were lower than pedigree-based genetic correlations and they were highest between general mastitis and pathogen-specific traits because of the part-whole relationship between these traits. The chromosome-wise genomic proportions of the total variance differed between traits, with some chromosomes explaining higher or lower values than expected in relation to chromosome size. Few chromosomes showed pleiotropic effects and only chromosome 19 had a clear effect on all traits, indicating the presence of QTL with a general effect on mastitis resistance. The region-wise patterns of genomic variances differed between traits. Peaks indicating QTL were identified but were not very distinctive because a common prior for the marker effects was used. There was a clear difference in the region-wise patterns of genomic correlation among combinations of traits, with distinctive peaks indicating the presence of pleiotropic QTL. Conclusions

  4. High-density marker profiling confirms ancestral genomes of Avena species and identifies D-genome chromosomes of hexaploid oat.

    PubMed

    Yan, Honghai; Bekele, Wubishet A; Wight, Charlene P; Peng, Yuanying; Langdon, Tim; Latta, Robert G; Fu, Yong-Bi; Diederichsen, Axel; Howarth, Catherine J; Jellen, Eric N; Boyle, Brian; Wei, Yuming; Tinker, Nicholas A

    2016-11-01

    Genome analysis of 27 oat species identifies ancestral groups, delineates the D genome, and identifies ancestral origin of 21 mapped chromosomes in hexaploid oat. We investigated genomic relationships among 27 species of the genus Avena using high-density genetic markers revealed by genotyping-by-sequencing (GBS). Two methods of GBS analysis were used: one based on tag-level haplotypes that were previously mapped in cultivated hexaploid oat (A. sativa), and one intended to sample and enumerate tag-level haplotypes originating from all species under investigation. Qualitatively, both methods gave similar predictions regarding the clustering of species and shared ancestral genomes. Furthermore, results were consistent with previous phylogenies of the genus obtained with conventional approaches, supporting the robustness of whole genome GBS analysis. Evidence is presented to justify the final and definitive classification of the tetraploids A. insularis, A. maroccana (=A. magna), and A. murphyi as containing D-plus-C genomes, and not A-plus-C genomes, as is most often specified in past literature. Through electronic painting of the 21 chromosome representations in the hexaploid oat consensus map, we show how the relative frequency of matches between mapped hexaploid-derived haplotypes and AC (DC)-genome tetraploids vs. A- and C-genome diploids can accurately reveal the genome origin of all hexaploid chromosomes, including the approximate positions of inter-genome translocations. Evidence is provided that supports the continued classification of a diverged B genome in AB tetraploids, and it is confirmed that no extant A-genome diploids, including A. canariensis, are similar enough to the D genome of tetraploid and hexaploid oat to warrant consideration as a D-genome diploid.

  5. Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse

    PubMed Central

    Kortschak, R. Daniel

    2018-01-01

    The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or “churning” in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. PMID:29677183

  6. Genome-wide Association Study Identifies New Loci for Resistance to Leptosphaeria maculans in Canola

    PubMed Central

    Raman, Harsh; Raman, Rosy; Coombes, Neil; Song, Jie; Diffey, Simon; Kilian, Andrzej; Lindbeck, Kurt; Barbulescu, Denise M.; Batley, Jacqueline; Edwards, David; Salisbury, Phil A.; Marcroft, Steve

    2016-01-01

    Key message “We identified both qualitative and quantitative resistance loci to Leptosphaeria maculans, a fungal pathogen, causing blackleg disease in canola. Several genome-wide significant associations were detected at known and new loci for blackleg resistance. We further validated statistically significant associations in four genetic mapping populations, demonstrating that GWAS marker loci are indeed associated with resistance to L. maculans. One of the novel loci identified for the first time, Rlm12, conveys adult plant resistance in canola.” Blackleg, caused by Leptosphaeria maculans, is a significant disease which affects the sustainable production of canola (Brassica napus). This study reports a genome-wide association study based on 18,804 polymorphic SNPs to identify loci associated with qualitative and quantitative resistance to L. maculans. Genomic regions delimited with 694 significant SNP markers, that are associated with resistance evaluated using 12 single spore isolates and pathotypes from four canola stubble were identified. Several significant associations were detected at known disease resistance loci including in the vicinity of recently cloned Rlm2/LepR3 genes, and at new loci on chromosomes A01/C01, A02/C02, A03/C03, A05/C05, A06, A08, and A09. In addition, we validated statistically significant associations on A01, A07, and A10 in four genetic mapping populations, demonstrating that GWAS marker loci are indeed associated with resistance to L. maculans. One of the novel loci identified for the first time, Rlm12, conveys adult plant resistance and mapped within 13.2 kb from Arabidopsis R gene of TIR-NBS class. We showed that resistance loci are located in the vicinity of R genes of Arabidopsis thaliana and Brassica napus on the sequenced genome of B. napus cv. Darmor-bzh. Significantly associated SNP markers provide a valuable tool to enrich germplasm for favorable alleles in order to improve the level of resistance to L. maculans in

  7. Genome-wide Association Study Identifies Loci for the Polled Phenotype in Yak

    PubMed Central

    Wu, Xiaoyun; Wang, Kun; Ding, Xuezhi; Wang, Mingcheng; Chu, Min; Xie, Xiuyue; Qiu, Qiang; Yan, Ping

    2016-01-01

    The absence of horns, known as the polled phenotype, is an economically important trait in modern yak husbandry, but the genomic structure and genetic basis of this phenotype have yet to be discovered. Here, we conducted a genome-wide association study with a panel of 10 horned and 10 polled yaks using whole genome sequencing. We mapped the POLLED locus to a 200-kb interval, which comprises three protein-coding genes. Further characterization of the candidate region showed recent artificial selection signals resulting from the breeding process. We suggest that expressional variations rather than structural variations in protein probably contribute to the polled phenotype. Our results not only represent the first and important step in establishing the genomic structure of the polled region in yak, but also add to our understanding of the polled trait in bovid species. PMID:27389700

  8. Tandem repeat regions within the Burkholderia pseudomallei genome and their application for high resolution genotyping.

    PubMed

    U'Ren, Jana M; Schupp, James M; Pearson, Talima; Hornstra, Heidie; Friedman, Christine L Clark; Smith, Kimothy L; Daugherty, Rebecca R Leadem; Rhoton, Shane D; Leadem, Ben; Georgia, Shalamar; Cardon, Michelle; Huynh, Lynn Y; DeShazer, David; Harvey, Steven P; Robison, Richard; Gal, Daniel; Mayo, Mark J; Wagner, David; Currie, Bart J; Keim, Paul

    2007-03-30

    The facultative, intracellular bacterium Burkholderia pseudomallei is the causative agent of melioidosis, a serious infectious disease of humans and animals. We identified and categorized tandem repeat arrays and their distribution throughout the genome of B. pseudomallei strain K96243 in order to develop a genetic typing method for B. pseudomallei. We then screened 104 of the potentially polymorphic loci across a diverse panel of 31 isolates including B. pseudomallei, B. mallei and B. thailandensis in order to identify loci with varying degrees of polymorphism. A subset of these tandem repeat arrays were subsequently developed into a multiple-locus VNTR analysis to examine 66 B. pseudomallei and 21 B. mallei isolates from around the world, as well as 95 lineages from a serial transfer experiment encompassing ~18,000 generations. B. pseudomallei contains a preponderance of tandem repeat loci throughout its genome, many of which are duplicated elsewhere in the genome. The majority of these loci are composed of repeat motif lengths of 6 to 9 bp with 4 to 10 repeat units and are predominately located in intergenic regions of the genome. Across geographically diverse B. pseudomallei and B. mallei isolates, the 32 VNTR loci displayed between 7 and 28 alleles, with Nei's diversity values ranging from 0.47 to 0.94. Mutation rates for these loci are comparable (>10^-5 per locus per generation) to that of the most diverse tandemly repeated regions found in other less diverse bacteria. The frequency, location and duplicate nature of tandemly repeated regions within the B. pseudomallei genome indicate that these tandem repeat regions may play a role in generating and maintaining adaptive genomic variation. Multiple-locus VNTR analysis revealed extensive diversity within the global isolate set containing B. pseudomallei and B. mallei, and it detected genotypic differences within clonal lineages of both species that were identical using previous typing methods. Given the health
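
    The per-locus "Nei's diversity values" mentioned in this abstract can be illustrated with a small sketch. The abstract does not say which estimator was used, so the formula below (one minus the sum of squared allele frequencies, with an optional n/(n-1) small-sample correction) is an assumption, and the function name and repeat counts are hypothetical illustration data rather than values from the study.

```python
from collections import Counter


def nei_diversity(alleles, unbiased=True):
    """Nei's gene diversity for a single locus.

    alleles  -- one observed allele per isolate (here: VNTR repeat counts)
    unbiased -- apply the n/(n-1) small-sample correction (assumed, not
                confirmed by the abstract)
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two observations")
    counts = Counter(alleles)
    # Sum of squared allele frequencies (expected homozygosity)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    h = 1.0 - sum_sq
    if unbiased:
        h *= n / (n - 1)
    return h


# Hypothetical repeat counts at one VNTR locus across eight isolates
repeat_counts = [4, 4, 5, 6, 6, 6, 9, 10]
print(round(nei_diversity(repeat_counts, unbiased=False), 3))
print(round(nei_diversity(repeat_counts), 3))
```

    A locus where every isolate carries the same allele gives a diversity of 0, and the value approaches 1 as more alleles occur at similar frequencies.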

  9. Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma

    PubMed Central

    Chambers, John C; Zhang, Weihua; Sehmi, Joban; Li, Xinzhong; Wass, Mark N; Van der Harst, Pim; Holm, Hilma; Sanna, Serena; Kavousi, Maryam; Baumeister, Sebastian E; Coin, Lachlan J; Deng, Guohong; Gieger, Christian; Heard-Costa, Nancy L; Hottenga, Jouke-Jan; Kühnel, Brigitte; Kumar, Vinod; Lagou, Vasiliki; Liang, Liming; Luan, Jian’an; Vidal, Pedro Marques; Leach, Irene Mateo; O’Reilly, Paul F; Peden, John F; Rahmioglu, Nilufer; Soininen, Pasi; Speliotes, Elizabeth K; Yuan, Xin; Thorleifsson, Gudmar; Alizadeh, Behrooz Z; Atwood, Larry D; Borecki, Ingrid B; Brown, Morris J; Charoen, Pimphen; Cucca, Francesco; Das, Debashish; de Geus, Eco J C; Dixon, Anna L; Döring, Angela; Ehret, Georg; Eyjolfsson, Gudmundur I; Farrall, Martin; Forouhi, Nita G; Friedrich, Nele; Goessling, Wolfram; Gudbjartsson, Daniel F; Harris, Tamara B; Hartikainen, Anna-Liisa; Heath, Simon; Hirschfield, Gideon M; Hofman, Albert; Homuth, Georg; Hyppönen, Elina; Janssen, Harry L A; Johnson, Toby; Kangas, Antti J; Kema, Ido P; Kühn, Jens P; Lai, Sandra; Lathrop, Mark; Lerch, Markus M; Li, Yun; Liang, T Jake; Lin, Jing-Ping; Loos, Ruth J F; Martin, Nicholas G; Moffatt, Miriam F; Montgomery, Grant W; Munroe, Patricia B; Musunuru, Kiran; Nakamura, Yusuke; O’Donnell, Christopher J; Olafsson, Isleifur; Penninx, Brenda W; Pouta, Anneli; Prins, Bram P; Prokopenko, Inga; Puls, Ralf; Ruokonen, Aimo; Savolainen, Markku J; Schlessinger, David; Schouten, Jeoffrey N L; Seedorf, Udo; Sen-Chowdhry, Srijita; Siminovitch, Katherine A; Smit, Johannes H; Spector, Timothy D; Tan, Wenting; Teslovich, Tanya M; Tukiainen, Taru; Uitterlinden, Andre G; Van der Klauw, Melanie M; Vasan, Ramachandran S; Wallace, Chris; Wallaschofski, Henri; Wichmann, H-Erich; Willemsen, Gonneke; Würtz, Peter; Xu, Chun; Yerges-Armstrong, Laura M; Abecasis, Goncalo R; Ahmadi, Kourosh R; Boomsma, Dorret I; Caulfield, Mark; Cookson, William O; van Duijn, Cornelia M; Froguel, Philippe; Matsuda, Koichi; McCarthy, Mark I; Meisinger, Christa; 
Mooser, Vincent; Pietiläinen, Kirsi H; Schumann, Gunter; Snieder, Harold; Sternberg, Michael J E; Stolk, Ronald P; Thomas, Howard C; Thorsteinsdottir, Unnur; Uda, Manuela; Waeber, Gérard; Wareham, Nicholas J; Waterworth, Dawn M; Watkins, Hugh; Whitfield, John B; Witteman, Jacqueline C M; Wolffenbuttel, Bruce H R; Fox, Caroline S; Ala-Korpela, Mika; Stefansson, Kari; Vollenweider, Peter; Völzke, Henry; Schadt, Eric E; Scott, James; Järvelin, Marjo-Riitta; Elliott, Paul; Kooner, Jaspal S

    2012-01-01

    Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10^−8 to P = 10^−190). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function. PMID:22001757

  10. Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma.

    PubMed

    Chambers, John C; Zhang, Weihua; Sehmi, Joban; Li, Xinzhong; Wass, Mark N; Van der Harst, Pim; Holm, Hilma; Sanna, Serena; Kavousi, Maryam; Baumeister, Sebastian E; Coin, Lachlan J; Deng, Guohong; Gieger, Christian; Heard-Costa, Nancy L; Hottenga, Jouke-Jan; Kühnel, Brigitte; Kumar, Vinod; Lagou, Vasiliki; Liang, Liming; Luan, Jian'an; Vidal, Pedro Marques; Mateo Leach, Irene; O'Reilly, Paul F; Peden, John F; Rahmioglu, Nilufer; Soininen, Pasi; Speliotes, Elizabeth K; Yuan, Xin; Thorleifsson, Gudmar; Alizadeh, Behrooz Z; Atwood, Larry D; Borecki, Ingrid B; Brown, Morris J; Charoen, Pimphen; Cucca, Francesco; Das, Debashish; de Geus, Eco J C; Dixon, Anna L; Döring, Angela; Ehret, Georg; Eyjolfsson, Gudmundur I; Farrall, Martin; Forouhi, Nita G; Friedrich, Nele; Goessling, Wolfram; Gudbjartsson, Daniel F; Harris, Tamara B; Hartikainen, Anna-Liisa; Heath, Simon; Hirschfield, Gideon M; Hofman, Albert; Homuth, Georg; Hyppönen, Elina; Janssen, Harry L A; Johnson, Toby; Kangas, Antti J; Kema, Ido P; Kühn, Jens P; Lai, Sandra; Lathrop, Mark; Lerch, Markus M; Li, Yun; Liang, T Jake; Lin, Jing-Ping; Loos, Ruth J F; Martin, Nicholas G; Moffatt, Miriam F; Montgomery, Grant W; Munroe, Patricia B; Musunuru, Kiran; Nakamura, Yusuke; O'Donnell, Christopher J; Olafsson, Isleifur; Penninx, Brenda W; Pouta, Anneli; Prins, Bram P; Prokopenko, Inga; Puls, Ralf; Ruokonen, Aimo; Savolainen, Markku J; Schlessinger, David; Schouten, Jeoffrey N L; Seedorf, Udo; Sen-Chowdhry, Srijita; Siminovitch, Katherine A; Smit, Johannes H; Spector, Timothy D; Tan, Wenting; Teslovich, Tanya M; Tukiainen, Taru; Uitterlinden, Andre G; Van der Klauw, Melanie M; Vasan, Ramachandran S; Wallace, Chris; Wallaschofski, Henri; Wichmann, H-Erich; Willemsen, Gonneke; Würtz, Peter; Xu, Chun; Yerges-Armstrong, Laura M; Abecasis, Goncalo R; Ahmadi, Kourosh R; Boomsma, Dorret I; Caulfield, Mark; Cookson, William O; van Duijn, Cornelia M; Froguel, Philippe; Matsuda, Koichi; McCarthy, Mark I; Meisinger, Christa; 
Mooser, Vincent; Pietiläinen, Kirsi H; Schumann, Gunter; Snieder, Harold; Sternberg, Michael J E; Stolk, Ronald P; Thomas, Howard C; Thorsteinsdottir, Unnur; Uda, Manuela; Waeber, Gérard; Wareham, Nicholas J; Waterworth, Dawn M; Watkins, Hugh; Whitfield, John B; Witteman, Jacqueline C M; Wolffenbuttel, Bruce H R; Fox, Caroline S; Ala-Korpela, Mika; Stefansson, Kari; Vollenweider, Peter; Völzke, Henry; Schadt, Eric E; Scott, James; Järvelin, Marjo-Riitta; Elliott, Paul; Kooner, Jaspal S

    2011-10-16

    Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10^-8 to P = 10^-190). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function.

  11. The use of genomic coancestry matrices in the optimisation of contributions to maintain genetic diversity at specific regions of the genome.

    PubMed

    Gómez-Romano, Fernando; Villanueva, Beatriz; Fernández, Jesús; Woolliams, John A; Pong-Wong, Ricardo

    2016-01-13

    Optimal contribution methods have proved to be very efficient for controlling the rates at which coancestry and inbreeding increase and therefore, for maintaining genetic diversity. These methods have usually relied on pedigree information for estimating genetic relationships between animals. However, with the large amount of genomic information now available such as high-density single nucleotide polymorphism (SNP) chips that contain thousands of SNPs, it becomes possible to calculate more accurate estimates of relationships and to target specific regions in the genome where there is a particular interest in maximising genetic diversity. The objective of this study was to investigate the effectiveness of using genomic coancestry matrices for: (1) minimising the loss of genetic variability at specific genomic regions while restricting the overall loss in the rest of the genome; or (2) maximising the overall genetic diversity while restricting the loss of diversity at specific genomic regions. Our study shows that the use of genomic coancestry was very successful at minimising the loss of diversity and outperformed the use of pedigree-based coancestry (genetic diversity even increased in some scenarios). The results also show that genomic information allows a targeted optimisation to maintain diversity at specific genomic regions, whether they are linked or not. The level of variability maintained increased when the targeted regions were closely linked. However, such targeted management leads to an important loss of diversity in the rest of the genome and, thus, it is necessary to take further actions to constrain this loss. Optimal contribution methods also proved to be effective at restricting the loss of diversity in the rest of the genome, although the resulting rate of coancestry was higher than the constraint imposed. The use of genomic matrices when optimising contributions permits the control of genetic diversity and inbreeding at specific regions of the
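
    As a rough illustration of what a genomic coancestry matrix restricted to a specific region might look like, the sketch below computes pairwise molecular coancestry from SNP genotypes. The abstract does not define its coancestry measure, so the identity-by-state definition used here (the probability that one allele drawn at random from each individual is the same, averaged over SNPs) is an assumption, and the function names, the 0/1/2 genotype coding and the toy genotypes are hypothetical.

```python
def molecular_coancestry(g_i, g_j):
    """Mean probability that two alleles, one drawn at random from each
    individual, are identical by state.

    Genotypes are coded as the count (0, 1 or 2) of a reference allele
    at each SNP; this coding is an assumption of the sketch.
    """
    if len(g_i) != len(g_j) or not g_i:
        raise ValueError("genotype vectors must be non-empty and equal length")
    total = 0.0
    for a, b in zip(g_i, g_j):
        # P(both reference) + P(both alternative) at this SNP
        total += (a * b + (2 - a) * (2 - b)) / 4.0
    return total / len(g_i)


def coancestry_matrix(genotypes, snp_indices=None):
    """Pairwise coancestry over all SNPs or over a chosen subset (region)."""
    n_snps = len(genotypes[0])
    idx = list(range(n_snps)) if snp_indices is None else list(snp_indices)
    sub = [[g[k] for k in idx] for g in genotypes]
    return [[molecular_coancestry(a, b) for b in sub] for a in sub]


# Hypothetical genotypes: three individuals, six SNPs
genotypes = [
    [2, 2, 1, 0, 0, 1],
    [2, 1, 1, 0, 2, 2],
    [0, 0, 1, 2, 2, 2],
]
whole_genome = coancestry_matrix(genotypes)
target_region = coancestry_matrix(genotypes, snp_indices=range(0, 3))
print(whole_genome[0][2], target_region[0][1])
```

    Building one matrix for the targeted SNPs and another for the remaining SNPs would give the two quantities that an optimisation of contributions could minimise or constrain; the optimisation step itself is not sketched here.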

  12. Computational approaches to identify functional genetic variants in cancer genomes

    PubMed Central

    Gonzalez-Perez, Abel; Mustonen, Ville; Reva, Boris; Ritchie, Graham R.S.; Creixell, Pau; Karchin, Rachel; Vazquez, Miguel; Fink, J. Lynn; Kassahn, Karin S.; Pearson, John V.; Bader, Gary; Boutros, Paul C.; Muthuswamy, Lakshmi; Ouellette, B.F. Francis; Reimand, Jüri; Linding, Rune; Shibata, Tatsuhiro; Valencia, Alfonso; Butler, Adam; Dronov, Serge; Flicek, Paul; Shannon, Nick B.; Carter, Hannah; Ding, Li; Sander, Chris; Stuart, Josh M.; Stein, Lincoln D.; Lopez-Bigas, Nuria

    2014-01-01

    The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype. PMID:23900255

  13. Genome wide approaches to identify protein-DNA interactions.

    PubMed

    Ma, Tao; Ye, Zhenqing; Wang, Liguo

    2018-05-29

    Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential to identify their target genes and understand the regulatory network. Genome-wide identification of their binding sites became feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites. This review aims to provide an overview of these three techniques including their experimental procedures, computational approaches, and popular analytic tools. ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques to study genome-wide in vivo protein-DNA interaction. Due to the rapid development of next-generation sequencing technology, array-based ChIP-chip is deprecated and ChIP-seq has become the most widely used technique to identify transcription factor binding sites genome-wide. The newly developed ChIP-exo further improves the spatial resolution to a single nucleotide. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms, so each will inherently include its own set of statistical assumptions and biases. Choosing the most appropriate analytic program for a given experiment therefore needs careful consideration. Moreover, most programs only have a command line interface, so their installation and usage will require basic computational expertise in Unix/Linux. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. Pedigree-based analysis of derivation of genome segments of an elite rice reveals key regions during its breeding.

    PubMed

    Zhou, Degui; Chen, Wei; Lin, Zechuan; Chen, Haodong; Wang, Chongrong; Li, Hong; Yu, Renbo; Zhang, Fengyun; Zhen, Gang; Yi, Junliang; Li, Kanghuo; Liu, Yaoguang; Terzaghi, William; Tang, Xiaoyan; He, Hang; Zhou, Shaochuan; Deng, Xing Wang

    2016-02-01

    Analyses of genome variations with high-throughput assays have improved our understanding of the genetic basis of crop domestication and identified the selected genome regions, but little is known about that of modern breeding, which has limited the usefulness of massive elite cultivars in further breeding. Here we deploy pedigree-based analysis of an elite rice, Huanghuazhan, to exploit key genome regions during its breeding. The cultivars in the pedigree were resequenced with 7.6× depth on average, and 2.1 million high-quality single nucleotide polymorphisms (SNPs) were obtained. Tracing the derivation of genome blocks with pedigree and information on SNPs revealed the chromosomal recombination during breeding, which showed that 26.22% of the Huanghuazhan genome consists of strictly conserved key regions. These major effect regions were further supported by a QTL mapping of 260 recombinant inbred lines derived from the cross of Huanghuazhan and a very dissimilar cultivar, Shuanggui 36, and by the genome profile of eight cultivars and 36 elite lines derived from Huanghuazhan. Hitting these regions with the cloned genes revealed they include a number of key genes, which were then applied to demonstrate how Huanghuazhan was bred after 30 years of effort and to dissect the deficiency of artificial selection. We concluded the regions are helpful to the further breeding based on this pedigree and performing breeding by design. Our study provides genetic dissection of modern rice breeding and sheds new light on how to perform genomewide breeding by design. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  15. Integrated genomic and interfacility patient-transfer data reveal the transmission pathways of multidrug-resistant Klebsiella pneumoniae in a regional outbreak.

    PubMed

    Snitkin, Evan S; Won, Sarah; Pirani, Ali; Lapp, Zena; Weinstein, Robert A; Lolans, Karen; Hayden, Mary K

    2017-11-22

    Development of effective strategies to limit the proliferation of multidrug-resistant organisms requires a thorough understanding of how such organisms spread among health care facilities. We sought to uncover the chains of transmission underlying a 2008 U.S. regional outbreak of carbapenem-resistant Klebsiella pneumoniae by performing an integrated analysis of genomic and interfacility patient-transfer data. Genomic analysis yielded a high-resolution transmission network that assigned directionality to regional transmission events and discriminated between intra- and interfacility transmission when epidemiologic data were ambiguous or misleading. Examining the genomic transmission network in the context of interfacility patient transfers (patient-sharing networks) supported the role of patient transfers in driving the outbreak, with genomic analysis revealing that a small subset of patient-transfer events was sufficient to explain regional spread. Further integration of the genomic and patient-sharing networks identified one nursing home as an important bridge facility early in the outbreak-a role that was not apparent from analysis of genomic or patient-transfer data alone. Last, we found that when simulating a real-time regional outbreak, our methodology was able to accurately infer the facility at which patients acquired their infections. This approach has the potential to identify facilities with high rates of intra- or interfacility transmission, data that will be useful for triggering targeted interventions to prevent further spread of multidrug-resistant organisms. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  16. The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations.

    PubMed

    Lee, HoJoon; Palm, Jennifer; Grimes, Susan M; Ji, Hanlee P

    2015-10-27

    The Cancer Genome Atlas (TCGA) project has generated genomic data sets covering over 20 malignancies. These data provide valuable insights into the underlying genetic and genomic basis of cancer. However, exploring the relationship among TCGA genomic results and clinical phenotype remains a challenge, particularly for individuals lacking formal bioinformatics training. Overcoming this hurdle is an important step toward the wider clinical translation of cancer genomic/proteomic data and implementation of precision cancer medicine. Several websites such as the cBio portal or University of California Santa Cruz genome browser make TCGA data accessible but lack interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of the clinical-genomic driver associations from TCGA data, we developed the Cancer Genome Atlas Clinical Explorer. The Cancer Genome Atlas Clinical Explorer interface provides a straightforward platform to query TCGA data using one of the following methods: (1) searching for clinically relevant genes, micro RNAs, and proteins by name, cancer types, or clinical parameters; (2) searching for genomic/proteomic profile changes by clinical parameters in a cancer type; or (3) testing two-hit hypotheses. SQL queries run in the background and results are displayed on our portal in an easy-to-navigate interface according to user's input. To derive these associations, we relied on elastic-net estimates of optimal multiple linear regularized regression and clinical parameters in the space of multiple genomic/proteomic features provided by TCGA data. Moreover, we identified and ranked gene/micro RNA/protein predictors of each clinical parameter for each cancer. The robustness of the results was estimated by bootstrapping. Overall, we identify associations of potential clinical relevance among genes/micro RNAs/proteins using our statistical analysis from 25 cancer types and 18 clinical parameters that

  17. Whole genome population genetics analysis of Sudanese goats identifies regions harboring genes associated with major traits.

    PubMed

    Rahmatalla, Siham A; Arends, Danny; Reissmann, Monika; Said Ahmed, Ammar; Wimmers, Klaus; Reyer, Henry; Brockmann, Gudrun A

    2017-10-23

    Sudan is endowed with a variety of indigenous goat breeds which are used for meat and milk production and which are well adapted to the local environment. The aim of the present study was to determine the genetic diversity and relationship within and between the four main Sudanese breeds of Nubian, Desert, Taggar and Nilotic goats. Using the 50 K SNP chip, 24 animals of each breed were genotyped. More than 96% of high quality SNPs were polymorphic with an average minor allele frequency of 0.3. In all breeds, no significant difference between observed (0.4) and expected (0.4) heterozygosity was found and the inbreeding coefficients (F_IS) did not differ from zero. F_ST coefficients for the genetic distance between breeds also did not significantly deviate from zero. In addition, the analysis of molecular variance revealed that 93% of the total variance in the examined population can be explained by differences among individuals, while only 7% results from differences between the breeds. These findings provide evidence for high genetic diversity and little inbreeding within breeds on one hand, and low diversity between breeds on the other hand. Further examinations using Nei's genetic distance and STRUCTURE analysis clustered Taggar goats distinct from the other breeds. In a principal component (PC) analysis, PC1 could separate Taggar, Nilotic and a mix of Nubian and Desert goats into three groups. The SNPs that contributed strongly to PC1 showed high F_ST values in Taggar goat versus the other goat breeds. PCA allowed us to identify target genomic regions which contain genes known to influence growth, development, bone formation and the immune system. The information on the genetic variability and diversity in this study confirmed that Taggar goat is genetically different from the other goat breeds in Sudan. The SNPs identified by the first principal components show high F_ST values in Taggar goat and allowed the identification of candidate genes which can be used in the

  18. Genome-wide association study identifies multiple loci associated with both mammographic density and breast cancer risk

    PubMed Central

    Lindström, Sara; Thompson, Deborah J.; Paterson, Andrew D.; Li, Jingmei; Gierach, Gretchen L.; Scott, Christopher; Stone, Jennifer; Douglas, Julie A.; dos-Santos-Silva, Isabel; Fernandez-Navarro, Pablo; Verghase, Jajini; Smith, Paula; Brown, Judith; Luben, Robert; Wareham, Nicholas J.; Loos, Ruth J.F.; Heit, John A.; Pankratz, V. Shane; Norman, Aaron; Goode, Ellen L.; Cunningham, Julie M.; deAndrade, Mariza; Vierkant, Robert A.; Czene, Kamila; Fasching, Peter A.; Baglietto, Laura; Southey, Melissa C.; Giles, Graham G.; Shah, Kaanan P.; Chan, Heang-Ping; Helvie, Mark A.; Beck, Andrew H.; Knoblauch, Nicholas W.; Hazra, Aditi; Hunter, David J.; Kraft, Peter; Pollan, Marina; Figueroa, Jonine D.; Couch, Fergus J.; Hopper, John L.; Hall, Per; Easton, Douglas F.; Boyd, Norman F.; Vachon, Celine M.; Tamimi, Rulla M.

    2015-01-01

    Mammographic density reflects the amount of stromal and epithelial tissues in relation to adipose tissue in the breast and is a strong risk factor for breast cancer. Here we report the results from meta-analysis of genome-wide association studies (GWAS) of three mammographic density phenotypes: dense area, non-dense area and percent density in up to 7,916 women in stage 1 and an additional 10,379 women in stage 2. We identify genome-wide significant (P<5×10^-8) loci for dense area (AREG, ESR1, ZNF365, LSP1/TNNT3, IGF1, TMEM184B, SGSM3/MKL1), non-dense area (8p11.23) and percent density (PRDM6, 8p11.23, TMEM184B). Four of these regions are known breast cancer susceptibility loci, and four additional regions were found to be associated with breast cancer (P<0.05) in a large meta-analysis. These results provide further evidence of a shared genetic basis between mammographic density and breast cancer and illustrate the power of studying intermediate quantitative phenotypes to identify putative disease susceptibility loci. PMID:25342443

  19. Identifying and mitigating batch effects in whole genome sequencing data.

    PubMed

    Tom, Jennifer A; Reeder, Jens; Forrest, William F; Graham, Robert R; Hunkapiller, Julie; Behrens, Timothy W; Bhangale, Tushar R

    2017-07-24

    Large sample sets of whole genome sequencing with deep coverage are being generated; however, assembling datasets from different sources inevitably introduces batch effects. These batch effects are not well understood and can be due to changes in the sequencing protocol or bioinformatics tools used to process the data. No systematic algorithms or heuristics exist to detect and filter batch effects or remove associations impacted by batch effects in whole genome sequencing data. We describe key quality metrics, provide a freely available software package to compute them, and demonstrate that identification of batch effects is aided by principal components analysis of these metrics. To mitigate batch effects, we developed new site-specific filters that identified and removed variants that were falsely associated with the phenotype due to batch effect. These include filtering based on: a haplotype based genotype correction, a differential genotype quality test, and removing sites with missing genotype rate greater than 30% after setting genotypes with quality scores less than 20 to missing. This method removed 96.1% of unconfirmed genome-wide significant SNP associations and 97.6% of unconfirmed genome-wide significant indel associations. We performed analyses to demonstrate that: 1) these filters impacted variants known to be disease associated, as 2 out of 16 confirmed associations in an AMD candidate SNP analysis were filtered, representing a reduction in power of 12.5%, 2) in the absence of batch effects, these filters removed only a small proportion of variants across the genome (type I error rate of 3%), and 3) in an independent dataset, the method removed 90.2% of unconfirmed genome-wide SNP associations and 89.8% of unconfirmed genome-wide indel associations. Researchers currently do not have effective tools to identify and mitigate batch effects in whole genome sequencing data. We developed and validated methods and filters to address this deficiency.
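
    The last of the three filters named in this abstract (set genotypes with quality below 20 to missing, then drop sites whose missing rate exceeds 30%) can be illustrated with a minimal sketch. This is not the authors' software package: the function names, the list-based data layout, and the use of None for a missing genotype are assumptions made for illustration only.

```python
from typing import List, Optional

GQ_MIN = 20         # genotypes with quality below this are set to missing
MAX_MISSING = 0.30  # sites with a missing rate above this are removed


def mask_low_quality(genotypes: List[Optional[int]],
                     quals: List[Optional[float]],
                     gq_min: float = GQ_MIN) -> List[Optional[int]]:
    """Return genotypes with low-quality (or unscored) calls replaced by None."""
    return [
        g if (g is not None and q is not None and q >= gq_min) else None
        for g, q in zip(genotypes, quals)
    ]


def missing_rate(genotypes: List[Optional[int]]) -> float:
    """Fraction of samples with a missing genotype at one site."""
    if not genotypes:
        return 1.0  # hypothetical convention: an empty site counts as fully missing
    return sum(g is None for g in genotypes) / len(genotypes)


def keep_site(genotypes: List[Optional[int]],
              quals: List[Optional[float]],
              gq_min: float = GQ_MIN,
              max_missing: float = MAX_MISSING) -> bool:
    """Keep a site unless its post-masking missing rate is greater than max_missing."""
    masked = mask_low_quality(genotypes, quals, gq_min)
    return missing_rate(masked) <= max_missing
```

    For a site with ten samples, three low-quality calls give a missing rate of exactly 30% and the site is kept, whereas four low-quality calls give 40% and the site is removed. The haplotype-based correction and the differential genotype quality test are not sketched here because the abstract does not describe them in enough detail.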

  20. Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

    PubMed

    Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

    2014-11-29

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation

  1. GenomeVista

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Poliakov, Alexander; Couronne, Olivier

    2002-11-04

    Aligning large vertebrate genomes that are structurally complex poses a variety of problems not encountered on smaller scales. Such genomes are rich in repetitive elements and contain multiple segmental duplications, which increases the difficulty of identifying true orthologous DNA segments in alignments. The sizes of the sequences make many alignment algorithms designed for comparing single proteins extremely inefficient when processing large genomic intervals. We integrated both local and global alignment tools and developed a suite of programs for automatically aligning large vertebrate genomes and identifying conserved non-coding regions in the alignments. Our method uses the BLAT local alignment program to find anchors on the base genome to identify regions of possible homology for a query sequence. These regions are postprocessed to find the best candidates, which are then globally aligned using the AVID global alignment program. In the last step, conserved non-coding segments are identified using VISTA. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. The GenomeVISTA software is a suite of Perl programs that is built on a MySQL database platform. The scheduler gets control data from the database, builds a queue of jobs, and dispatches them to a PC cluster for execution. The main program, running on each node of the cluster, processes individual sequences. A Perl library acts as an interface between the database and the above programs. The use of a separate library allows the programs to function independently of the database schema. The library also improves on the standard Perl MySQL database interface package by providing auto-reconnect functionality and improved error handling.

  2. Comparative Analysis of the Full Genome of Helicobacter pylori Isolate Sahul64 Identifies Genes of High Divergence

    PubMed Central

    Lu, Wei; Wise, Michael J.; Tay, Chin Yen; Windsor, Helen M.; Marshall, Barry J.; Peacock, Christopher

    2014-01-01

    Isolates of Helicobacter pylori can be classified phylogeographically. High genetic diversity and rapid microevolution are a hallmark of H. pylori genomes, a phenomenon that is proposed to play a functional role in persistence and colonization of diverse human populations. To provide further genomic evidence in the lineage of H. pylori and to further characterize diverse strains of this pathogen in different human populations, we report the finished genome sequence of Sahul64, an H. pylori strain isolated from an indigenous Australian. Our analysis identified genes that were highly divergent compared to the 38 publicly available genomes, which include genes involved in the biosynthesis and modification of lipopolysaccharide, putative prophage genes, restriction modification components, and hypothetical genes. Furthermore, the virulence-associated vacA locus is a pseudogene and the cag pathogenicity island (cagPAI) is not present. However, the genome does contain a gene cluster associated with pathogenicity, including dupA. Our analysis found that with the addition of Sahul64 to the 38 genomes, the core genome content of H. pylori is reduced by approximately 14% (∼170 genes) and the pan-genome has expanded from 2,070 to 2,238 genes. We have identified three putative horizontally acquired regions, including one that is likely to have been acquired from the closely related Helicobacter cetorum prior to speciation. Our results suggest that Sahul64, with the absence of cagPAI, highly divergent cell envelope proteins, and a predicted nontransportable VacA protein, could be more highly adapted to ancient indigenous Australian people but with lower virulence potential compared to other sequenced and cagPAI-positive H. pylori strains. PMID:24375107

  3. Comparative analysis of the full genome of Helicobacter pylori isolate Sahul64 identifies genes of high divergence.

    PubMed

    Lu, Wei; Wise, Michael J; Tay, Chin Yen; Windsor, Helen M; Marshall, Barry J; Peacock, Christopher; Perkins, Tim

    2014-03-01

    Isolates of Helicobacter pylori can be classified phylogeographically. High genetic diversity and rapid microevolution are a hallmark of H. pylori genomes, a phenomenon that is proposed to play a functional role in persistence and colonization of diverse human populations. To provide further genomic evidence in the lineage of H. pylori and to further characterize diverse strains of this pathogen in different human populations, we report the finished genome sequence of Sahul64, an H. pylori strain isolated from an indigenous Australian. Our analysis identified genes that were highly divergent compared to the 38 publicly available genomes, which include genes involved in the biosynthesis and modification of lipopolysaccharide, putative prophage genes, restriction modification components, and hypothetical genes. Furthermore, the virulence-associated vacA locus is a pseudogene and the cag pathogenicity island (cagPAI) is not present. However, the genome does contain a gene cluster associated with pathogenicity, including dupA. Our analysis found that with the addition of Sahul64 to the 38 genomes, the core genome content of H. pylori is reduced by approximately 14% (∼170 genes) and the pan-genome has expanded from 2,070 to 2,238 genes. We have identified three putative horizontally acquired regions, including one that is likely to have been acquired from the closely related Helicobacter cetorum prior to speciation. Our results suggest that Sahul64, with the absence of cagPAI, highly divergent cell envelope proteins, and a predicted nontransportable VacA protein, could be more highly adapted to ancient indigenous Australian people but with lower virulence potential compared to other sequenced and cagPAI-positive H. pylori strains.

  4. Characterization of Genome-Wide Association-Identified Variants for Atrial Fibrillation in African Americans

    PubMed Central

    Delaney, Jessica T.; Jeff, Janina M.; Brown, Nancy J.; Pretorius, Mias; Okafor, Henry E.; Darbar, Dawood; Roden, Dan M.; Crawford, Dana C.

    2012-01-01

    Background: Despite a greater burden of risk factors, atrial fibrillation (AF) is less common among African Americans than European-descent populations. Genome-wide association studies (GWAS) for AF in European-descent populations have identified three predominant genomic regions associated with increased risk (1q21, 4q25, and 16q22). The contribution of these loci to AF risk in African Americans is unknown. Methodology/Principal Findings: We studied 73 African Americans with AF from the Vanderbilt-Meharry AF registry and 71 African American controls with no history of AF, including after cardiac surgery. Tests of association were performed for 148 SNPs across the three regions associated with AF, and 22 SNPs were significantly associated with AF (P<0.05). The SNPs with the strongest associations in African Americans were both different from the index SNPs identified in European-descent populations and independent from the index European-descent population SNPs (r^2<0.40 in HapMap CEU): 1q21 rs4845396 (odds ratio [OR] 0.30, 95% confidence interval [CI] 0.13–0.67, P = 0.003), 4q25 rs4631108 (OR 3.43, 95% CI 1.59–7.42, P = 0.002), and 16q22 rs16971547 (OR 8.1, 95% CI 1.46–45.4, P = 0.016). Estimates of European ancestry were similar among cases (23.6%) and controls (23.8%). Accordingly, the probability of having two copies of the European derived chromosomes at each region did not differ between cases and controls. Conclusions/Significance: Variable European admixture at known AF loci does not explain decreased AF susceptibility in African Americans. These data support the role of 1q21, 4q25, and 16q22 variants in AF risk for African Americans, although the index SNPs differ from those identified in European-descent populations. PMID:22384221

  5. Genome-Wide Association Studies Identify CHRNA5/3 and HTR4 in the Development of Airflow Obstruction

    PubMed Central

    Shrine, Nick R. G.; Loehr, Laura R.; Zhao, Jing Hua; Manichaikul, Ani; Lopez, Lorna M.; Smith, Albert Vernon; Heckbert, Susan R.; Smolonska, Joanna; Tang, Wenbo; Loth, Daan W.; Curjuric, Ivan; Hui, Jennie; Latourelle, Jeanne C.; Henry, Amanda P.; Aldrich, Melinda; Bakke, Per; Beaty, Terri H.; Bentley, Amy R.; Borecki, Ingrid B.; Brusselle, Guy G.; Burkart, Kristin M.; Chen, Ting-hsu; Couper, David; Crapo, James D.; Davies, Gail; Dupuis, Josée; Franceschini, Nora; Gulsvik, Amund; Hancock, Dana B.; Harris, Tamara B.; Hofman, Albert; Imboden, Medea; James, Alan L.; Khaw, Kay-Tee; Lahousse, Lies; Launer, Lenore J.; Litonjua, Augusto; Liu, Yongmei; Lohman, Kurt K.; Lomas, David A.; Lumley, Thomas; Marciante, Kristin D.; McArdle, Wendy L.; Meibohm, Bernd; Morrison, Alanna C.; Musk, Arthur W.; Myers, Richard H.; North, Kari E.; Postma, Dirkje S.; Psaty, Bruce M.; Rich, Stephen S.; Rivadeneira, Fernando; Rochat, Thierry; Rotter, Jerome I.; Artigas, María Soler; Starr, John M.; Uitterlinden, André G.; Wareham, Nicholas J.; Wijmenga, Cisca; Zanen, Pieter; Province, Michael A.; Silverman, Edwin K.; Deary, Ian J.; Palmer, Lyle J.; Cassano, Patricia A.; Gudnason, Vilmundur; Barr, R. Graham; Loos, Ruth J. F.; Strachan, David P.; London, Stephanie J.; Boezen, H. Marike; Probst-Hensch, Nicole; Gharib, Sina A.; Hall, Ian P.; O’Connor, George T.; Tobin, Martin D.; Stricker, Bruno H.

    2012-01-01

    Rationale: Genome-wide association studies (GWAS) have identified loci influencing lung function, but fewer genes influencing chronic obstructive pulmonary disease (COPD) are known. Objectives: Perform meta-analyses of GWAS for airflow obstruction, a key pathophysiologic characteristic of COPD assessed by spirometry, in population-based cohorts examining all participants, ever smokers, never smokers, asthma-free participants, and more severe cases. Methods: Fifteen cohorts were studied for discovery (3,368 affected; 29,507 unaffected), and a population-based family study and a meta-analysis of case-control studies were used for replication and regional follow-up (3,837 cases; 4,479 control subjects). Airflow obstruction was defined as FEV1 and its ratio to FVC (FEV1/FVC) both less than their respective lower limits of normal as determined by published reference equations. Measurements and Main Results: The discovery meta-analyses identified one region on chromosome 15q25.1 meeting genome-wide significance in ever smokers that includes AGPHD1, IREB2, and CHRNA5/CHRNA3 genes. The region was also modestly associated among never smokers. Gene expression studies confirmed the presence of CHRNA5/3 in lung, airway smooth muscle, and bronchial epithelial cells. A single-nucleotide polymorphism in HTR4, a gene previously related to FEV1/FVC, achieved genome-wide statistical significance in combined meta-analysis. Top single-nucleotide polymorphisms in ADAM19, RARB, PPAP2B, and ADAMTS19 were nominally replicated in the COPD meta-analysis. Conclusions: These results suggest an important role for the CHRNA5/3 region as a genetic risk factor for airflow obstruction that may be independent of smoking and implicate the HTR4 gene in the etiology of airflow obstruction. PMID:22837378

  6. 1000 Genomes-based meta-analysis identifies 10 novel loci for kidney function

    PubMed Central

    Gorski, Mathias; van der Most, Peter J.; Teumer, Alexander; Chu, Audrey Y.; Li, Man; Mijatovic, Vladan; Nolte, Ilja M.; Cocca, Massimiliano; Taliun, Daniel; Gomez, Felicia; Li, Yong; Tayo, Bamidele; Tin, Adrienne; Feitosa, Mary F.; Aspelund, Thor; Attia, John; Biffar, Reiner; Bochud, Murielle; Boerwinkle, Eric; Borecki, Ingrid; Bottinger, Erwin P.; Chen, Ming-Huei; Chouraki, Vincent; Ciullo, Marina; Coresh, Josef; Cornelis, Marilyn C.; Curhan, Gary C.; d’Adamo, Adamo Pio; Dehghan, Abbas; Dengler, Laura; Ding, Jingzhong; Eiriksdottir, Gudny; Endlich, Karlhans; Enroth, Stefan; Esko, Tõnu; Franco, Oscar H.; Gasparini, Paolo; Gieger, Christian; Girotto, Giorgia; Gottesman, Omri; Gudnason, Vilmundur; Gyllensten, Ulf; Hancock, Stephen J.; Harris, Tamara B.; Helmer, Catherine; Höllerer, Simon; Hofer, Edith; Hofman, Albert; Holliday, Elizabeth G.; Homuth, Georg; Hu, Frank B.; Huth, Cornelia; Hutri-Kähönen, Nina; Hwang, Shih-Jen; Imboden, Medea; Johansson, Åsa; Kähönen, Mika; König, Wolfgang; Kramer, Holly; Krämer, Bernhard K.; Kumar, Ashish; Kutalik, Zoltan; Lambert, Jean-Charles; Launer, Lenore J.; Lehtimäki, Terho; de Borst, Martin; Navis, Gerjan; Swertz, Morris; Liu, Yongmei; Lohman, Kurt; Loos, Ruth J. F.; Lu, Yingchang; Lyytikäinen, Leo-Pekka; McEvoy, Mark A.; Meisinger, Christa; Meitinger, Thomas; Metspalu, Andres; Metzger, Marie; Mihailov, Evelin; Mitchell, Paul; Nauck, Matthias; Oldehinkel, Albertine J.; Olden, Matthias; WJH Penninx, Brenda; Pistis, Giorgio; Pramstaller, Peter P.; Probst-Hensch, Nicole; Raitakari, Olli T.; Rettig, Rainer; Ridker, Paul M.; Rivadeneira, Fernando; Robino, Antonietta; Rosas, Sylvia E.; Ruderfer, Douglas; Ruggiero, Daniela; Saba, Yasaman; Sala, Cinzia; Schmidt, Helena; Schmidt, Reinhold; Scott, Rodney J.; Sedaghat, Sanaz; Smith, Albert V.; Sorice, Rossella; Stengel, Benedicte; Stracke, Sylvia; Strauch, Konstantin; Toniolo, Daniela; Uitterlinden, Andre G.; Ulivi, Sheila; Viikari, Jorma S.; Völker, Uwe; Vollenweider, Peter; Völzke, Henry; Vuckovic, Dragana; Waldenberger, Melanie; Jin Wang, Jie; Yang, Qiong; Chasman, Daniel I.; Tromp, Gerard; Snieder, Harold; Heid, Iris M.; Fox, Caroline S.; Köttgen, Anna; Pattaro, Cristian; Böger, Carsten A.; Fuchsberger, Christian

    2017-01-01

    HapMap imputed genome-wide association studies (GWAS) have revealed >50 loci at which common variants with minor allele frequency >5% are associated with kidney function. GWAS using more complete reference sets for imputation, such as those from The 1000 Genomes project, promise to identify novel loci that have been missed by previous efforts. To investigate the value of such a more complete variant catalog, we conducted a GWAS meta-analysis of kidney function based on the estimated glomerular filtration rate (eGFR) in 110,517 European ancestry participants using 1000 Genomes imputed data. We identified 10 novel loci with p-value < 5 × 10^-8 previously missed by HapMap-based GWAS. Six of these loci (HOXD8, ARL15, PIK3R1, EYA4, ASTN2, and EPB41L3) are tagged by common SNPs unique to the 1000 Genomes reference panel. Using pathway analysis, we identified 39 significant (FDR < 0.05) genes and 127 significantly (FDR < 0.05) enriched gene sets, which were missed by our previous analyses. Among those, the 10 identified novel genes are part of pathways of kidney development, carbohydrate metabolism, cardiac septum development and glucose metabolism. These results highlight the utility of re-imputing from denser reference panels, until whole-genome sequencing becomes feasible in large samples. PMID:28452372

  7. 1000 Genomes-based meta-analysis identifies 10 novel loci for kidney function.

    PubMed

    Gorski, Mathias; van der Most, Peter J; Teumer, Alexander; Chu, Audrey Y; Li, Man; Mijatovic, Vladan; Nolte, Ilja M; Cocca, Massimiliano; Taliun, Daniel; Gomez, Felicia; Li, Yong; Tayo, Bamidele; Tin, Adrienne; Feitosa, Mary F; Aspelund, Thor; Attia, John; Biffar, Reiner; Bochud, Murielle; Boerwinkle, Eric; Borecki, Ingrid; Bottinger, Erwin P; Chen, Ming-Huei; Chouraki, Vincent; Ciullo, Marina; Coresh, Josef; Cornelis, Marilyn C; Curhan, Gary C; d'Adamo, Adamo Pio; Dehghan, Abbas; Dengler, Laura; Ding, Jingzhong; Eiriksdottir, Gudny; Endlich, Karlhans; Enroth, Stefan; Esko, Tõnu; Franco, Oscar H; Gasparini, Paolo; Gieger, Christian; Girotto, Giorgia; Gottesman, Omri; Gudnason, Vilmundur; Gyllensten, Ulf; Hancock, Stephen J; Harris, Tamara B; Helmer, Catherine; Höllerer, Simon; Hofer, Edith; Hofman, Albert; Holliday, Elizabeth G; Homuth, Georg; Hu, Frank B; Huth, Cornelia; Hutri-Kähönen, Nina; Hwang, Shih-Jen; Imboden, Medea; Johansson, Åsa; Kähönen, Mika; König, Wolfgang; Kramer, Holly; Krämer, Bernhard K; Kumar, Ashish; Kutalik, Zoltan; Lambert, Jean-Charles; Launer, Lenore J; Lehtimäki, Terho; de Borst, Martin; Navis, Gerjan; Swertz, Morris; Liu, Yongmei; Lohman, Kurt; Loos, Ruth J F; Lu, Yingchang; Lyytikäinen, Leo-Pekka; McEvoy, Mark A; Meisinger, Christa; Meitinger, Thomas; Metspalu, Andres; Metzger, Marie; Mihailov, Evelin; Mitchell, Paul; Nauck, Matthias; Oldehinkel, Albertine J; Olden, Matthias; Wjh Penninx, Brenda; Pistis, Giorgio; Pramstaller, Peter P; Probst-Hensch, Nicole; Raitakari, Olli T; Rettig, Rainer; Ridker, Paul M; Rivadeneira, Fernando; Robino, Antonietta; Rosas, Sylvia E; Ruderfer, Douglas; Ruggiero, Daniela; Saba, Yasaman; Sala, Cinzia; Schmidt, Helena; Schmidt, Reinhold; Scott, Rodney J; Sedaghat, Sanaz; Smith, Albert V; Sorice, Rossella; Stengel, Benedicte; Stracke, Sylvia; Strauch, Konstantin; Toniolo, Daniela; Uitterlinden, Andre G; Ulivi, Sheila; Viikari, Jorma S; Völker, Uwe; Vollenweider, Peter; Völzke, Henry; Vuckovic, Dragana; 
Waldenberger, Melanie; Jin Wang, Jie; Yang, Qiong; Chasman, Daniel I; Tromp, Gerard; Snieder, Harold; Heid, Iris M; Fox, Caroline S; Köttgen, Anna; Pattaro, Cristian; Böger, Carsten A; Fuchsberger, Christian

    2017-04-28

    HapMap imputed genome-wide association studies (GWAS) have revealed >50 loci at which common variants with minor allele frequency >5% are associated with kidney function. GWAS using more complete reference sets for imputation, such as those from The 1000 Genomes project, promise to identify novel loci that have been missed by previous efforts. To investigate the value of such a more complete variant catalog, we conducted a GWAS meta-analysis of kidney function based on the estimated glomerular filtration rate (eGFR) in 110,517 European ancestry participants using 1000 Genomes imputed data. We identified 10 novel loci with p-value < 5 × 10^-8 previously missed by HapMap-based GWAS. Six of these loci (HOXD8, ARL15, PIK3R1, EYA4, ASTN2, and EPB41L3) are tagged by common SNPs unique to the 1000 Genomes reference panel. Using pathway analysis, we identified 39 significant (FDR < 0.05) genes and 127 significantly (FDR < 0.05) enriched gene sets, which were missed by our previous analyses. Among those, the 10 identified novel genes are part of pathways of kidney development, carbohydrate metabolism, cardiac septum development and glucose metabolism. These results highlight the utility of re-imputing from denser reference panels, until whole-genome sequencing becomes feasible in large samples.

  8. Genome-wide association study identifies Loci and candidate genes for body composition and meat quality traits in Beijing-You chickens.

    PubMed

    Liu, Ranran; Sun, Yanfa; Zhao, Guiping; Wang, Fangjie; Wu, Dan; Zheng, Maiqing; Chen, Jilan; Zhang, Lei; Hu, Yaodong; Wen, Jie

    2013-01-01

    Body composition and meat quality traits are important economic traits of chickens. The development of high-throughput genotyping platforms and relevant statistical methods has enabled genome-wide association studies in chickens. In order to identify molecular markers and candidate genes associated with body composition and meat quality traits, genome-wide association studies were conducted using the Illumina 60 K SNP Beadchip to genotype 724 Beijing-You chickens. For each bird, a total of 16 traits were measured, including carcass weight (CW), eviscerated weight (EW), dressing percentage, breast muscle weight (BrW) and percentage (BrP), thigh muscle weight and percentage, abdominal fat weight and percentage, dry matter and intramuscular fat contents of breast and thigh muscle, ultimate pH, and shear force of the pectoralis major muscle at 100 d of age. The SNPs that were significantly associated with the phenotypic traits were identified using both simple (GLM) and compressed mixed linear (MLM) models. For nine of ten body composition traits studied, SNPs showing genome wide significance (P<2.59E-6) have been identified. A consistent region on chicken (Gallus gallus) chromosome 4 (GGA4), including seven significant SNPs and four candidate genes (LCORL, LAP3, LDB2, TAPT1), was found to be associated with CW and EW. Another 0.65 Mb region on GGA3 for BrW and BrP was identified. After measuring the mRNA content in breast muscle for five genes located in this region, the changes in GJA1 expression were found to be consistent with that of breast muscle weight across development. It is highly possible that GJA1 is a functional gene for breast muscle development in chickens. For meat quality traits, several SNPs reaching suggestive association were identified and possible candidate genes with their functions were discussed.

  9. Tracking genes of ecological relevance using a genome scan in two independent regional population samples of Arabis alpina.

    PubMed

    Poncet, Bénédicte N; Herrmann, Doris; Gugerli, Felix; Taberlet, Pierre; Holderegger, Rolf; Gielly, Ludovic; Rioux, Delphine; Thuiller, Wilfried; Aubert, Serge; Manel, Stéphanie

    2010-07-01

    Understanding the genetic basis of adaptation in response to environmental variation is fundamental as adaptation plays a key role in the extension of ecological niches to marginal habitats and in ecological speciation. Based on the assumption that some genomic markers are correlated to environmental variables, we aimed to detect loci of ecological relevance in the alpine plant Arabis alpina L. sampled in two regions, the French (99 locations) and the Swiss (109 locations) Alps. We used an unusually large genome scan [825 amplified fragment length polymorphism loci (AFLPs)] and four environmental variables related to temperature, precipitation and topography. We detected linkage disequilibrium among only 3.5% of the considered AFLP loci. A population structure analysis identified no admixture in the study regions, and the French and Swiss Alps were differentiated and therefore could be considered as two independent regions. We applied generalized estimating equations (GEE) to detect ecologically relevant loci separately in the French and Swiss Alps. We identified 78 loci of ecological relevance (9%), which were mainly related to mean annual minimum temperature. Only four of these loci were common across the French and Swiss Alps. Finally, we discuss that the genomic characterization of these ecologically relevant loci, as identified in this study, opens up new perspectives for studying functional ecology in A. alpina, its relatives and other alpine plant species.

  10. Comparison of Genome-Wide Binding of MyoD in Normal Human Myogenic Cells and Rhabdomyosarcomas Identifies Regional and Local Suppression of Promyogenic Transcription Factors

    PubMed Central

    MacQuarrie, Kyle L.; Yao, Zizhen; Fong, Abraham P.; Diede, Scott J.; Rudzinski, Erin R.; Hawkins, Douglas S.

    2013-01-01

    Rhabdomyosarcoma is a pediatric tumor of skeletal muscle that expresses the myogenic basic helix-loop-helix protein MyoD but fails to undergo terminal differentiation. Prior work has determined that DNA binding by MyoD occurs in the tumor cells, but myogenic targets fail to activate. Using MyoD chromatin immunoprecipitation coupled to high-throughput sequencing and gene expression analysis in both primary human muscle cells and RD rhabdomyosarcoma cells, we demonstrate that MyoD binds in a similar genome-wide pattern in both tumor and normal cells but binds poorly at a subset of myogenic genes that fail to activate in the tumor cells. Binding differences are found both across genomic regions and locally at specific sites that are associated with binding motifs for RUNX1, MEF2C, JDP2, and NFIC. These factors are expressed at lower levels in RD cells than muscle cells and rescue myogenesis when expressed in RD cells. MEF2C is located in a genomic region that exhibits poor MyoD binding in RD cells, whereas JDP2 exhibits local DNA hypermethylation in its promoter in both RD cells and primary tumor samples. These results demonstrate that regional and local silencing of differentiation factors contributes to the differentiation defect in rhabdomyosarcomas. PMID:23230269

  11. Genome-wide association study in Asia-adapted tropical maize reveals novel and explored genomic regions for sorghum downy mildew resistance.

    PubMed

    Rashid, Zerka; Singh, Pradeep Kumar; Vemuri, Hindu; Zaidi, Pervez Haider; Prasanna, Boddupalli Maruthi; Nair, Sudha Krishnan

    2018-01-10

    Globally, downy mildews are among the important foliar diseases of maize that cause significant yield losses. We conducted a genome-wide association study for sorghum downy mildew (SDM; Peronosclerospora sorghi) resistance in a panel of 368 inbred lines adapted to the Asian tropics. High density SNPs from Genotyping-by-sequencing were used in GWAS after controlling for population structure and kinship in the panel using a single locus mixed model. The study identified a set of 26 SNPs that were significantly associated with SDM resistance, with Bonferroni corrected P values ≤ 0.05. Among all the identified SNPs, the minor alleles were found to be favorable to SDM resistance in the mapping panel. Trend regression analysis with 16 independent genetic variants including 12 SNPs and four haplotype blocks identified SNP S2_6154311 on chromosome 2 with P value 2.61E-24 and contributing 26.7% of the phenotypic variation. Six of the SNPs/haplotypes were within the same chromosomal bins as the QTLs for SDM resistance mapped in previous studies. Apart from this, eight novel genomic regions for SDM resistance were identified in this study; they need further validation before being applied in the breeding pipeline. Ten SNPs identified in this study were co-located in reported mildew resistance genes.
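
    The Bonferroni criterion mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not report the number of SNPs tested, so the raw P values below (apart from 2.61E-24, which the abstract gives for SNP S2_6154311) and the function names are hypothetical, chosen only to show how an adjusted P value ≤ 0.05 is obtained.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw P value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values, alpha=0.05):
    """Flag the tests whose Bonferroni-adjusted P value is at or below alpha."""
    return [adj <= alpha for adj in bonferroni_adjust(p_values)]


# Hypothetical raw P values for four markers (illustrative only; a real
# genotyping-by-sequencing panel would involve far more tests).
raw = [2.61e-24, 1e-7, 0.004, 0.03]
adjusted = bonferroni_adjust(raw)
flags = significant(raw)
```

    With four tests, a raw P value of 0.03 becomes 0.12 after adjustment and no longer passes the 0.05 threshold, which is why only very small raw P values survive in a genome-wide scan.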

  12. Novel genomic findings in multiple myeloma identified through routine diagnostic sequencing.

    PubMed

    Ryland, Georgina L; Jones, Kate; Chin, Melody; Markham, John; Aydogan, Elle; Kankanige, Yamuna; Caruso, Marisa; Guinto, Jerick; Dickinson, Michael; Prince, H Miles; Yong, Kwee; Blombery, Piers

    2018-05-14

    Multiple myeloma is a genomically complex haematological malignancy with many genomic alterations recognised as important in diagnosis, prognosis and therapeutic decision making. Here, we provide a summary of genomic findings identified through routine diagnostic next-generation sequencing at our centre. A cohort of 86 patients with multiple myeloma underwent diagnostic sequencing using a custom hybridisation-based panel targeting 104 genes. Sequence variants, genome-wide copy number changes and structural rearrangements were detected using an inhouse-developed bioinformatics pipeline. At least one mutation was found in 69 (80%) patients. Frequently mutated genes included TP53 (36%), KRAS (22.1%), NRAS (15.1%), FAM46C/DIS3 (8.1%) and TET2/FGFR3 (5.8%), including multiple mutations not previously described in myeloma. Importantly, we observed TP53 mutations in the absence of a 17p deletion in 8% of the cohort, highlighting the need for sequencing-based assessment in addition to cytogenetics to identify these high-risk patients. Multiple novel copy number changes and immunoglobulin heavy chain translocations are also discussed. Our results demonstrate that many clinically relevant genomic findings remain in multiple myeloma which have not yet been identified through large-scale sequencing efforts, and provide important mechanistic insights into plasma cell pathobiology. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  13. Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple

    PubMed Central

    2012-01-01

    Background: Apple is an economically important fruit crop worldwide. Developing a genetic linkage map is a critical step towards mapping and cloning of genes responsible for important horticultural traits in apple. To facilitate linkage map construction, we surveyed and characterized the distribution and frequency of perfect microsatellites in assembled contig sequences of the apple genome. Results: A total of 28,538 SSRs have been identified in the apple genome, with an overall density of 40.8 SSRs per Mb. Di-nucleotide repeats are the most frequent microsatellites in the apple genome, accounting for 71.9% of all microsatellites. AT/TA repeats are the most frequent in genomic regions, accounting for 38.3% of all the G-SSRs, while AG/GA dimers prevail in transcribed sequences, and account for 59.4% of all EST-SSRs. A total set of 310 SSRs is selected to amplify eight apple genotypes. Of these, 245 (79.0%) are found to be polymorphic among cultivars and wild species tested. AG/GA motifs in genomic regions have detected more alleles and higher PIC values than AT/TA or AC/CA motifs. Moreover, AG/GA repeats are more variable than any other dimers in apple, and should be preferentially selected for studies, such as genetic diversity and linkage map construction. A total of 54 newly developed apple SSRs have been genetically mapped. Interestingly, clustering of markers with distorted segregation is observed on linkage groups 1, 2, 10, 15, and 16. A QTL responsible for malic acid content of apple fruits is detected on linkage group 8, and accounts for ~13.5% of the observed phenotypic variation. Conclusions: This study demonstrates that di-nucleotide repeats are prevalent in the apple genome and that AT/TA and AG/GA repeats are the most frequent in genomic and transcribed sequences of apple, respectively. All SSR motifs identified in this study as well as those newly mapped SSRs will serve as valuable resources for pursuing apple genetic studies, aiding the apple breeding

  14. Whole genome sequencing of Oryza sativa L. cv. Seeragasamba identifies a new fragrance allele in rice

    PubMed Central

    Bindusree, Ganigara; Natarajan, Purushothaman; Kalva, Sukesh

    2017-01-01

    Fragrance of rice is an important trait that confers a large economic benefit to the farmers who cultivate aromatic rice varieties. Several aromatic rice varieties have limited geographic distribution, and are endowed with variety-specific unique fragrances. BADH2 was identified as a fragrance gene in 2005, and it is essential to identify the fragrance alleles from diverse geographical locations and genetic backgrounds. Seeragasamba is a short-grain aromatic rice variety of the indica type, which is cultivated in a limited area in India. Whole genome sequencing of this variety identified a new badh2 allele (badh2-p) with an 8 bp insertion in the promoter region of the BADH2 gene. When the whole genome sequences of 76 aromatic varieties in the 3000 rice genome project were analyzed, the badh2-p allele was present in 13 varieties (approximately 17%) of both indica and japonica types. In addition, the badh2-p allele was present in 17 varieties that already had the loss-of-function allele, badh2-E7. Taken together, the frequency of badh2-p allele (approximately 40%) was found to be greater than that of the badh2-E7 allele (approximately 34%) among the aromatic rice varieties. Therefore, it is suggested to include badh2-p as a predominant allele when screening for fragrance alleles in aromatic rice varieties. PMID:29190814
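
    The percentages quoted in this abstract can be checked with a short worked calculation. The sketch below assumes that the 17 varieties carrying badh2-p together with badh2-E7 are counted in addition to the 13 varieties reported first; that reading is an assumption, adopted because it reproduces the stated figure of approximately 40%. The variable and function names are hypothetical.

```python
# Counts taken from the abstract.
aromatic_total = 76      # aromatic varieties analysed in the 3000 rice genome project
badh2_p_first = 13       # varieties reported as carrying badh2-p
badh2_p_with_e7 = 17     # varieties carrying badh2-p alongside badh2-E7


def percent(count, total):
    """Express count as a percentage of total."""
    return 100.0 * count / total


# Share of the first group: 13 of 76.
share_first = percent(badh2_p_first, aromatic_total)

# Assumed overall share of badh2-p carriers: (13 + 17) of 76.
share_all = percent(badh2_p_first + badh2_p_with_e7, aromatic_total)
```

    Under this reading, 13 of 76 is about 17.1% and 30 of 76 is about 39.5%, in line with the "approximately 17%" and "approximately 40%" given in the abstract.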

  15. A high quality assembly of the Nile Tilapia (Oreochromis niloticus) genome reveals the structure of two sex determination regions.

    PubMed

    Conte, Matthew A; Gammerdinger, William J; Bartie, Kerry L; Penman, David J; Kocher, Thomas D

    2017-05-02

    Tilapias are the second most farmed fishes in the world and a sustainable source of food. Like many other fish, tilapias are sexually dimorphic and sex is a commercially important trait in these fish. In this study, we developed a significantly improved assembly of the tilapia genome using the latest genome sequencing methods and show how it improves the characterization of two sex determination regions in two tilapia species. A homozygous clonal XX female Nile tilapia (Oreochromis niloticus) was sequenced to 44X coverage using Pacific Biosciences (PacBio) SMRT sequencing. Dozens of candidate de novo assemblies were generated and an optimal assembly (contig NG50 of 3.3 Mbp) was selected using principal component analysis of likelihood scores calculated from several paired-end sequencing libraries. Comparison of the new assembly to the previous O. niloticus genome assembly reveals that recently duplicated portions of the genome are now well represented. The overall number of genes in the new assembly increased by 27.3%, including a 67% increase in pseudogenes. The new tilapia genome assembly correctly represents two recent vasa gene duplication events that have been verified with BAC sequencing. A total of 146 Mbp of additional transposable element sequence is now assembled, a large proportion of which consists of recent insertions. Large centromeric satellite repeats are assembled and annotated in cichlid fish for the first time. Finally, the new assembly identifies the long-range structure of both a ~9 Mbp XY sex determination region on LG1 in O. niloticus, and a ~50 Mbp WZ sex determination region on LG3 in the related species O. aureus. This study highlights the use of long read sequencing to correctly assemble recent duplications and to characterize repeat-filled regions of the genome. The study serves as an example of the need for high quality genome assemblies and provides a framework for identifying sex determining genes in tilapia and related fish species.

  16. Genome assemblies for 11 Yersinia pestis strains isolated in the Caucasus region

    DOE PAGES

    Zhgenti, Ekaterine; Johnson, Shannon L.; Davenport, Karen W.; ...

    2015-09-17

    Yersinia pestis, the causative agent of plague, is endemic to the Caucasus region but few reference strain genome sequences from that region are available. We present the improved draft or finished assembled genomes from 11 strains isolated in the nation of Georgia and surrounding countries.

  17. [Comparative analysis of variable regions in the genomes of variola virus].

    PubMed

    Babkin, I V; Nepomniashchikh, T S; Maksiutov, R A; Gutorov, V V; Babkina, I N; Shchelkunov, S N

    2008-01-01

    Nucleotide sequences of two extended segments of the terminal variable regions in the variola virus genome were determined. The size of the left segment was 13.5 kbp and that of the right, 10.5 kbp. In total, over 540 kbp were sequenced for 22 variola virus strains. The phylogenetic analysis conducted here, together with previously published data, allowed us to find the interrelations between 70 variola virus isolates, the character of their clustering, and the degree of intergroup and intragroup variation of the clusters of variola virus strains. The most polymorphic loci of the genome segments studied were determined. It was demonstrated that these loci are localized either to noncoding genome regions or to the regions of destroyed open reading frames, characteristic of the ancestor virus. These loci are promising for developing a strategy for genotyping variola virus strains. Analysis of recombination using various methods demonstrated that, with a single exception, no statistically significant recombination events were detectable in the genomes of the variola virus strains studied.

  18. Combining population genomics and fitness QTLs to identify the genetics of local adaptation in Arabidopsis thaliana.

    PubMed

    Price, Nicholas; Moyers, Brook T; Lopez, Lua; Lasky, Jesse R; Monroe, J Grey; Mullen, Jack L; Oakley, Christopher G; Lin, Junjiang; Ågren, Jon; Schrider, Daniel R; Kern, Andrew D; McKay, John K

    2018-05-08

    Evidence for adaptation to different climates in the model species Arabidopsis thaliana is seen in reciprocal transplant experiments, but the genetic basis of this adaptation remains poorly understood. Field-based quantitative trait locus (QTL) studies provide direct but low-resolution evidence for the genetic basis of local adaptation. Using high-resolution population genomic approaches, we examine local adaptation along previously identified genetic trade-off (GT) and conditionally neutral (CN) QTLs for fitness between locally adapted Italian and Swedish A. thaliana populations [Ågren J, et al. (2013) Proc Natl Acad Sci USA 110:21077-21082]. We find that genomic regions enriched in high-FST SNPs colocalize with GT QTL peaks. Many of these high-FST regions also colocalize with regions enriched for SNPs significantly correlated to climate in Eurasia and evidence of recent selective sweeps in Sweden. Examining unfolded site frequency spectra across genes containing high-FST SNPs suggests GTs may be due to more recent adaptation in Sweden than Italy. Finally, we collapse a list of thousands of genes spanning GT QTLs to 42 genes that likely underlie the observed GTs and explore potential biological processes driving these trade-offs, from protein phosphorylation, to seed dormancy and longevity. Our analyses link population genomic analyses and field-based QTL studies of local adaptation, and emphasize that GTs play an important role in the process of local adaptation. Copyright © 2018 the Author(s). Published by PNAS.

  19. Genome-wide association study identifies the SERPINB gene cluster as a susceptibility locus for food allergy.

    PubMed

    Marenholz, Ingo; Grosche, Sarah; Kalb, Birgit; Rüschendorf, Franz; Blümchen, Katharina; Schlags, Rupert; Harandi, Neda; Price, Mareike; Hansen, Gesine; Seidenberg, Jürgen; Röblitz, Holger; Yürek, Songül; Tschirner, Sebastian; Hong, Xiumei; Wang, Xiaobin; Homuth, Georg; Schmidt, Carsten O; Nöthen, Markus M; Hübner, Norbert; Niggemann, Bodo; Beyer, Kirsten; Lee, Young-Ae

    2017-10-20

    Genetic factors and mechanisms underlying food allergy are largely unknown. Due to heterogeneity of symptoms a reliable diagnosis is often difficult to make. Here, we report a genome-wide association study on food allergy diagnosed by oral food challenge in 497 cases and 2387 controls. We identify five loci at genome-wide significance, the clade B serpin (SERPINB) gene cluster at 18q21.3, the cytokine gene cluster at 5q31.1, the filaggrin gene, the C11orf30/LRRC32 locus, and the human leukocyte antigen (HLA) region. Stratifying the results for the causative food demonstrates that association of the HLA locus is peanut allergy-specific whereas the other four loci increase the risk for any food allergy. Variants in the SERPINB gene cluster are associated with SERPINB10 expression in leukocytes. Moreover, SERPINB genes are highly expressed in the esophagus. All identified loci are involved in immunological regulation or epithelial barrier function, emphasizing the role of both mechanisms in food allergy.

  20. Genome-association analysis of Korean Holstein milk traits using genomic estimated breeding value.

    PubMed

    Shin, Donghyun; Lee, Chul; Park, Kyoung-Do; Kim, Heebal; Cho, Kwang-Hyeon

    2017-03-01

    Holsteins are known as the world's highest milk-producing dairy cattle. The purpose of this study was to identify genetic regions strongly associated with milk traits (milk production, fat, and protein) using Korean Holstein data. This study was performed using single nucleotide polymorphism (SNP) chip data (Illumina BovineSNP50 Beadchip) of 911 Korean Holstein individuals. We inferred genomic estimated breeding values for each individual based on best linear unbiased prediction (BLUP) and ridge regression using BLUPF90 and R. We then performed a genome-wide association study and identified genetic regions related to milk traits. We identified 9, 6, and 17 significant genetic regions related to milk production, fat and protein, respectively. These genes are newly reported in genetic association with milk traits of Holsteins. This study complements recent Holstein genome-wide association studies that identified other SNPs and genes as the most significant variants. These results will help to expand the knowledge of the polygenic nature of milk production in Holsteins.

  1. Genomic prediction and genome-wide association analysis of female longevity in a composite beef cattle breed.

    PubMed

    Hamidi Hay, E; Roberts, A

    2017-04-01

    Longevity is a highly important trait to the efficiency of beef cattle production. The objective of this study was to evaluate the genomic prediction of longevity and identify genomic regions associated with this trait. The data used in this study consisted of 547 Composite Gene Combination cows (1/2 Red Angus, 1/4 Charolais, 1/4 Tarentaise) born from 2002 to 2011 and genotyped with the Illumina BovineSNP50 BeadChip. Three models were used to assess genomic prediction: Bayes A, Bayes B and GBLUP using a genomic relationship matrix. To identify genomic regions associated with longevity, two approaches were adopted: single-marker genome-wide association and a Bayesian approach using GenSel software. The genomic prediction accuracy was low: 0.28, 0.25, and 0.22 for Bayes A, Bayes B and GBLUP, respectively. The single-marker genome-wide association study (GWAS) identified 5 loci with P-value less than 0.05 after false discovery correction: UA-IFASA-7571 on chromosome 19 (58.03 Mb), ARS-BFGL-BAC-15059 on BTA1 (28.8 Mb), ARS-BFGL-NGS-104159 on BTA3 (29.4 Mb), ARS-BFGL-NGS-32882 on BTA9 (104.07 Mb) and ARS-BFGL-NGS-32883 on BTA25 (33.77 Mb). The Bayesian GWAS yielded 4 genomic regions overlapping with the single-marker GWAS results. The region with the highest percentage of genomic variance (3.73%) was detected on chromosome 19. Both GWAS approaches adopted in this study showed evidence for association with various chromosomal locations.

  2. Multi-ethnic genome-wide association study identifies novel locus for type 2 diabetes susceptibility

    PubMed Central

    Cook, James P; Morris, Andrew P

    2016-01-01

    Genome-wide association studies (GWAS) have traditionally been undertaken in homogeneous populations from the same ancestry group. However, with the increasing availability of GWAS in large-scale multi-ethnic cohorts, we have evaluated a framework for detecting association of genetic variants with complex traits, allowing for population structure, and developed a powerful test of heterogeneity in allelic effects between ancestry groups. We have applied the methodology to identify and characterise loci associated with susceptibility to type 2 diabetes (T2D) using GWAS data from the Resource for Genetic Epidemiology on Adult Health and Aging, a large multi-ethnic population-based cohort, created for investigating the genetic and environmental basis of age-related diseases. We identified a novel locus for T2D susceptibility at genome-wide significance (P < 5 × 10^-8) that maps to TOMM40-APOE, a region previously implicated in lipid metabolism and Alzheimer's disease. We have also confirmed previous reports that single-nucleotide polymorphisms at the TCF7L2 locus demonstrate the greatest extent of heterogeneity in allelic effects between ethnic groups, with the lowest risk observed in populations of East Asian ancestry. PMID:27189021

  3. Differential contribution of genomic regions to marked genetic variation and prediction of quantitative traits in broiler chickens.

    PubMed

    Abdollahi-Arpanahi, Rostam; Morota, Gota; Valente, Bruno D; Kranis, Andreas; Rosa, Guilherme J M; Gianola, Daniel

    2016-02-03

    Genome-wide association studies in humans have found enrichment of trait-associated single nucleotide polymorphisms (SNPs) in coding regions of the genome and depletion of these in intergenic regions. However, a recent release of the ENCyclopedia of DNA elements showed that ~80 % of the human genome has a biochemical function. Similar studies on the chicken genome are lacking, thus assessing the relative contribution of its genic and non-genic regions to variation is relevant for biological studies and genetic improvement of chicken populations. A dataset including 1351 birds that were genotyped with the 600K Affymetrix platform was used. We partitioned SNPs according to genome annotation data into six classes to characterize the relative contribution of genic and non-genic regions to genetic variation as well as their predictive power using all available quality-filtered SNPs. Target traits were body weight, ultrasound measurement of breast muscle and hen house egg production in broiler chickens. Six genomic regions were considered: intergenic regions, introns, missense, synonymous, 5' and 3' untranslated regions, and regions that are located 5 kb upstream and downstream of coding genes. Genomic relationship matrices were constructed for each genomic region and fitted in the models, separately or simultaneously. Kernel-based ridge regression was used to estimate variance components and assess predictive ability. Contribution of each class of genomic regions to dominance variance was also considered. Variance component estimates indicated that all genomic regions contributed to marked additive genetic variation and that the class of synonymous regions tended to have the greatest contribution. The marked dominance genetic variation explained by each class of genomic regions was similar and negligible (~0.05). In terms of prediction mean-square error, the whole-genome approach showed the best predictive ability. All genic and non-genic regions contributed to

  4. Identifying novel biomarkers in sarcoidosis using genome-based approaches

    PubMed Central

    Knox, Kenneth S.; Garcia, Joe G.N.

    2015-01-01

    Synopsis: We briefly review conventional biomarkers used clinically to 1) support a diagnosis and 2) monitor disease progression in patients with sarcoidosis. We describe potential new biomarkers identified by genome-wide screening and the approaches to discover these biomarkers. PMID:26593137

  5. Genomic variation in Plasmodium vivax malaria reveals regions under selective pressure.

    PubMed

    Diez Benavente, Ernest; Ward, Zoe; Chan, Wilson; Mohareb, Fady R; Sutherland, Colin J; Roper, Cally; Campino, Susana; Clark, Taane G

    2017-01-01

    Although Plasmodium vivax contributes to almost half of all malaria cases outside Africa, it has been relatively neglected compared to the more deadly P. falciparum. It is known that P. vivax populations possess high genetic diversity, differing geographically potentially due to different vector species, host genetics and environmental factors. We analysed the high-quality genomic data for 46 P. vivax isolates spanning 10 countries across 4 continents. Using population genetic methods we identified hotspots of selection pressure, including the previously reported MRP1 and DHPS genes, both putative drug resistance loci. Extra copies and deletions in the promoter region of another drug resistance candidate, MDR1 gene, and duplications in the Duffy binding protein gene (PvDBP) potentially involved in erythrocyte invasion, were also identified. For surveillance applications, continental-informative markers were found in putative drug resistance loci, and we show that organellar polymorphisms could classify P. vivax populations across continents and differentiate between Plasmodia spp. This study has shown that genomic diversity that lies within and between P. vivax populations can be used to elucidate potential drug resistance and invasion mechanisms, as well as facilitate the molecular barcoding of the parasite for surveillance applications.

  6. Genomic evaluation of regional dairy cattle breeds in single-breed and multibreed contexts.

    PubMed

    Jónás, D; Ducrocq, V; Fritz, S; Baur, A; Sanchez, M-P; Croiseau, P

    2017-02-01

    An important prerequisite for high prediction accuracy in genomic prediction is the availability of a large training population, which allows accurate marker effect estimation. This requirement is not fulfilled in case of regional breeds with a limited number of breeding animals. We assessed the efficiency of the current French routine genomic evaluation procedure in four regional breeds (Abondance, Tarentaise, French Simmental and Vosgienne) as well as the potential benefits when the training populations consisting of males and females of these breeds are merged to form a multibreed training population. Genomic evaluation was 5-11% more accurate than a pedigree-based BLUP in three of the four breeds, while the numerically smallest breed showed a < 1% increase in accuracy. Multibreed genomic evaluation was beneficial for two breeds (Abondance and French Simmental) with maximum gains of 5 and 8% in correlation coefficients between yield deviations and genomic estimated breeding values, when compared to the single-breed genomic evaluation results. Inflation of genomic evaluation of young candidates was also reduced. Our results indicate that genomic selection can be effective in regional breeds as well. Here, we provide empirical evidence proving that genetic distance between breeds is only one of the factors affecting the efficiency of multibreed genomic evaluation. © 2016 Blackwell Verlag GmbH.

  7. Comparison of 17 genome types of adenovirus type 3 identified among strains recovered from six continents.

    PubMed Central

    Li, Q G; Wadell, G

    1988-01-01

    Restriction endonucleases BamHI, BclI, BglI, BglII, BstEII, EcoRI, HindIII, HpaI, SalI, SmaI, XbaI, and XhoI were used to analyze 61 selected strains of adenovirus type 3 (Ad3) isolated from Africa, Asia, Australia, Europe, North America, and South America. It was noted that the use of BamHI, BclI, BglII, HpaI, SalI, and SmaI was sufficient to distinguish 17 genome types; 13 of them were newly identified. All 17 Ad3 genome types could be divided into three genomic clusters. Genome types of Ad3 cluster 1 occurred in Africa, Europe, South America, and North America. Genomic cluster 2 was identified in Africa; genomic cluster 3 was identified in Africa, Asia, Australia, Europe (a few), and North America. This was of interest because 15 identified genome types of Ad7 could also be divided into three genomic clusters. The degree of genetic relatedness between the 17 Ad3 and the 15 Ad7 genome types was analyzed and was expressed in a three-dimensional model. PMID:2838500

  8. A variable region within the genome of Streptococcus pneumoniae contributes to strain-strain variation in virulence.

    PubMed

    Harvey, Richard M; Stroeher, Uwe H; Ogunniyi, Abiodun D; Smith-Vaughan, Heidi C; Leach, Amanda J; Paton, James C

    2011-05-05

    The bacterial factors responsible for the variation in invasive potential between different clones and serotypes of Streptococcus pneumoniae are largely unknown. Therefore, the isolation of rare serotype 1 carriage strains in Indigenous Australian communities provided a unique opportunity to compare the genomes of non-invasive and invasive isolates of the same serotype in order to identify such factors. The human virulence status of non-invasive, intermediately virulent and highly virulent serotype 1 isolates was reflected in mice and showed that whilst both human non-invasive and highly virulent isolates were able to colonize the murine nasopharynx equally, only the human highly virulent isolates were able to invade and survive in the murine lungs and blood. Genomic sequencing comparisons between these isolates identified 8 regions >1 kb in size that were specific to only the highly virulent isolates, and included a version of the pneumococcal pathogenicity island 1 variable region (PPI-1v), phage-associated adherence factors, transporters and metabolic enzymes. In particular, a phage-associated endolysin, a putative iron/lead permease and an operon within PPI-1v exhibited niche-specific changes in expression that suggest important roles for these genes in the lungs and blood. Moreover, in vivo competition between pneumococci carrying PPI-1v derivatives representing the two identified versions of the region showed that the version of PPI-1v in the highly virulent isolates was more competitive than the version from the less virulent isolates in the nasopharyngeal tissue, blood and lungs. This study is the first to perform genomic comparisons between serotype 1 isolates with distinct virulence profiles that correlate between mice and humans, and has highlighted the important role that hypervariable genomic loci, such as PPI-1v, play in pneumococcal disease. The findings of this study have important implications for understanding the processes that drive progression

  9. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence.

    PubMed

    Savage, Jeanne E; Jansen, Philip R; Stringer, Sven; Watanabe, Kyoko; Bryois, Julien; de Leeuw, Christiaan A; Nagel, Mats; Awasthi, Swapnil; Barr, Peter B; Coleman, Jonathan R I; Grasby, Katrina L; Hammerschlag, Anke R; Kaminski, Jakob A; Karlsson, Robert; Krapohl, Eva; Lam, Max; Nygaard, Marianne; Reynolds, Chandra A; Trampush, Joey W; Young, Hannah; Zabaneh, Delilah; Hägg, Sara; Hansell, Narelle K; Karlsson, Ida K; Linnarsson, Sten; Montgomery, Grant W; Muñoz-Manchado, Ana B; Quinlan, Erin B; Schumann, Gunter; Skene, Nathan G; Webb, Bradley T; White, Tonya; Arking, Dan E; Avramopoulos, Dimitrios; Bilder, Robert M; Bitsios, Panos; Burdick, Katherine E; Cannon, Tyrone D; Chiba-Falek, Ornit; Christoforou, Andrea; Cirulli, Elizabeth T; Congdon, Eliza; Corvin, Aiden; Davies, Gail; Deary, Ian J; DeRosse, Pamela; Dickinson, Dwight; Djurovic, Srdjan; Donohoe, Gary; Conley, Emily Drabant; Eriksson, Johan G; Espeseth, Thomas; Freimer, Nelson A; Giakoumaki, Stella; Giegling, Ina; Gill, Michael; Glahn, David C; Hariri, Ahmad R; Hatzimanolis, Alex; Keller, Matthew C; Knowles, Emma; Koltai, Deborah; Konte, Bettina; Lahti, Jari; Le Hellard, Stephanie; Lencz, Todd; Liewald, David C; London, Edythe; Lundervold, Astri J; Malhotra, Anil K; Melle, Ingrid; Morris, Derek; Need, Anna C; Ollier, William; Palotie, Aarno; Payton, Antony; Pendleton, Neil; Poldrack, Russell A; Räikkönen, Katri; Reinvang, Ivar; Roussos, Panos; Rujescu, Dan; Sabb, Fred W; Scult, Matthew A; Smeland, Olav B; Smyrnis, Nikolaos; Starr, John M; Steen, Vidar M; Stefanis, Nikos C; Straub, Richard E; Sundet, Kjetil; Tiemeier, Henning; Voineskos, Aristotle N; Weinberger, Daniel R; Widen, Elisabeth; Yu, Jin; Abecasis, Goncalo; Andreassen, Ole A; Breen, Gerome; Christiansen, Lene; Debrabant, Birgit; Dick, Danielle M; Heinz, Andreas; Hjerling-Leffler, Jens; Ikram, M Arfan; Kendler, Kenneth S; Martin, Nicholas G; Medland, Sarah E; Pedersen, Nancy L; Plomin, Robert; Polderman, Tinca J C; Ripke, Stephan; van der Sluis, Sophie; Sullivan, Patrick F; Vrieze, Scott I; Wright, Margaret J; Posthuma, Danielle

    2018-06-25

    Intelligence is highly heritable [1] and a major determinant of human health and well-being [2]. Recent genome-wide meta-analyses have identified 24 genomic loci linked to variation in intelligence [3-7], but much about its genetic underpinnings remains to be discovered. Here, we present a large-scale genetic association study of intelligence (n = 269,867), identifying 205 associated genomic loci (190 new) and 1,016 genes (939 new) via positional mapping, expression quantitative trait locus (eQTL) mapping, chromatin interaction mapping, and gene-based association analysis. We find enrichment of genetic effects in conserved and coding regions and associations with 146 nonsynonymous exonic variants. Associated genes are strongly expressed in the brain, specifically in striatal medium spiny neurons and hippocampal pyramidal neurons. Gene set analyses implicate pathways related to nervous system development and synaptic structure. We confirm previous strong genetic correlations with multiple health-related outcomes, and Mendelian randomization analysis results suggest protective effects of intelligence for Alzheimer's disease and ADHD and bidirectional causation with pleiotropic effects for schizophrenia. These results are a major step forward in understanding the neurobiology of cognitive function as well as genetically related neurological and psychiatric disorders.

  10. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes.

    PubMed

    Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos

    2011-07-27

    BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region of ~3 Mbp, represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced, allowing biological discovery to take place before a high-quality whole-genome assembly is completed.

  11. Substantial genome synteny preservation among woody angiosperm species: comparative genomics of Chinese chestnut (Castanea mollissima) and plant reference genomes.

    PubMed

    Staton, Margaret; Zhebentyayeva, Tetyana; Olukolu, Bode; Fang, Guang Chen; Nelson, Dana; Carlson, John E; Abbott, Albert G

    2015-10-05

    Chinese chestnut (Castanea mollissima) has emerged as a model species for the Fagaceae family with extensive genomic resources including a physical map, a dense genetic map and quantitative trait loci (QTLs) for chestnut blight resistance. These resources enable comparative genomics analyses relative to model plants. We assessed the degree of conservation between the chestnut genome and other well annotated and assembled plant genomic sequences, focusing on the QTL regions of most interest to the chestnut breeding community. The integrated physical and genetic map of Chinese chestnut has been improved to now include 858 shared sequence-based markers. The utility of the integrated map has also been improved through the addition of 42,970 BAC (bacterial artificial chromosome) end sequences spanning over 26 million bases of the estimated 800 Mb chestnut genome. Synteny analysis between chestnut and ten model plant species was conducted on a macro-syntenic scale using sequences from both individual probes and BAC end sequences across the chestnut physical map. Blocks of synteny with chestnut were found in all ten reference species, with the percent of the chestnut physical map that could be aligned ranging from 10 to 39 %. The integrated genetic and physical map was utilized to identify BACs that spanned the three previously identified QTL regions conferring blight resistance. The clones were pooled and sequenced, yielding 396 sequence scaffolds covering 13.9 Mbp. Comparative genomic analysis on a micro-syntenic scale, using the QTL-associated genomic sequence, identified synteny from chestnut to other plant genomes, with 5.4 to 12.9 % of the genome sequences aligning. On both the macro- and micro-synteny levels, the peach, grape and poplar genomes were found to be the most structurally conserved with chestnut. Interestingly, these results did not strictly follow the expectation that decreased phylogenetic distance would correspond to increased levels of genome

  12. Genome scan study of prostate cancer in Arabs: identification of three genomic regions with multiple prostate cancer susceptibility loci in Tunisians.

    PubMed

    Shan, Jingxuan; Al-Rumaihi, Khalid; Rabah, Danny; Al-Bozom, Issam; Kizhakayil, Dhanya; Farhat, Karim; Al-Said, Sami; Kfoury, Hala; Dsouza, Shoba P; Rowe, Jillian; Khalak, Hanif G; Jafri, Shahzad; Aigha, Idil I; Chouchane, Lotfi

    2013-05-13

    Large databases focused on genetic susceptibility to prostate cancer have been accumulated from population studies of different ancestries, including Europeans and African-Americans. Arab populations, however, have been only rarely studied. Using Affymetrix Genome-Wide Human SNP Array 6, we conducted a genome-wide association study (GWAS) in which 534,781 single nucleotide polymorphisms (SNPs) were genotyped in 221 Tunisians (90 prostate cancer patients and 131 age-matched healthy controls). TaqMan SNP Genotyping Assays on 11 prostate cancer associated SNPs were performed in a distinct cohort of 337 individuals of Arab ancestry living in Qatar and Saudi Arabia (155 prostate cancer patients and 182 age-matched controls). In-silico expression quantitative trait locus (eQTL) analysis along with mRNA quantification of nearby genes was performed to identify loci potentially cis-regulated by the identified SNPs. Three chromosomal regions, encompassing 14 SNPs, are significantly associated with prostate cancer risk in the Tunisian population (P = 1 × 10(-4) to P = 1 × 10(-5)). In addition to SNPs located on chromosome 17q21, previously found associated with prostate cancer in Western populations, two novel chromosomal regions are revealed on chromosome 9p24 and 22q13. eQTL analysis and mRNA quantification indicate that the prostate cancer associated SNPs of chromosome 17 could enhance the expression of the STAT5B gene. Our findings, identifying novel GWAS prostate cancer susceptibility loci, indicate that prostate cancer genetic risk factors could be ethnic specific.

  13. Genomic regions responsible for seminal and crown root lengths identified by 2D & 3D root system image analysis.

    PubMed

    Uga, Yusaku; Assaranurak, Ithipong; Kitomi, Yuka; Larson, Brandon G; Craft, Eric J; Shaff, Jon E; McCouch, Susan R; Kochian, Leon V

    2018-04-20

    Genetic improvement of root system architecture is a promising approach for improved uptake of water and mineral nutrients distributed unevenly in the soil. To identify genomic regions associated with the length of different root types in rice, we quantified root system architecture in a set of 26 chromosome segment substitution lines derived from a cross between lowland indica rice, IR64, and upland tropical japonica rice, Kinandang Patong, (IK-CSSLs), using 2D & 3D root phenotyping platforms. Lengths of seminal and crown roots in the IK-CSSLs grown under hydroponic conditions were measured by 2D image analysis (RootReader2D). Twelve CSSLs showed significantly longer seminal root length than the recurrent parent IR64. Of these, 8 CSSLs also exhibited longer total length of the three longest crown roots compared to IR64. Three-dimensional image analysis (RootReader3D) for these CSSLs grown in gellan gum revealed that only one CSSL, SL1003, showed significantly longer total root length than IR64. To characterize the root morphology of SL1003 under soil conditions, SL1003 was grown in Turface, a soil-like growth medium, and roots were quantified using RootReader3D. SL1003 had greater total root length (TRL) and total crown root length than IR64, although its seminal root length was similar to that of IR64. The larger TRL in SL1003 may be due to increased crown root length. SL1003 carries an introgression from Kinandang Patong on the long arm of chromosome 1 in the genetic background of IR64. We conclude that this region harbors a QTL controlling crown root elongation.

  14. Identification of coding and non-coding mutational hotspots in cancer genomes.

    PubMed

    Piraino, Scott W; Furney, Simon J

    2017-01-05

    The identification of mutations that play a causal role in tumour development, so-called "driver" mutations, is of critical importance for understanding how cancers form and how they might be treated. Several large cancer sequencing projects have identified genes that are recurrently mutated in cancer patients, suggesting a role in tumourigenesis. While the landscape of coding drivers has been extensively studied and many of the most prominent driver genes are well characterised, comparatively less is known about the role of mutations in the non-coding regions of the genome in cancer development. The continuing fall in genome sequencing costs has resulted in a concomitant increase in the number of cancer whole genome sequences being produced, facilitating systematic interrogation of both the coding and non-coding regions of cancer genomes. To examine the mutational landscapes of tumour genomes we have developed a novel method to identify mutational hotspots in tumour genomes using both mutational data and information on evolutionary conservation. We have applied our methodology to over 1300 whole cancer genomes and show that it identifies prominent coding and non-coding regions that are known or highly suspected to play a role in cancer. Importantly, we applied our method to the entire genome, rather than relying on predefined annotations (e.g. promoter regions) and we highlight recurrently mutated regions that may have resulted from increased exposure to mutational processes rather than selection, some of which have been identified previously as targets of selection. Finally, we implicate several pan-cancer and cancer-specific candidate non-coding regions, which could be involved in tumourigenesis. We have developed a framework to identify mutational hotspots in cancer genomes, which is applicable to the entire genome. This framework identifies known and novel coding and non-coding mutational hotspots and can be used to differentiate candidate driver regions from

  15. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer

    PubMed Central

    Michailidou, Kyriaki; Beesley, Jonathan; Lindstrom, Sara; Canisius, Sander; Dennis, Joe; Lush, Michael; Maranian, Mel J; Bolla, Manjeet K; Wang, Qin; Shah, Mitul; Perkins, Barbara J; Czene, Kamila; Eriksson, Mikael; Darabi, Hatef; Brand, Judith S; Bojesen, Stig E; Nordestgaard, Børge G; Flyger, Henrik; Nielsen, Sune F; Rahman, Nazneen; Turnbull, Clare; Fletcher, Olivia; Peto, Julian; Gibson, Lorna; dos-Santos-Silva, Isabel; Chang-Claude, Jenny; Flesch-Janys, Dieter; Rudolph, Anja; Eilber, Ursula; Behrens, Sabine; Nevanlinna, Heli; Muranen, Taru A; Aittomäki, Kristiina; Blomqvist, Carl; Khan, Sofia; Aaltonen, Kirsimari; Ahsan, Habibul; Kibriya, Muhammad G; Whittemore, Alice S; John, Esther M; Malone, Kathleen E; Gammon, Marilie D; Santella, Regina M; Ursin, Giske; Makalic, Enes; Schmidt, Daniel F; Casey, Graham; Hunter, David J; Gapstur, Susan M; Gaudet, Mia M; Diver, W Ryan; Haiman, Christopher A; Schumacher, Fredrick; Henderson, Brian E; Le Marchand, Loic; Berg, Christine D; Chanock, Stephen; Figueroa, Jonine; Hoover, Robert N; Lambrechts, Diether; Neven, Patrick; Wildiers, Hans; van Limbergen, Erik; Schmidt, Marjanka K; Broeks, Annegien; Verhoef, Senno; Cornelissen, Sten; Couch, Fergus J; Olson, Janet E; Hallberg, Emily; Vachon, Celine; Waisfisz, Quinten; Meijers-Heijboer, Hanne; Adank, Muriel A; van der Luijt, Rob B; Li, Jingmei; Liu, Jianjun; Humphreys, Keith; Kang, Daehee; Choi, Ji-Yeob; Park, Sue K; Yoo, Keun-Young; Matsuo, Keitaro; Ito, Hidemi; Iwata, Hiroji; Tajima, Kazuo; Guénel, Pascal; Truong, Thérèse; Mulot, Claire; Sanchez, Marie; Burwinkel, Barbara; Marme, Frederik; Surowy, Harald; Sohn, Christof; Wu, Anna H; Tseng, Chiu-chen; Van Den Berg, David; Stram, Daniel O; González-Neira, Anna; Benitez, Javier; Zamora, M Pilar; Perez, Jose Ignacio Arias; Shu, Xiao-Ou; Lu, Wei; Gao, Yu-Tang; Cai, Hui; Cox, Angela; Cross, Simon S; Reed, Malcolm WR; Andrulis, Irene L; Knight, Julia A; Glendon, Gord; Mulligan, Anna Marie; Sawyer, Elinor J; Tomlinson, Ian; 
Kerin, Michael J; Miller, Nicola; Lindblom, Annika; Margolin, Sara; Teo, Soo Hwang; Yip, Cheng Har; Taib, Nur Aishah Mohd; TAN, Gie-Hooi; Hooning, Maartje J; Hollestelle, Antoinette; Martens, John WM; Collée, J Margriet; Blot, William; Signorello, Lisa B; Cai, Qiuyin; Hopper, John L; Southey, Melissa C; Tsimiklis, Helen; Apicella, Carmel; Shen, Chen-Yang; Hsiung, Chia-Ni; Wu, Pei-Ei; Hou, Ming-Feng; Kristensen, Vessela N; Nord, Silje; Alnaes, Grethe I Grenaker; Giles, Graham G; Milne, Roger L; McLean, Catriona; Canzian, Federico; Trichopoulos, Dmitrios; Peeters, Petra; Lund, Eiliv; Sund, Malin; Khaw, Kay-Tee; Gunter, Marc J; Palli, Domenico; Mortensen, Lotte Maxild; Dossus, Laure; Huerta, Jose-Maria; Meindl, Alfons; Schmutzler, Rita K; Sutter, Christian; Yang, Rongxi; Muir, Kenneth; Lophatananon, Artitaya; Stewart-Brown, Sarah; Siriwanarangsan, Pornthep; Hartman, Mikael; Miao, Hui; Chia, Kee Seng; Chan, Ching Wan; Fasching, Peter A; Hein, Alexander; Beckmann, Matthias W; Haeberle, Lothar; Brenner, Hermann; Dieffenbach, Aida Karina; Arndt, Volker; Stegmaier, Christa; Ashworth, Alan; Orr, Nick; Schoemaker, Minouk J; Swerdlow, Anthony J; Brinton, Louise; Garcia-Closas, Montserrat; Zheng, Wei; Halverson, Sandra L; Shrubsole, Martha; Long, Jirong; Goldberg, Mark S; Labrèche, France; Dumont, Martine; Winqvist, Robert; Pylkäs, Katri; Jukkola-Vuorinen, Arja; Grip, Mervi; Brauch, Hiltrud; Hamann, Ute; Brüning, Thomas; Radice, Paolo; Peterlongo, Paolo; Manoukian, Siranoush; Bernard, Loris; Bogdanova, Natalia V; Dörk, Thilo; Mannermaa, Arto; Kataja, Vesa; Kosma, Veli-Matti; Hartikainen, Jaana M; Devilee, Peter; Tollenaar, Robert AEM; Seynaeve, Caroline; Van Asperen, Christi J; Jakubowska, Anna; Lubinski, Jan; Jaworska, Katarzyna; Huzarski, Tomasz; Sangrajrang, Suleeporn; Gaborieau, Valerie; Brennan, Paul; McKay, James; Slager, Susan; Toland, Amanda E; Ambrosone, Christine B; Yannoukakos, Drakoulis; Kabisch, Maria; Torres, Diana; Neuhausen, Susan L; Anton-Culver, Hoda; 
Luccarini, Craig; Baynes, Caroline; Ahmed, Shahana; Healey, Catherine S; Tessier, Daniel C; Vincent, Daniel; Bacot, Francois; Pita, Guillermo; Alonso, M Rosario; Álvarez, Nuria; Herrero, Daniel; Simard, Jacques; Pharoah, Paul PDP; Kraft, Peter; Dunning, Alison M; Chenevix-Trench, Georgia; Hall, Per; Easton, Douglas F

    2015-01-01

    Genome-wide association studies (GWAS) and large-scale replication studies have identified common variants in 79 loci associated with breast cancer, explaining ~14% of the familial risk of the disease. To identify new susceptibility loci, we performed a meta-analysis of 11 GWAS comprising 15,748 breast cancer cases and 18,084 controls, and 46,785 cases and 42,892 controls from 41 studies genotyped on a 200K custom array (iCOGS). Analyses were restricted to women of European ancestry. Genotypes for more than 11M SNPs were generated by imputation using the 1000 Genomes Project reference panel. We identified 15 novel loci associated with breast cancer at P < 5 × 10(-8). Combining association analysis with ChIP-Seq data in mammary cell lines and ChIA-PET chromatin interaction data in ENCODE, we identified likely target genes in two regions: SETBP1 on 18q12.3 and RNF115 and PDZK1 on 1q21.1. One association appears to be driven by an amino-acid substitution in EXO1. PMID:25751625

  16. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer.

    PubMed

    Michailidou, Kyriaki; Beesley, Jonathan; Lindstrom, Sara; Canisius, Sander; Dennis, Joe; Lush, Michael J; Maranian, Mel J; Bolla, Manjeet K; Wang, Qin; Shah, Mitul; Perkins, Barbara J; Czene, Kamila; Eriksson, Mikael; Darabi, Hatef; Brand, Judith S; Bojesen, Stig E; Nordestgaard, Børge G; Flyger, Henrik; Nielsen, Sune F; Rahman, Nazneen; Turnbull, Clare; Fletcher, Olivia; Peto, Julian; Gibson, Lorna; dos-Santos-Silva, Isabel; Chang-Claude, Jenny; Flesch-Janys, Dieter; Rudolph, Anja; Eilber, Ursula; Behrens, Sabine; Nevanlinna, Heli; Muranen, Taru A; Aittomäki, Kristiina; Blomqvist, Carl; Khan, Sofia; Aaltonen, Kirsimari; Ahsan, Habibul; Kibriya, Muhammad G; Whittemore, Alice S; John, Esther M; Malone, Kathleen E; Gammon, Marilie D; Santella, Regina M; Ursin, Giske; Makalic, Enes; Schmidt, Daniel F; Casey, Graham; Hunter, David J; Gapstur, Susan M; Gaudet, Mia M; Diver, W Ryan; Haiman, Christopher A; Schumacher, Fredrick; Henderson, Brian E; Le Marchand, Loic; Berg, Christine D; Chanock, Stephen J; Figueroa, Jonine; Hoover, Robert N; Lambrechts, Diether; Neven, Patrick; Wildiers, Hans; van Limbergen, Erik; Schmidt, Marjanka K; Broeks, Annegien; Verhoef, Senno; Cornelissen, Sten; Couch, Fergus J; Olson, Janet E; Hallberg, Emily; Vachon, Celine; Waisfisz, Quinten; Meijers-Heijboer, Hanne; Adank, Muriel A; van der Luijt, Rob B; Li, Jingmei; Liu, Jianjun; Humphreys, Keith; Kang, Daehee; Choi, Ji-Yeob; Park, Sue K; Yoo, Keun-Young; Matsuo, Keitaro; Ito, Hidemi; Iwata, Hiroji; Tajima, Kazuo; Guénel, Pascal; Truong, Thérèse; Mulot, Claire; Sanchez, Marie; Burwinkel, Barbara; Marme, Frederik; Surowy, Harald; Sohn, Christof; Wu, Anna H; Tseng, Chiu-chen; Van Den Berg, David; Stram, Daniel O; González-Neira, Anna; Benitez, Javier; Zamora, M Pilar; Perez, Jose Ignacio Arias; Shu, Xiao-Ou; Lu, Wei; Gao, Yu-Tang; Cai, Hui; Cox, Angela; Cross, Simon S; Reed, Malcolm W R; Andrulis, Irene L; Knight, Julia A; Glendon, Gord; Mulligan, Anna Marie; Sawyer, Elinor J; Tomlinson, Ian; 
Kerin, Michael J; Miller, Nicola; Lindblom, Annika; Margolin, Sara; Teo, Soo Hwang; Yip, Cheng Har; Taib, Nur Aishah Mohd; Tan, Gie-Hooi; Hooning, Maartje J; Hollestelle, Antoinette; Martens, John W M; Collée, J Margriet; Blot, William; Signorello, Lisa B; Cai, Qiuyin; Hopper, John L; Southey, Melissa C; Tsimiklis, Helen; Apicella, Carmel; Shen, Chen-Yang; Hsiung, Chia-Ni; Wu, Pei-Ei; Hou, Ming-Feng; Kristensen, Vessela N; Nord, Silje; Alnaes, Grethe I Grenaker; Giles, Graham G; Milne, Roger L; McLean, Catriona; Canzian, Federico; Trichopoulos, Dimitrios; Peeters, Petra; Lund, Eiliv; Sund, Malin; Khaw, Kay-Tee; Gunter, Marc J; Palli, Domenico; Mortensen, Lotte Maxild; Dossus, Laure; Huerta, Jose-Maria; Meindl, Alfons; Schmutzler, Rita K; Sutter, Christian; Yang, Rongxi; Muir, Kenneth; Lophatananon, Artitaya; Stewart-Brown, Sarah; Siriwanarangsan, Pornthep; Hartman, Mikael; Miao, Hui; Chia, Kee Seng; Chan, Ching Wan; Fasching, Peter A; Hein, Alexander; Beckmann, Matthias W; Haeberle, Lothar; Brenner, Hermann; Dieffenbach, Aida Karina; Arndt, Volker; Stegmaier, Christa; Ashworth, Alan; Orr, Nick; Schoemaker, Minouk J; Swerdlow, Anthony J; Brinton, Louise; Garcia-Closas, Montserrat; Zheng, Wei; Halverson, Sandra L; Shrubsole, Martha; Long, Jirong; Goldberg, Mark S; Labrèche, France; Dumont, Martine; Winqvist, Robert; Pylkäs, Katri; Jukkola-Vuorinen, Arja; Grip, Mervi; Brauch, Hiltrud; Hamann, Ute; Brüning, Thomas; Radice, Paolo; Peterlongo, Paolo; Manoukian, Siranoush; Bernard, Loris; Bogdanova, Natalia V; Dörk, Thilo; Mannermaa, Arto; Kataja, Vesa; Kosma, Veli-Matti; Hartikainen, Jaana M; Devilee, Peter; Tollenaar, Robert A E M; Seynaeve, Caroline; Van Asperen, Christi J; Jakubowska, Anna; Lubinski, Jan; Jaworska, Katarzyna; Huzarski, Tomasz; Sangrajrang, Suleeporn; Gaborieau, Valerie; Brennan, Paul; McKay, James; Slager, Susan; Toland, Amanda E; Ambrosone, Christine B; Yannoukakos, Drakoulis; Kabisch, Maria; Torres, Diana; Neuhausen, Susan L; Anton-Culver, Hoda; 
Luccarini, Craig; Baynes, Caroline; Ahmed, Shahana; Healey, Catherine S; Tessier, Daniel C; Vincent, Daniel; Bacot, Francois; Pita, Guillermo; Alonso, M Rosario; Álvarez, Nuria; Herrero, Daniel; Simard, Jacques; Pharoah, Paul P D P; Kraft, Peter; Dunning, Alison M; Chenevix-Trench, Georgia; Hall, Per; Easton, Douglas F

    2015-04-01

    Genome-wide association studies (GWAS) and large-scale replication studies have identified common variants in 79 loci associated with breast cancer, explaining ∼14% of the familial risk of the disease. To identify new susceptibility loci, we performed a meta-analysis of 11 GWAS, comprising 15,748 breast cancer cases and 18,084 controls together with 46,785 cases and 42,892 controls from 41 studies genotyped on a 211,155-marker custom array (iCOGS). Analyses were restricted to women of European ancestry. We generated genotypes for more than 11 million SNPs by imputation using the 1000 Genomes Project reference panel, and we identified 15 new loci associated with breast cancer at P < 5 × 10(-8). Combining association analysis with ChIP-seq chromatin binding data in mammary cell lines and ChIA-PET chromatin interaction data from ENCODE, we identified likely target genes in two regions: SETBP1 at 18q12.3 and RNF115 and PDZK1 at 1q21.1. One association appears to be driven by an amino acid substitution encoded in EXO1.

  17. Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef).

    PubMed

    Cannarozzi, Gina; Plaza-Wüthrich, Sonia; Esfeld, Korinna; Larti, Stéphanie; Wilson, Yi Song; Girma, Dejene; de Castro, Edouard; Chanyalew, Solomon; Blösch, Regula; Farinelli, Laurent; Lyons, Eric; Schneider, Michel; Falquet, Laurent; Kuhlemeier, Cris; Assefa, Kebebew; Tadele, Zerihun

    2014-07-09

    Tef (Eragrostis tef), an indigenous cereal critical to food security in the Horn of Africa, is rich in minerals and protein, resistant to many biotic and abiotic stresses and safe for diabetics as well as sufferers of immune reactions to wheat gluten. We present the genome of tef, the first species in the grass subfamily Chloridoideae and the first allotetraploid assembled de novo. We sequenced the tef genome for marker-assisted breeding, to shed light on the molecular mechanisms conferring tef's desirable nutritional and agronomic properties, and to make its genome publicly available as a community resource. The draft genome contains 672 Mbp representing 87% of the genome size estimated from flow cytometry. We also sequenced two transcriptomes, one from a normalized RNA library and another from unnormalized RNASeq data. The normalized RNA library revealed around 38,000 transcripts that were then annotated by the SwissProt group. The CoGe comparative genomics platform was used to compare the tef genome to other genomes, notably sorghum. Scaffolds comprising approximately half of the genome size were ordered by syntenic alignment to sorghum producing tef pseudo-chromosomes, which were sorted into A and B genomes as well as compared to the genetic map of tef. The draft genome was used to identify novel SSR markers, investigate target genes for abiotic stress resistance studies, and understand the evolution of the prolamin family of proteins that are responsible for the immune response to gluten. It is highly plausible that breeding targets previously identified in other cereal crops will also be valuable breeding targets in tef. The draft genome and transcriptome will be of great use for identifying these targets for genetic improvement of this orphan crop that is vital for feeding 50 million people in the Horn of Africa.

  18. Genome-Wide Linkage Analysis to Identify Genetic Modifiers of ALK Mutation Penetrance in Familial Neuroblastoma

    PubMed Central

    Devoto, Marcella; Specchia, Claudia; Laudenslager, Marci; Longo, Luca; Hakonarson, Hakon; Maris, John; Mossé, Yael

    2011-01-01

    Background: Neuroblastoma (NB) is an important childhood cancer with a strong genetic component related to disease susceptibility. Approximately 1% of NB cases have a positive family history. Following a genome-wide linkage analysis and sequencing of candidate genes in the critical region, we identified ALK as the major familial NB gene. Dominant mutations in ALK are found in more than 50% of familial NB cases. However, in the families used for the linkage study, only about 50% of carriers of ALK mutations are affected by NB. Methods: To test whether genetic variation may explain the reduced penetrance of the disease phenotype, we analyzed genome-wide genotype data in ALK mutation-positive families using a model-based linkage approach with different liability classes for carriers and non-carriers of ALK mutations. Results: The region with the highest LOD score was located at chromosome 2p23–p24 and included the ALK locus under models of dominant and recessive inheritance. Conclusions: This finding suggests that variants in the non-mutated ALK gene or another gene linked to it may affect penetrance of the ALK mutations and risk of developing NB in familial cases. PMID:21734404

  19. Assembling the Setaria italica L. Beauv. genome into nine chromosomes and insights into regions affecting growth and drought tolerance

    PubMed Central

    Tsai, Kevin J.; Lu, Mei-Yeh Jade; Yang, Kai-Jung; Li, Mengyun; Teng, Yuchuan; Chen, Shihmay; Ku, Maurice S. B.; Li, Wen-Hsiung

    2016-01-01

    The diploid C4 plant foxtail millet (Setaria italica L. Beauv.) is an important crop in many parts of Africa and Asia for the vast consumption of its grain and ability to grow in harsh environments, but remains understudied in terms of complete genomic architecture. To date, there have been only two genome assembly and annotation efforts with neither assembly reaching over 86% of the estimated genome size. We have combined de novo assembly with custom reference-guided improvements on a popular cultivar of foxtail millet and have achieved a genome assembly of 477 Mbp in length, which represents over 97% of the estimated 490 Mbp. The assembly anchors over 98% of the predicted genes to the nine assembled nuclear chromosomes and contains more functional annotation gene models than previous assemblies. Our annotation has identified a large number of unique gene ontology terms related to metabolic activities, a region of chromosome 9 with several growth factor proteins, and regions syntenic with pearl millet or maize genomic regions that have been previously shown to affect growth. The new assembly and annotation for this important species can be used for detailed investigation and future innovations in growth for millet and other grains. PMID:27734962

  20. Assembling the Setaria italica L. Beauv. genome into nine chromosomes and insights into regions affecting growth and drought tolerance.

    PubMed

    Tsai, Kevin J; Lu, Mei-Yeh Jade; Yang, Kai-Jung; Li, Mengyun; Teng, Yuchuan; Chen, Shihmay; Ku, Maurice S B; Li, Wen-Hsiung

    2016-10-13

    The diploid C4 plant foxtail millet (Setaria italica L. Beauv.) is an important crop in many parts of Africa and Asia for the vast consumption of its grain and ability to grow in harsh environments, but remains understudied in terms of complete genomic architecture. To date, there have been only two genome assembly and annotation efforts with neither assembly reaching over 86% of the estimated genome size. We have combined de novo assembly with custom reference-guided improvements on a popular cultivar of foxtail millet and have achieved a genome assembly of 477 Mbp in length, which represents over 97% of the estimated 490 Mbp. The assembly anchors over 98% of the predicted genes to the nine assembled nuclear chromosomes and contains more functional annotation gene models than previous assemblies. Our annotation has identified a large number of unique gene ontology terms related to metabolic activities, a region of chromosome 9 with several growth factor proteins, and regions syntenic with pearl millet or maize genomic regions that have been previously shown to affect growth. The new assembly and annotation for this important species can be used for detailed investigation and future innovations in growth for millet and other grains.

  1. Engineered chromosome-based genetic mapping establishes a 3.7 Mb critical genomic region for Down syndrome-associated heart defects in mice.

    PubMed

    Liu, Chunhong; Morishima, Masae; Jiang, Xiaoling; Yu, Tao; Meng, Kai; Ray, Debjit; Pao, Annie; Ye, Ping; Parmacek, Michael S; Yu, Y Eugene

    2014-06-01

    Trisomy 21 (Down syndrome, DS) is the most common human genetic anomaly associated with heart defects. Based on evolutionary conservation, DS-associated heart defects have been modeled in mice. By generating and analyzing mouse mutants carrying different genomic rearrangements in human chromosome 21 (Hsa21) syntenic regions, we found that triplication of the Tiam1-Kcnj6 region on mouse chromosome 16 (Mmu16) resulted in DS-related cardiovascular abnormalities. In this study, we developed two tandem duplications spanning the Tiam1-Kcnj6 genomic region on Mmu16 using recombinase-mediated genome engineering, Dp(16)3Yey and Dp(16)4Yey, spanning the 2.1 Mb Tiam1-Il10rb and 3.7 Mb Ifnar1-Kcnj6 regions, respectively. We found that Dp(16)4Yey/+, but not Dp(16)3Yey/+, led to heart defects, suggesting that triplication of the Ifnar1-Kcnj6 region is sufficient to cause DS-associated heart defects. Our transcriptional analysis of Dp(16)4Yey/+ embryos showed that the Hsa21 gene orthologs located within the duplicated interval were expressed at elevated levels, reflecting the consequences of the gene dosage alterations. Therefore, we have identified a 3.7 Mb genomic region, the smallest critical genomic region, for DS-associated heart defects, and our results should set the stage for the final step to establish the identities of the causal gene(s), whose elevated expression(s) directly underlie this major DS phenotype.

  2. Genomic regions controlling shape variation in the first upper molar of the house mouse

    PubMed Central

    Pantalacci, Sophie; Turner, Leslie M; Steingrimsson, Eirikur; Renaud, Sabrina

    2017-01-01

    Numerous loci of large effect have been shown to underlie phenotypic variation between species. However, loci with subtle effects are presumably more frequently involved in microevolutionary processes but have rarely been discovered. We explore the genetic basis of shape variation in the first upper molar of hybrid mice between Mus musculus musculus and M. m. domesticus. We performed the first genome-wide association study for molar shape and used 3D surface morphometrics to quantify subtle variation between individuals. We show that many loci of small effect underlie phenotypic variation, and identify five genomic regions associated with tooth shape; one region contained the gene microphthalmia-associated transcription factor Mitf that has previously been associated with tooth malformations. Using a panel of five mutant laboratory strains, we show the effect of the Mitf gene on tooth shape. This is the first report of a gene causing subtle but consistent variation in tooth shape resembling variation in nature. PMID:29091026

  3. Genome-wide imputation study identifies novel HLA locus for pulmonary fibrosis and potential role for auto-immunity in fibrotic idiopathic interstitial pneumonia.

    PubMed

    Fingerlin, Tasha E; Zhang, Weiming; Yang, Ivana V; Ainsworth, Hannah C; Russell, Pamela H; Blumhagen, Rachel Z; Schwarz, Marvin I; Brown, Kevin K; Steele, Mark P; Loyd, James E; Cosgrove, Gregory P; Lynch, David A; Groshong, Steve; Collard, Harold R; Wolters, Paul J; Bradford, Williamson Z; Kossen, Karl; Seiwert, Scott D; du Bois, Roland M; Garcia, Christine Kim; Devine, Megan S; Gudmundsson, Gunnar; Isaksson, Helgi J; Kaminski, Naftali; Zhang, Yingze; Gibson, Kevin F; Lancaster, Lisa H; Maher, Toby M; Molyneaux, Philip L; Wells, Athol U; Moffatt, Miriam F; Selman, Moises; Pardo, Annie; Kim, Dong Soon; Crapo, James D; Make, Barry J; Regan, Elizabeth A; Walek, Dinesha S; Daniel, Jerry J; Kamatani, Yoichiro; Zelenika, Diana; Murphy, Elissa; Smith, Keith; McKean, David; Pedersen, Brent S; Talbert, Janet; Powers, Julia; Markin, Cheryl R; Beckman, Kenneth B; Lathrop, Mark; Freed, Brian; Langefeld, Carl D; Schwartz, David A

    2016-06-07

    Fibrotic idiopathic interstitial pneumonias (fIIP) are a group of fatal lung diseases with largely unknown etiology and without definitive treatment other than lung transplant to prolong life. There is strong evidence for the importance of both rare and common genetic risk alleles in familial and sporadic disease. We have previously used genome-wide single nucleotide polymorphism data to identify 10 risk loci for fIIP. Here we extend that work to imputed genome-wide genotypes and conduct new RNA sequencing studies of lung tissue to identify and characterize new fIIP risk loci. We performed genome-wide genotype imputation association analyses in 1616 non-Hispanic white (NHW) cases and 4683 NHW controls followed by validation and replication (878 cases, 2017 controls) genotyping and targeted gene expression in lung tissue. Following meta-analysis of the discovery and replication populations, we identified a novel fIIP locus in the HLA region of chromosome 6 (rs7887, P_meta = 3.7 × 10^-9). Imputation of classic HLA alleles identified two in high linkage disequilibrium that are associated with fIIP (DRB1*15:01, P = 1.3 × 10^-7 and DQB1*06:02, P = 6.1 × 10^-8). Targeted RNA-sequencing of the HLA locus identified 21 genes differentially expressed between fibrotic and control lung tissue (Q < 0.001), many of which are involved in immune and inflammatory response regulation. In addition, the putative risk alleles, DRB1*15:01 and DQB1*06:02, are associated with expression of the DQB1 gene among fIIP cases (Q < 1 × 10^-16). We have identified a genome-wide significant association between the HLA region and fIIP. Two HLA alleles are associated with fIIP and affect expression of HLA genes in lung tissue, indicating that the potential genetic risk due to HLA alleles may involve gene regulation in addition to altered protein structure. These studies reveal the importance of the HLA region for risk of fIIP and a basis for the potential

  4. A genome-wide association study of atopic dermatitis identifies loci with overlapping effects on asthma and psoriasis.

    PubMed

    Weidinger, Stephan; Willis-Owen, Saffron A G; Kamatani, Yoichiro; Baurecht, Hansjörg; Morar, Nilesh; Liang, Liming; Edser, Pauline; Street, Teresa; Rodriguez, Elke; O'Regan, Grainne M; Beattie, Paula; Fölster-Holst, Regina; Franke, Andre; Novak, Natalija; Fahy, Caoimhe M; Winge, Mårten C G; Kabesch, Michael; Illig, Thomas; Heath, Simon; Söderhäll, Cilla; Melén, Erik; Pershagen, Göran; Kere, Juha; Bradley, Maria; Lieden, Agne; Nordenskjold, Magnus; Harper, John I; McLean, W H Irwin; Brown, Sara J; Cookson, William O C; Lathrop, G Mark; Irvine, Alan D; Moffatt, Miriam F

    2013-12-01

    Atopic dermatitis (AD) is the most common dermatological disease of childhood. Many children with AD have asthma and AD shares regions of genetic linkage with psoriasis, another chronic inflammatory skin disease. We present here a genome-wide association study (GWAS) of childhood-onset AD in 1563 European cases with known asthma status and 4054 European controls. Using Illumina genotyping followed by imputation, we generated 268 034 consensus genotypes and in excess of 2 million single nucleotide polymorphisms (SNPs) for analysis. Association signals were assessed for replication in a second panel of 2286 European cases and 3160 European controls. Four loci achieved genome-wide significance for AD and replicated consistently across all cohorts. These included the epidermal differentiation complex (EDC) on chromosome 1, the genomic region proximal to LRRC32 on chromosome 11, the RAD50/IL13 locus on chromosome 5 and the major histocompatibility complex (MHC) on chromosome 6; reflecting action of classical HLA alleles. We observed variation in the contribution towards co-morbid asthma for these regions of association. We further explored the genetic relationship between AD, asthma and psoriasis by examining previously identified susceptibility SNPs for these diseases. We found considerable overlap between AD and psoriasis together with variable coincidence between allergic rhinitis (AR) and asthma. Our results indicate that the pathogenesis of AD incorporates immune and epidermal barrier defects with combinations of specific and overlapping effects at individual loci.

  5. A genome-wide association study of atopic dermatitis identifies loci with overlapping effects on asthma and psoriasis

    PubMed Central

    Weidinger, Stephan; Willis-Owen, Saffron A.G.; Kamatani, Yoichiro; Baurecht, Hansjörg; Morar, Nilesh; Liang, Liming; Edser, Pauline; Street, Teresa; Rodriguez, Elke; O'Regan, Grainne M.; Beattie, Paula; Fölster-Holst, Regina; Franke, Andre; Novak, Natalija; Fahy, Caoimhe M.; Winge, Mårten C.G.; Kabesch, Michael; Illig, Thomas; Heath, Simon; Söderhäll, Cilla; Melén, Erik; Pershagen, Göran; Kere, Juha; Bradley, Maria; Lieden, Agne; Nordenskjold, Magnus; Harper, John I.; Mclean, W.H. Irwin; Brown, Sara J.; Cookson, William O.C.; Lathrop, G. Mark; Irvine, Alan D.; Moffatt, Miriam F.

    2013-01-01

    Atopic dermatitis (AD) is the most common dermatological disease of childhood. Many children with AD have asthma and AD shares regions of genetic linkage with psoriasis, another chronic inflammatory skin disease. We present here a genome-wide association study (GWAS) of childhood-onset AD in 1563 European cases with known asthma status and 4054 European controls. Using Illumina genotyping followed by imputation, we generated 268 034 consensus genotypes and in excess of 2 million single nucleotide polymorphisms (SNPs) for analysis. Association signals were assessed for replication in a second panel of 2286 European cases and 3160 European controls. Four loci achieved genome-wide significance for AD and replicated consistently across all cohorts. These included the epidermal differentiation complex (EDC) on chromosome 1, the genomic region proximal to LRRC32 on chromosome 11, the RAD50/IL13 locus on chromosome 5 and the major histocompatibility complex (MHC) on chromosome 6; reflecting action of classical HLA alleles. We observed variation in the contribution towards co-morbid asthma for these regions of association. We further explored the genetic relationship between AD, asthma and psoriasis by examining previously identified susceptibility SNPs for these diseases. We found considerable overlap between AD and psoriasis together with variable coincidence between allergic rhinitis (AR) and asthma. Our results indicate that the pathogenesis of AD incorporates immune and epidermal barrier defects with combinations of specific and overlapping effects at individual loci. PMID:23886662

  6. Pool-based genome-wide association study identified novel candidate regions on BTA9 and 14 for oleic acid percentage in Japanese Black cattle.

    PubMed

    Kawaguchi, Fuki; Kigoshi, Hiroto; Nakajima, Ayaka; Matsumoto, Yuta; Uemoto, Yoshinobu; Fukushima, Moriyuki; Yoshida, Emi; Iwamoto, Eiji; Akiyama, Takayuki; Kohama, Namiko; Kobayashi, Eiji; Honda, Takeshi; Oyama, Kenji; Mannen, Hideyuki; Sasazaki, Shinji

    2018-05-17

    Fatty acid composition is an important indicator of beef quality. The objective of this study was to search for potential candidate regions for fatty acid composition. We performed pool-based genome-wide association studies (GWAS) for oleic acid percentage (C18:1) in a Japanese Black cattle population from the Hyogo prefecture. GWAS analysis revealed two novel candidate regions on BTA9 and BTA14. The most significant single nucleotide polymorphisms (SNPs) in each region were genotyped in a population (n = 899) to verify their effect on C18:1. Statistical analysis revealed that both SNPs were significantly associated with C18:1 (p = .0080 and .0003), validating the quantitative trait loci (QTLs) detected in GWAS. We subsequently selected the VNN1 and LYPLA1 genes as candidate genes from the regions on BTA9 and BTA14, respectively. We sequenced the full-length coding sequence (CDS) of these genes in eight individuals and identified a nonsynonymous SNP, T66M, in the VNN1 gene as a putative candidate polymorphism. The polymorphism was also significantly associated with C18:1, but its p value (p = .0162) was higher than that of the most significant SNP on BTA9, suggesting that it is not responsible for the QTL. Although further investigation will be needed to determine the responsible gene and polymorphism, our findings should contribute to the development of selective markers for fatty acid composition in the Japanese Black cattle of Hyogo. © 2018 Japanese Society of Animal Science.

  7. Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

    PubMed Central

    2011-01-01

    Background: BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results: This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions: Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110

  8. Genome-association analysis of Korean Holstein milk traits using genomic estimated breeding value

    PubMed Central

    Shin, Donghyun; Lee, Chul; Park, Kyoung-Do; Kim, Heebal; Cho, Kwang-hyeon

    2017-01-01

    Objective: Holsteins are known as the world’s highest milk-producing dairy cattle. The purpose of this study was to identify genetic regions strongly associated with milk traits (milk production, fat, and protein) using Korean Holstein data. Methods: This study was performed using single nucleotide polymorphism (SNP) chip data (Illumina BovineSNP50 Beadchip) of 911 Korean Holstein individuals. We inferred genomic estimated breeding values based on best linear unbiased prediction (BLUP) and ridge regression using BLUPF90 and R. We then performed a genome-wide association study and identified genetic regions related to milk traits. Results: We identified 9, 6, and 17 significant genetic regions related to milk production, fat and protein, respectively. These genes are newly reported as genetically associated with milk traits in Holsteins. Conclusion: This study complements recent Holstein genome-wide association studies that identified other SNPs and genes as the most significant variants. These results will help to expand the knowledge of the polygenic nature of milk production in Holsteins. PMID:26954162

  9. aCGH Local Copy Number Aberrations Associated with Overall Copy Number Genomic Instability in Colorectal Cancer: Coordinate Involvement of the Regions Including BCR and ABL

    PubMed Central

    Bartos, Jeremy D.; Gaile, Daniel P.; McQuaid, Devin E.; Conroy, Jeffrey M.; Darbary, Huferesh; Nowak, Norma J.; Block, Annemarie; Petrelli, Nicholas J.; Mittelman, Arnold; Stoler, Daniel L.; Anderson, Garth R.

    2007-01-01

    In order to identify small regions of the genome whose specific copy number alteration is associated with high genomic instability in the form of overall genome-wide copy number aberrations, we have analyzed array-based comparative genomic hybridization (aCGH) data from 33 sporadic colorectal carcinomas. Copy number changes of a small number of specific regions were significantly correlated with elevated overall amplifications and deletions scattered throughout the entire genome. One significant region at 9q34 includes the c-ABL gene. Another region spanning 22q11–13 includes the breakpoint cluster region (BCR) of the Philadelphia chromosome. Coordinate 22q11–13 alterations were observed in nine of eleven tumors with the 9q34 alteration. Additional regions on 1q and 14q were associated with overall genome-wide copy number changes, while copy number aberrations on chromosome 7p, 7q, and 13q21.1–31.3 were found associated with this instability only in tumors from patients with a smoking history. Our analysis demonstrates there are a small number of regions of the genome where gain or loss is commonly associated with a tumor’s overall level of copy number aberrations. Our finding that BCR and ABL are located within two of the instability-associated regions, and that these two regions are involved coordinately, suggests a system akin to the BCR-ABL translocation of CML may be involved in genomic instability in about one-third of human colorectal carcinomas. PMID:17196995

  10. Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia.

    PubMed

    Li, Zhiqiang; Chen, Jianhua; Yu, Hao; He, Lin; Xu, Yifeng; Zhang, Dai; Yi, Qizhong; Li, Changgui; Li, Xingwang; Shen, Jiawei; Song, Zhijian; Ji, Weidong; Wang, Meng; Zhou, Juan; Chen, Boyu; Liu, Yahui; Wang, Jiqiang; Wang, Peng; Yang, Ping; Wang, Qingzhong; Feng, Guoyin; Liu, Benxiu; Sun, Wensheng; Li, Baojie; He, Guang; Li, Weidong; Wan, Chunling; Xu, Qi; Li, Wenjin; Wen, Zujia; Liu, Ke; Huang, Fang; Ji, Jue; Ripke, Stephan; Yue, Weihua; Sullivan, Patrick F; O'Donovan, Michael C; Shi, Yongyong

    2017-11-01

    We conducted a genome-wide association study (GWAS) with replication in 36,180 Chinese individuals and performed further transancestry meta-analyses with data from the Psychiatry Genomics Consortium (PGC2). Approximately 95% of the genome-wide significant (GWS) index alleles (or their proxies) from the PGC2 study were overrepresented in Chinese schizophrenia cases, including ∼50% that achieved nominal significance and ∼75% that continued to be GWS in the transancestry analysis. The Chinese-only analysis identified seven GWS loci; three of these also were GWS in the transancestry analyses, which identified 109 GWS loci, thus yielding a total of 113 GWS loci (30 novel) in at least one of these analyses. We observed improvements in the fine-mapping resolution at many susceptibility loci. Our results provide several lines of evidence supporting candidate genes at many loci and highlight some pathways for further research. Together, our findings provide novel insight into the genetic architecture and biological etiology of schizophrenia.

  11. Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome.

    PubMed

    Wang, Yi; Liu, Xianju; Ren, Chong; Zhong, Gan-Yuan; Yang, Long; Li, Shaohua; Liang, Zhenchang

    2016-04-21

    CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of humans, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific target sites for CRISPR/Cas9 have been computationally identified for several annual model and crop species, but such sites have not been reported for perennial, woody fruit species. In this study, we identified and characterized five types of CRISPR/Cas9 target sites in the widely cultivated grape species Vitis vinifera and developed a user-friendly database for editing grape genomes in the future. A total of 35,767,960 potential CRISPR/Cas9 target sites were identified from grape genomes in this study. Among them, 22,597,817 target sites were mapped to specific genomic locations and 7,269,788 were found to be highly specific. Protospacers and PAMs were found to distribute uniformly and abundantly in the grape genomes. They were present in all the structural elements of genes with the coding region having the highest abundance. Five PAM types, TGG, AGG, GGG, CGG and NGG, were observed. With the exception of the NGG type, they were abundantly present in the grape genomes. Synteny analysis of similar genes revealed that the synteny of protospacers matched the synteny of homologous genes. A user-friendly database containing protospacers and detailed information of the sites was developed and is available for public use at the Grape-CRISPR website ( http://biodb.sdau.edu.cn/gc/index.html ). Grape genomes harbour millions of potential CRISPR/Cas9 target sites. These sites are widely distributed among and within chromosomes with predominant abundance in the coding regions of genes. We developed a publicly-accessible Grape-CRISPR database for facilitating the use of the CRISPR/Cas9 system as a genome editing tool for functional studies and molecular breeding of grapes. 
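    The record above describes computational identification of protospacers followed by a PAM. As a minimal sketch of that idea only (not the authors' pipeline), the Python below scans the forward strand of a sequence for 20-nt protospacers immediately followed by an NGG PAM and tallies the PAM types seen. The 20-nt length, the forward-strand-only scan, and the names `find_target_sites` and `count_pam_types` are illustrative assumptions.

```python
import re
from collections import Counter

# Assumption: a target site is a 20-nt protospacer directly followed by an
# NGG PAM (N = A, C, G or T). The lookahead lets overlapping sites be found.
SITE_PATTERN = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_target_sites(sequence):
    """Return (start, protospacer, pam) for every forward-strand site."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1), m.group(2))
            for m in SITE_PATTERN.finditer(sequence)]


def count_pam_types(sites):
    """Count how often each concrete PAM (AGG, CGG, GGG, TGG) occurs."""
    return Counter(pam for _, _, pam in sites)


if __name__ == "__main__":
    # Toy sequence, not real grape genome data.
    toy = "ACGTACGTACGTACGTACGT" + "AGG" + "C"
    sites = find_target_sites(toy)
    print(sites)
    print(count_pam_types(sites))
```

    A real screen would also scan the reverse complement and check each protospacer for off-target matches elsewhere in the genome; both steps are omitted here for brevity.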

  12. The Valdostana goat: a genome-wide investigation of the distinctiveness of its selective sweep regions.

    PubMed

    Talenti, Andrea; Bertolini, Francesca; Pagnacco, Giulio; Pilla, Fabio; Ajmone-Marsan, Paolo; Rothschild, Max F; Crepaldi, Paola

    2017-04-01

    The Valdostana goat is an alpine breed, raised only in the northern Italian region of the Aosta Valley. The breed is raised mainly for milk and meat but is notable for its involvement in the "Batailles de Chèvres," a recent tradition of non-cruel fight tournaments. At both the genetic and genomic levels, only a very limited number of studies have been performed with this breed and there are no studies about the genomic signatures left by selection. In this work, 24 unrelated Valdostana animals were screened for runs of homozygosity (ROH) to identify highly homozygous regions. Then, six different approaches (ROH comparison, Fst single SNPs and windows based, Bayesian, Rsb, and XP-EHH) were applied comparing the Valdostana dataset with 14 other Italian goat breeds to confirm regions that were different among the comparisons. A total of three regions of selection that were also unique among the Valdostana were identified; they were located on chromosomes 1, 7, and 12 and contained 144 genes. Enrichment analyses detected genes such as cytokines and lymphocyte/leukocyte proliferation genes involved in the regulation of the immune system. A genetic link between an aggressive challenge, cytokines, and immunity has been hypothesized in many studies both in humans and in other species. The signals of selection detected could therefore be related to immune-related factors, to the peculiar battle competition, or to other breed-specific traits, and they provide insights for further investigation of these unique regions and for the understanding and safeguarding of the Valdostana breed.
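    The screening step in the record above looks for runs of homozygosity. A minimal sketch of that idea follows, assuming genotypes are coded as alternate-allele counts (0 or 2 = homozygous, 1 = heterozygous) along one chromosome and that a run is any stretch of at least `min_snps` consecutive homozygous SNPs. The function name, the coding, and the threshold are illustrative assumptions, not the settings used in the study.

```python
def runs_of_homozygosity(genotypes, min_snps=3):
    """Return (start, end) index pairs (inclusive) of homozygous runs.

    genotypes: sequence of 0/1/2 allele counts ordered by position.
    min_snps:  hypothetical minimum number of SNPs for a run to be kept.
    """
    runs = []
    start = None  # index where the current homozygous stretch began
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous call closes the current stretch.
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a stretch that reaches the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


if __name__ == "__main__":
    # Toy genotype vector, not real data.
    print(runs_of_homozygosity([0, 0, 2, 1, 2, 2, 1, 0, 0, 0, 0]))
```

    Production tools typically add a minimum physical length, a tolerance for a few heterozygous or missing calls, and SNP-density checks; this sketch leaves those out.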

  13. A genome-wide association scan in admixed Latin Americans identifies loci influencing facial and scalp hair features

    PubMed Central

    Adhikari, Kaustubh; Fontanil, Tania; Cal, Santiago; Mendoza-Revilla, Javier; Fuentes-Guajardo, Macarena; Chacón-Duque, Juan-Camilo; Al-Saadi, Farah; Johansson, Jeanette A.; Quinto-Sanchez, Mirsha; Acuña-Alonzo, Victor; Jaramillo, Claudia; Arias, William; Barquera Lozano, Rodrigo; Macín Pérez, Gastón; Gómez-Valdés, Jorge; Villamil-Ramírez, Hugo; Hunemeier, Tábita; Ramallo, Virginia; Silva de Cerqueira, Caio C.; Hurtado, Malena; Villegas, Valeria; Granja, Vanessa; Gallo, Carla; Poletti, Giovanni; Schuler-Faccini, Lavinia; Salzano, Francisco M.; Bortolini, Maria-Cátira; Canizales-Quinteros, Samuel; Rothhammer, Francisco; Bedoya, Gabriel; Gonzalez-José, Rolando; Headon, Denis; López-Otín, Carlos; Tobin, Desmond J.; Balding, David; Ruiz-Linares, Andrés

    2016-01-01

    We report a genome-wide association scan in over 6,000 Latin Americans for features of scalp hair (shape, colour, greying, balding) and facial hair (beard thickness, monobrow, eyebrow thickness). We found 18 signals of association reaching genome-wide significance (P values 5 × 10^-8 to 3 × 10^-119), including 10 novel associations. These include novel loci for scalp hair shape and balding, and the first reported loci for hair greying, monobrow, eyebrow and beard thickness. A newly identified locus influencing hair shape includes a Q30R substitution in the Protease Serine S1 family member 53 (PRSS53). We demonstrate that this enzyme is highly expressed in the hair follicle, especially the inner root sheath, and that the Q30R substitution affects enzyme processing and secretion. The genome regions associated with hair features are enriched for signals of selection, consistent with proposals regarding the evolution of human hair. PMID:26926045

  14. Genomic analysis of Ascochyta rabiei identifies dynamic genome environments of solanapyrone biosynthesis gene cluster and a novel type of pathway-specific regulator

    USDA-ARS's Scientific Manuscript database

    Secondary metabolite genes are often clustered together and situated in particular genomic regions such as the subtelomere, which can facilitate niche adaptation in fungi. Solanapyrones are toxic secondary metabolites produced by fungi occupying different ecological niches. Full genome sequencing of...

  15. Unprecedented Melioidosis Cases in Northern Australia Caused by an Asian Burkholderia pseudomallei Strain Identified by Using Large-Scale Comparative Genomics

    PubMed Central

    Smith, Emma J.; MacHunter, Barbara; Harrington, Glenda; Theobald, Vanessa; Hall, Carina M.; Hornstra, Heidie M.; McRobb, Evan; Podin, Yuwana; Mayo, Mark; Sahl, Jason W.; Wagner, David M.; Keim, Paul; Kaestli, Mirjam; Currie, Bart J.

    2015-01-01

    Melioidosis is a disease of humans and animals that is caused by the saprophytic bacterium Burkholderia pseudomallei. Once thought to be confined to certain locations, the known presence of B. pseudomallei is expanding as more regions of endemicity are uncovered. There is no vaccine for melioidosis, and even with antibiotic administration, the mortality rate is as high as 40% in some regions that are endemic for the infection. Despite high levels of recombination, phylogenetic reconstruction of B. pseudomallei populations using whole-genome sequencing (WGS) has revealed surprisingly robust biogeographic separation between isolates from Australia and Asia. To date, there have been no confirmed autochthonous melioidosis cases in Australia caused by an Asian isolate; likewise, no autochthonous cases in Asia have been identified as Australian in origin. Here, we used comparative genomic analysis of 455 B. pseudomallei genomes to confirm the unprecedented presence of an Asian clone, sequence type 562 (ST-562), in Darwin, northern Australia. First observed in Darwin in 2005, the incidence of melioidosis cases attributable to ST-562 infection has steadily risen, and it is now a common strain in Darwin. Intriguingly, the Australian ST-562 appears to be geographically restricted to a single locale and is genetically less diverse than other common STs from this region, indicating a recent introduction of this clone into northern Australia. Detailed genomic and epidemiological investigations of new clinical and environmental B. pseudomallei isolates in the Darwin region and ST-562 isolates from Asia will be critical for understanding the origin, distribution, and dissemination of this emerging clone in northern Australia. PMID:26607593

  16. Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

    PubMed

    Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz

    2011-07-01

    Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.
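    The record above allocates multi-reads as fractional counts using a weighted alignment scheme. The sketch below illustrates the general idea under a simple stated assumption: each multi-read is split across its candidate bins in proportion to the uni-read count already in each bin, plus a pseudocount. This weighting rule and the names `allocate_multireads` and `pseudocount` are hypothetical and are not claimed to be the authors' scheme.

```python
from collections import defaultdict


def allocate_multireads(uni_counts, multi_reads, pseudocount=1.0):
    """Add each multi-read to its candidate bins as fractional counts.

    uni_counts:  dict mapping bin -> number of uniquely mapped reads.
    multi_reads: list of lists; each inner list holds the candidate bins
                 that one multi-read aligns to.
    pseudocount: assumed smoothing so bins with no uni-reads get some weight.
    """
    totals = defaultdict(float)
    for bin_id, count in uni_counts.items():
        totals[bin_id] += float(count)
    for candidates in multi_reads:
        # Weights come from the fixed uni-read counts, so the order in
        # which multi-reads are processed does not change the result.
        weights = [uni_counts.get(b, 0) + pseudocount for b in candidates]
        norm = sum(weights)
        for bin_id, weight in zip(candidates, weights):
            totals[bin_id] += weight / norm  # fractions sum to one read
    return dict(totals)


if __name__ == "__main__":
    # Toy bins, not real ChIP-seq data.
    uni = {"chr1:100": 3, "chr2:500": 0}
    print(allocate_multireads(uni, [["chr1:100", "chr2:500"]]))
```

    Because every multi-read contributes fractions that sum to one, total read depth rises by exactly the number of multi-reads, which matches the abstract's point that including them increases sequencing depth.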

  17. Regions of very low H3K27me3 partition the Drosophila genome into topological domains

    PubMed Central

    Flower, Rosalyn; Choo, Siew Woh

    2017-01-01

    It is now well established that eukaryote genomes have a common architectural organization into topologically associated domains (TADs) and evidence is accumulating that this organization plays an important role in gene regulation. However, the mechanisms that partition the genome into TADs and the nature of domain boundaries are still poorly understood. We have investigated boundary regions in the Drosophila genome and find that they can be identified as domains of very low H3K27me3. The genome-wide H3K27me3 profile partitions into two states; very low H3K27me3 identifies Depleted (D) domains that contain housekeeping genes and their regulators such as the histone acetyltransferase-containing NSL complex, whereas domains containing moderate-to-high levels of H3K27me3 (Enriched or E domains) are associated with regulated genes, irrespective of whether they are active or inactive. The D domains correlate with the boundaries of TADs and are enriched in a subset of architectural proteins, particularly Chromator, BEAF-32, and Z4/Putzig. However, rather than being clustered at the borders of these domains, these proteins bind throughout the H3K27me3-depleted regions and are much more strongly associated with the transcription start sites of housekeeping genes than with the H3K27me3 domain boundaries. While we have not demonstrated causality, we suggest that the D domain chromatin state, characterised by very low or absent H3K27me3 and established by housekeeping gene regulators, acts to separate topological domains thereby setting up the domain architecture of the genome. PMID:28282436

  18. A genome-wide association study identifies risk loci to equine recurrent uveitis in German warmblood horses.

    PubMed

    Kulbrock, Maike; Lehner, Stefanie; Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar

    2013-01-01

    Equine recurrent uveitis (ERU) is a common eye disease affecting 3-15% of the horse population. A genome-wide association study (GWAS) using the Illumina equine SNP50 bead chip was performed to identify loci conferring risk to ERU. The sample included a total of 144 German warmblood horses. A GWAS showed a significant single nucleotide polymorphism (SNP) on horse chromosome (ECA) 20 at 49.3 Mb, with IL-17A and IL-17F being the closest genes. This locus explained 23% of the phenotypic variance for ERU. A GWAS taking into account the severity of ERU revealed a SNP on ECA18 near the crystallin gene cluster CRYGA-CRYGF. For both genomic regions on ECA18 and 20, significantly associated haplotypes containing the genome-wide significant SNPs could be demonstrated. In conclusion, our results are indicative of a genetic component regulating the possible critical role of IL-17A and IL-17F in the pathogenesis of ERU. The associated SNP on ECA18 may be indicative of cataract formation in the course of ERU.

  19. Genome-wide copy number variation (CNV) detection in Nelore cattle reveals highly frequent variants in genome regions harboring QTLs affecting production traits.

    PubMed

    da Silva, Joaquim Manoel; Giachetto, Poliana Fernanda; da Silva, Luiz Otávio; Cintra, Leandro Carrijo; Paiva, Samuel Rezende; Yamagishi, Michel Eduardo Beleza; Caetano, Alexandre Rodrigues

    2016-06-13

    Copy number variations (CNVs) have been shown to account for substantial portions of observed genomic variation and have been associated with qualitative and quantitative traits and the onset of disease in a number of species. Information from high-resolution studies to detect, characterize and estimate population-specific variant frequencies will facilitate the incorporation of CNVs in genomic studies to identify genes affecting traits of importance. Genome-wide CNVs were detected in high-density single nucleotide polymorphism (SNP) genotyping data from 1,717 Nelore (Bos indicus) cattle, and in NGS data from eight key ancestral bulls. A total of 68,007 and 12,786 distinct CNVs were observed, respectively. Cross-comparisons of results obtained for the eight resequenced animals revealed that 92 % of the CNVs were observed in both datasets, while 62 % of all detected CNVs were observed to overlap with previously validated cattle copy number variant regions (CNVRs). Observed CNVs were used for obtaining breed-specific CNV frequencies and identification of CNVRs, which were subsequently used for gene annotation. A total of 688 of the detected CNVRs were observed to overlap with 286 non-redundant QTLs associated with important production traits in cattle. All of 34 CNVs previously reported to be associated with milk production traits in Holsteins were also observed in Nelore cattle. Comparisons of estimated frequencies of these CNVs in the two breeds revealed 14, 13, 6 and 14 regions in high (>20 %), low (<20 %) and divergent (NEL > HOL, NEL < HOL) frequencies, respectively. Obtained results significantly enriched the bovine CNV map and enabled the identification of variants that are potentially associated with traits under selection in Nelore cattle, particularly in genome regions harboring QTLs affecting production traits.

  20. Genomic variation in Plasmodium vivax malaria reveals regions under selective pressure

    PubMed Central

    Diez Benavente, Ernest; Ward, Zoe; Chan, Wilson; Mohareb, Fady R.; Sutherland, Colin J.; Roper, Cally; Campino, Susana

    2017-01-01

    Background: Although Plasmodium vivax contributes to almost half of all malaria cases outside Africa, it has been relatively neglected compared to the more deadly P. falciparum. It is known that P. vivax populations possess high genetic diversity, differing geographically potentially due to different vector species, host genetics and environmental factors. Results: We analysed the high-quality genomic data for 46 P. vivax isolates spanning 10 countries across 4 continents. Using population genetic methods we identified hotspots of selection pressure, including the previously reported MRP1 and DHPS genes, both putative drug resistance loci. Extra copies and deletions in the promoter region of another drug resistance candidate, MDR1 gene, and duplications in the Duffy binding protein gene (PvDBP) potentially involved in erythrocyte invasion, were also identified. For surveillance applications, continental-informative markers were found in putative drug resistance loci, and we show that organellar polymorphisms could classify P. vivax populations across continents and differentiate between Plasmodia spp. Conclusions: This study has shown that genomic diversity that lies within and between P. vivax populations can be used to elucidate potential drug resistance and invasion mechanisms, as well as facilitate the molecular barcoding of the parasite for surveillance applications. PMID:28493919

  1. Genome-wide association studies in dogs and humans identify ADAMTS20 as a risk variant for cleft lip and palate.

    PubMed

    Wolf, Zena T; Brand, Harrison A; Shaffer, John R; Leslie, Elizabeth J; Arzi, Boaz; Willet, Cali E; Cox, Timothy C; McHenry, Toby; Narayan, Nicole; Feingold, Eleanor; Wang, Xioajing; Sliskovic, Saundra; Karmi, Nili; Safra, Noa; Sanchez, Carla; Deleyiannis, Frederic W B; Murray, Jeffrey C; Wade, Claire M; Marazita, Mary L; Bannasch, Danika L

    2015-03-01

    Cleft lip with or without cleft palate (CL/P) is the most commonly occurring craniofacial birth defect. We provide insight into the genetic etiology of this birth defect by performing genome-wide association studies in two species: dogs and humans. In the dog, a genome-wide association study of 7 CL/P cases and 112 controls from the Nova Scotia Duck Tolling Retriever (NSDTR) breed identified a significantly associated region on canine chromosome 27 (unadjusted p = 1.1 x 10(-13); adjusted p = 2.2 x 10(-3)). Further analysis in NSDTR families and additional full sibling cases identified a 1.44 Mb homozygous haplotype (chromosome 27: 9.29 - 10.73 Mb) segregating with a more complex phenotype of cleft lip, cleft palate, and syndactyly (CLPS) in 13 cases. Whole-genome sequencing of 3 CLPS cases and 4 controls at 15X coverage led to the discovery of a frameshift mutation within ADAMTS20 (c.1360_1361delAA (p.Lys453Ilefs*3)), which segregated concordantly with the phenotype. In a parallel study in humans, a family-based association analysis (DFAM) of 125 CL/P cases, 420 unaffected relatives, and 392 controls from a Guatemalan cohort identified a suggestive association (rs10785430; p = 2.67 x 10(-6)) with the same gene, ADAMTS20. Sequencing of cases from the Guatemalan cohort was unable to identify a causative mutation within the coding region of ADAMTS20, but four coding variants were found in additional cases of CL/P. In summary, this study provides genetic evidence for a role of ADAMTS20 in CL/P development in dogs and as a candidate gene for CL/P development in humans.

  2. Identifying elemental genomic track types and representing them uniformly

    PubMed Central

    2011-01-01

    Background With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. Results We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. Conclusions The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience. PMID:22208806
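
    The abstract names four informational properties (gaps, lengths, values, interconnections) and fifteen track types, but does not state how the count arises. One reading, assumed here and not confirmed by the abstract, is that each property is either present or absent and that the combination with no property at all is dropped, which gives 2^4 - 1 = 15. The sketch below only enumerates those combinations. It does not assign the type names used by GTrack or BioXSD.

```python
from itertools import product

# The four properties named in the abstract.
PROPERTIES = ("gaps", "lengths", "values", "interconnections")


def candidate_track_types():
    """Enumerate every non-empty subset of the four properties.

    Assumption: a track type corresponds to the set of properties it carries,
    and the empty set (no information at all) is not a track type.
    """
    combos = []
    for flags in product((False, True), repeat=len(PROPERTIES)):
        if any(flags):
            combos.append(frozenset(p for p, on in zip(PROPERTIES, flags) if on))
    return combos


if __name__ == "__main__":
    types = candidate_track_types()
    print(len(types))  # 15
    for combo in types:
        print(sorted(combo))
```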

  3. A genome-wide association study identifies multiple loci for variation in human ear morphology.

    PubMed

    Adhikari, Kaustubh; Reales, Guillermo; Smith, Andrew J P; Konka, Esra; Palmen, Jutta; Quinto-Sanchez, Mirsha; Acuña-Alonzo, Victor; Jaramillo, Claudia; Arias, William; Fuentes, Macarena; Pizarro, María; Barquera Lozano, Rodrigo; Macín Pérez, Gastón; Gómez-Valdés, Jorge; Villamil-Ramírez, Hugo; Hunemeier, Tábita; Ramallo, Virginia; Silva de Cerqueira, Caio C; Hurtado, Malena; Villegas, Valeria; Granja, Vanessa; Gallo, Carla; Poletti, Giovanni; Schuler-Faccini, Lavinia; Salzano, Francisco M; Bortolini, Maria-Cátira; Canizales-Quinteros, Samuel; Rothhammer, Francisco; Bedoya, Gabriel; Calderón, Rosario; Rosique, Javier; Cheeseman, Michael; Bhutta, Mahmood F; Humphries, Steve E; Gonzalez-José, Rolando; Headon, Denis; Balding, David; Ruiz-Linares, Andrés

    2015-06-24

    Here we report a genome-wide association study for non-pathological pinna morphology in over 5,000 Latin Americans. We find genome-wide significant association at seven genomic regions affecting: lobe size and attachment, folding of antihelix, helix rolling, ear protrusion and antitragus size (linear regression P values 2 × 10(-8) to 3 × 10(-14)). Four traits are associated with a functional variant in the Ectodysplasin A receptor (EDAR) gene, a key regulator of embryonic skin appendage development. We confirm expression of Edar in the developing mouse ear and that Edar-deficient mice have an abnormally shaped pinna. Two traits are associated with SNPs in a region overlapping the T-Box Protein 15 (TBX15) gene, a major determinant of mouse skeletal development. Strongest association in this region is observed for SNP rs17023457 located in an evolutionarily conserved binding site for the transcription factor Cartilage paired-class homeoprotein 1 (CART1), and we confirm that rs17023457 alters in vitro binding of CART1.

  4. Environmental Response and Genomic Regions Correlated with Rice Root Growth and Yield under Drought in the OryzaSNP Panel across Multiple Study Systems

    PubMed Central

    Wade, Len J.; Bartolome, Violeta; Mauleon, Ramil; Vasant, Vivek Deshmuck; Prabakar, Sumeet Mankar; Chelliah, Muthukumar; Kameoka, Emi; Nagendra, K.; Reddy, K. R. Kamalnath; Varma, C. Mohan Kumar; Patil, Kalmeshwar Gouda; Shrestha, Roshi; Al-Shugeairy, Zaniab; Al-Ogaidi, Faez; Munasinghe, Mayuri; Gowda, Veeresh; Semon, Mande; Suralta, Roel R.; Shenoy, Vinay; Vadez, Vincent; Serraj, Rachid; Shashidhar, H. E.; Yamauchi, Akira; Babu, Ranganathan Chandra; Price, Adam; McNally, Kenneth L.; Henry, Amelia

    2015-01-01

    The rapid progress in rice genotyping must be matched by advances in phenotyping. A better understanding of genetic variation in rice for drought response and root traits, and practical methods for studying them, are needed. In this study, the OryzaSNP set (20 diverse genotypes that have been genotyped for SNP markers) was phenotyped in a range of field and container studies to study the diversity of rice root growth and response to drought. Of the root traits measured across more than 20 root experiments, root dry weight showed the most stable genotypic performance across studies. The environment (E) component had the strongest effect on yield and root traits. We identified genomic regions correlated with root dry weight, percent deep roots, maximum root depth, and grain yield based on a correlation analysis with the phenotypes and aus, indica, or japonica introgression regions using the SNP data. Two genomic regions were identified as hot spots in which root traits and grain yield were co-located: on chromosome 1 (39.7–40.7 Mb) and on chromosome 8 (20.3–21.9 Mb). Across experiments, the soil type/growth medium showed more correlations with plant growth than the container dimensions. Although the correlations among studies and genetic co-location of root traits from a range of study systems point to their potential utility to represent responses in field studies, the best correlations were observed when the two setups had some similar properties. Due to the co-location of the identified genomic regions (from introgression block analysis) with QTL for a number of previously reported root and drought traits, these regions are good candidates for detailed characterization to contribute to understanding rice improvement for response to drought. This study also highlights the utility of characterizing a small set of 20 genotypes for root growth, drought response, and related genomic regions. PMID:25909711

  5. Determining Epigenetic Targets: A Beginner's Guide to Identifying Genome Functionality Through Database Analysis.

    PubMed

    Hay, Elizabeth A; Cowie, Philip; MacKenzie, Alasdair

    2017-01-01

    There can now be little doubt that the cis-regulatory genome represents the largest information source within the human genome essential for health. In addition to containing up to five times more information than the coding genome, the cis-regulatory genome also acts as a major reservoir of disease-associated polymorphic variation. The cis-regulatory genome, which is comprised of enhancers, silencers, promoters, and insulators, also acts as a major functional target for epigenetic modification including DNA methylation and chromatin modifications. These epigenetic modifications impact the ability of cis-regulatory sequences to maintain tissue-specific and inducible expression of genes that preserve health. There has been limited ability to identify and characterize the functional components of this huge and largely misunderstood part of the human genome that, for decades, was ignored as "Junk" DNA. In an attempt to address this deficit, the current chapter will first describe methods of identifying and characterizing functional elements of the cis-regulatory genome at a genome-wide level using databases such as ENCODE, the UCSC browser, and NCBI. We will then explore the databases on the UCSC genome browser, which provides access to DNA methylation and chromatin modification datasets. Finally, we will describe how we can superimpose the huge volume of study data contained in the NCBI archives onto that contained within the UCSC browser in order to glean relevant in vivo study data for any locus within the genome. An ability to access and utilize these information sources will become essential to informing the future design of experiments and subsequent determination of the role of epigenetics in health and disease and will form a critical step in our development of personalized medicine.

  6. Ebolavirus comparative genomics

    DOE PAGES

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

    2015-07-14

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.

  7. An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region.

    PubMed Central

    Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S

    1999-01-01

    A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species. Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it. Milne 1926 PMID:10471707
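
    The densities and percentages quoted in this abstract can be checked with simple arithmetic. The sketch below assumes a region length of exactly 2.9 Mb, taken from the record title (the abstract itself says "nearly 3 Mb"), and uses the counts given in the text. The helper names `kb_per_feature` and `percent` are illustrative only.

```python
REGION_BP = 2_900_000      # assumed from the title: "a 2.9-Mb region"
PROTEIN_CODING_GENES = 218
TRANSPOSABLE_ELEMENTS = 17
GENES_MATCHING_EST = 95
GENES_WITH_SIMILARITY = 144


def kb_per_feature(region_bp: int, count: int) -> float:
    """Average spacing, in kilobases, between features in a region."""
    return region_bp / count / 1000


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole


if __name__ == "__main__":
    print(round(kb_per_feature(REGION_BP, PROTEIN_CODING_GENES)))    # 13
    print(round(kb_per_feature(REGION_BP, TRANSPOSABLE_ELEMENTS)))   # 171
    print(round(percent(GENES_MATCHING_EST, PROTEIN_CODING_GENES)))  # 44
    print(round(percent(GENES_WITH_SIMILARITY, PROTEIN_CODING_GENES)))  # 66
```

    Under that assumption, the rounded results agree with the figures reported in the abstract (13 kb, 171 kb, 44% and 66%).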

  8. Single Molecule Analysis of Replicated DNA Reveals the Usage of Multiple KSHV Genome Regions for Latent Replication

    PubMed Central

    Verma, Subhash C.; Lu, Jie; Cai, Qiliang; Kosiyatrakul, Settapong; McDowell, Maria E.; Schildkraut, Carl L.; Robertson, Erle S.

    2011-01-01

    Kaposi's sarcoma associated herpesvirus (KSHV), an etiologic agent of Kaposi's sarcoma, Body Cavity Based Lymphoma and Multicentric Castleman's Disease, establishes lifelong latency in infected cells. The KSHV genome tethers to the host chromosome with the help of a latency associated nuclear antigen (LANA). Additionally, LANA supports replication of the latent origins within the terminal repeats by recruiting cellular factors. Our previous studies identified and characterized another latent origin, which supported the replication of plasmids ex-vivo without LANA expression in trans. Therefore identification of an additional origin site prompted us to analyze the entire KSHV genome for replication initiation sites using single molecule analysis of replicated DNA (SMARD). Our results showed that replication of DNA can initiate throughout the KSHV genome and the usage of these regions is not conserved in two different KSHV strains investigated. SMARD also showed that the utilization of multiple replication initiation sites occurs across large regions of the genome rather than a specified sequence. The replication origin of the terminal repeats showed only a slight preference for their usage indicating that LANA dependent origin at the terminal repeats (TR) plays only a limited role in genome duplication. Furthermore, we performed chromatin immunoprecipitation for ORC2 and MCM3, which are part of the pre-replication initiation complex to determine the genomic sites where these proteins accumulate, to provide further characterization of potential replication initiation sites on the KSHV genome. The ChIP data confirmed accumulation of these pre-RC proteins at multiple genomic sites in a cell cycle dependent manner. Our data also show that both the frequency and the sites of replication initiation vary within the two KSHV genomes studied here, suggesting that initiation of replication is likely to be affected by the genomic context rather than the DNA sequences. PMID

  9. Secondary structure of the 3'-noncoding region of flavivirus genomes: comparative analysis of base pairing probabilities.

    PubMed

    Rauscher, S; Flamm, C; Mandl, C W; Heinz, F X; Stadler, P F

    1997-07-01

    The prediction of the complete matrix of base pairing probabilities was applied to the 3' noncoding region (NCR) of flavivirus genomes. This approach identifies not only well-defined secondary structure elements, but also regions of high structural flexibility. Flaviviruses, many of which are important human pathogens, have a common genomic organization, but exhibit a significant degree of RNA sequence diversity in the functionally important 3'-NCR. We demonstrate the presence of secondary structures shared by all flaviviruses, as well as structural features that are characteristic for groups of viruses within the genus reflecting the established classification scheme. The significance of most of the predicted structures is corroborated by compensatory mutations. The availability of infectious clones for several flaviviruses will allow the assessment of these structural elements in processes of the viral life cycle, such as replication and assembly.

  10. Genome-wide analysis identifies 12 loci influencing human reproductive behavior.

    PubMed

    Barban, Nicola; Jansen, Rick; de Vlaming, Ronald; Vaez, Ahmad; Mandemakers, Jornt J; Tropf, Felix C; Shen, Xia; Wilson, James F; Chasman, Daniel I; Nolte, Ilja M; Tragante, Vinicius; van der Laan, Sander W; Perry, John R B; Kong, Augustine; Ahluwalia, Tarunveer S; Albrecht, Eva; Yerges-Armstrong, Laura; Atzmon, Gil; Auro, Kirsi; Ayers, Kristin; Bakshi, Andrew; Ben-Avraham, Danny; Berger, Klaus; Bergman, Aviv; Bertram, Lars; Bielak, Lawrence F; Bjornsdottir, Gyda; Bonder, Marc Jan; Broer, Linda; Bui, Minh; Barbieri, Caterina; Cavadino, Alana; Chavarro, Jorge E; Turman, Constance; Concas, Maria Pina; Cordell, Heather J; Davies, Gail; Eibich, Peter; Eriksson, Nicholas; Esko, Tõnu; Eriksson, Joel; Falahi, Fahimeh; Felix, Janine F; Fontana, Mark Alan; Franke, Lude; Gandin, Ilaria; Gaskins, Audrey J; Gieger, Christian; Gunderson, Erica P; Guo, Xiuqing; Hayward, Caroline; He, Chunyan; Hofer, Edith; Huang, Hongyan; Joshi, Peter K; Kanoni, Stavroula; Karlsson, Robert; Kiechl, Stefan; Kifley, Annette; Kluttig, Alexander; Kraft, Peter; Lagou, Vasiliki; Lecoeur, Cecile; Lahti, Jari; Li-Gao, Ruifang; Lind, Penelope A; Liu, Tian; Makalic, Enes; Mamasoula, Crysovalanto; Matteson, Lindsay; Mbarek, Hamdi; McArdle, Patrick F; McMahon, George; Meddens, S Fleur W; Mihailov, Evelin; Miller, Mike; Missmer, Stacey A; Monnereau, Claire; van der Most, Peter J; Myhre, Ronny; Nalls, Mike A; Nutile, Teresa; Kalafati, Ioanna Panagiota; Porcu, Eleonora; Prokopenko, Inga; Rajan, Kumar B; Rich-Edwards, Janet; Rietveld, Cornelius A; Robino, Antonietta; Rose, Lynda M; Rueedi, Rico; Ryan, Kathleen A; Saba, Yasaman; Schmidt, Daniel; Smith, Jennifer A; Stolk, Lisette; Streeten, Elizabeth; Tönjes, Anke; Thorleifsson, Gudmar; Ulivi, Sheila; Wedenoja, Juho; Wellmann, Juergen; Willeit, Peter; Yao, Jie; Yengo, Loic; Zhao, Jing Hua; Zhao, Wei; Zhernakova, Daria V; Amin, Najaf; Andrews, Howard; Balkau, Beverley; Barzilai, Nir; Bergmann, Sven; Biino, Ginevra; Bisgaard, Hans; Bønnelykke, Klaus; Boomsma, 
Dorret I; Buring, Julie E; Campbell, Harry; Cappellani, Stefania; Ciullo, Marina; Cox, Simon R; Cucca, Francesco; Toniolo, Daniela; Davey-Smith, George; Deary, Ian J; Dedoussis, George; Deloukas, Panos; van Duijn, Cornelia M; de Geus, Eco J C; Eriksson, Johan G; Evans, Denis A; Faul, Jessica D; Sala, Cinzia Felicita; Froguel, Philippe; Gasparini, Paolo; Girotto, Giorgia; Grabe, Hans-Jörgen; Greiser, Karin Halina; Groenen, Patrick J F; de Haan, Hugoline G; Haerting, Johannes; Harris, Tamara B; Heath, Andrew C; Heikkilä, Kauko; Hofman, Albert; Homuth, Georg; Holliday, Elizabeth G; Hopper, John; Hyppönen, Elina; Jacobsson, Bo; Jaddoe, Vincent W V; Johannesson, Magnus; Jugessur, Astanand; Kähönen, Mika; Kajantie, Eero; Kardia, Sharon L R; Keavney, Bernard; Kolcic, Ivana; Koponen, Päivikki; Kovacs, Peter; Kronenberg, Florian; Kutalik, Zoltan; La Bianca, Martina; Lachance, Genevieve; Iacono, William G; Lai, Sandra; Lehtimäki, Terho; Liewald, David C; Lindgren, Cecilia M; Liu, Yongmei; Luben, Robert; Lucht, Michael; Luoto, Riitta; Magnus, Per; Magnusson, Patrik K E; Martin, Nicholas G; McGue, Matt; McQuillan, Ruth; Medland, Sarah E; Meisinger, Christa; Mellström, Dan; Metspalu, Andres; Traglia, Michela; Milani, Lili; Mitchell, Paul; Montgomery, Grant W; Mook-Kanamori, Dennis; de Mutsert, Renée; Nohr, Ellen A; Ohlsson, Claes; Olsen, Jørn; Ong, Ken K; Paternoster, Lavinia; Pattie, Alison; Penninx, Brenda W J H; Perola, Markus; Peyser, Patricia A; Pirastu, Mario; Polasek, Ozren; Power, Chris; Kaprio, Jaakko; Raffel, Leslie J; Räikkönen, Katri; Raitakari, Olli; Ridker, Paul M; Ring, Susan M; Roll, Kathryn; Rudan, Igor; Ruggiero, Daniela; Rujescu, Dan; Salomaa, Veikko; Schlessinger, David; Schmidt, Helena; Schmidt, Reinhold; Schupf, Nicole; Smit, Johannes; Sorice, Rossella; Spector, Tim D; Starr, John M; Stöckl, Doris; Strauch, Konstantin; Stumvoll, Michael; Swertz, Morris A; Thorsteinsdottir, Unnur; Thurik, A Roy; Timpson, Nicholas J; Tung, Joyce Y; Uitterlinden, André G; 
Vaccargiu, Simona; Viikari, Jorma; Vitart, Veronique; Völzke, Henry; Vollenweider, Peter; Vuckovic, Dragana; Waage, Johannes; Wagner, Gert G; Wang, Jie Jin; Wareham, Nicholas J; Weir, David R; Willemsen, Gonneke; Willeit, Johann; Wright, Alan F; Zondervan, Krina T; Stefansson, Kari; Krueger, Robert F; Lee, James J; Benjamin, Daniel J; Cesarini, David; Koellinger, Philipp D; den Hoed, Marcel; Snieder, Harold; Mills, Melinda C

    2016-12-01

    The genetic architecture of human reproductive behavior, measured as age at first birth (AFB) and number of children ever born (NEB), has a strong relationship with fitness, human development, infertility and risk of neuropsychiatric disorders. However, very few genetic loci have been identified, and the underlying mechanisms of AFB and NEB are poorly understood. We report a large genome-wide association study of both sexes including 251,151 individuals for AFB and 343,072 individuals for NEB. We identified 12 independent loci that are significantly associated with AFB and/or NEB in a SNP-based genome-wide association study and 4 additional loci associated in a gene-based effort. These loci harbor genes that are likely to have a role, either directly or by affecting non-local gene expression, in human reproduction and infertility, thereby increasing understanding of these complex traits.

  11. Short interspersed transposable elements (SINEs) are excluded from imprinted regions in the human genome.

    PubMed

    Greally, John M

    2002-01-08

    To test whether regions undergoing genomic imprinting have unique genomic characteristics, imprinted and nonimprinted human loci were compared for nucleotide and retroelement composition. Maternally and paternally expressed subgroups of imprinted genes were found to differ in terms of guanine and cytosine, CpG, and retroelement content, indicating a segregation into distinct genomic compartments. Imprinted regions have been normally permissive to L1 long interspersed transposable element retroposition during mammalian evolution but universally and significantly lack short interspersed transposable elements (SINEs). The primate-specific Alu SINEs, as well as the more ancient mammalian-wide interspersed repeat SINEs, are found at significantly low densities in imprinted regions. The latter paleogenomic signature indicates that the sequence characteristics of currently imprinted regions existed before the mammalian radiation. Transitions from imprinted to nonimprinted genomic regions in cis are characterized by a sharp inflection in SINE content, demonstrating that this genomic characteristic can help predict the presence and extent of regions undergoing imprinting. During primate evolution, SINE accumulation in imprinted regions occurred at a decreased rate compared with control loci. The constraint on SINE accumulation in imprinted regions may be mediated by an active selection process. This selection could be because of SINEs attracting and spreading methylation, as has been found at other loci. Methylation-induced silencing could lead to deleterious consequences at imprinted loci, where inactivation of one allele is already established, and expression is often essential for embryonic growth and survival.

  12. Short interspersed transposable elements (SINEs) are excluded from imprinted regions in the human genome

    PubMed Central

    Greally, John M.

    2002-01-01

    To test whether regions undergoing genomic imprinting have unique genomic characteristics, imprinted and nonimprinted human loci were compared for nucleotide and retroelement composition. Maternally and paternally expressed subgroups of imprinted genes were found to differ in terms of guanine and cytosine, CpG, and retroelement content, indicating a segregation into distinct genomic compartments. Imprinted regions have been normally permissive to L1 long interspersed transposable element retroposition during mammalian evolution but universally and significantly lack short interspersed transposable elements (SINEs). The primate-specific Alu SINEs, as well as the more ancient mammalian-wide interspersed repeat SINEs, are found at significantly low densities in imprinted regions. The latter paleogenomic signature indicates that the sequence characteristics of currently imprinted regions existed before the mammalian radiation. Transitions from imprinted to nonimprinted genomic regions in cis are characterized by a sharp inflection in SINE content, demonstrating that this genomic characteristic can help predict the presence and extent of regions undergoing imprinting. During primate evolution, SINE accumulation in imprinted regions occurred at a decreased rate compared with control loci. The constraint on SINE accumulation in imprinted regions may be mediated by an active selection process. This selection could be because of SINEs attracting and spreading methylation, as has been found at other loci. Methylation-induced silencing could lead to deleterious consequences at imprinted loci, where inactivation of one allele is already established, and expression is often essential for embryonic growth and survival. PMID:11756672

  13. Variants in Several Genomic Regions Associated with Asperger Disorder

    PubMed Central

    Salyakina, D.; Ma, D.Q.; Jaworski, J.M.; Konidari, I.; Whitehead, P.L.; Henson, R.; Martinez, D.; Robinson, J.L.; Sacharow, S.; Wright, H.H.; Abramson, R.K.; Gilbert, J.R.; Cuccaro, M.L.; Pericak-Vance, M.A.

    2010-01-01

    Asperger disorder (ASP) is one of the autism spectrum disorders (ASD) and is differentiated from autism largely by the absence of clinically significant cognitive and language delays. Analysis of a homogeneous subset of families with ASP may help to address the corresponding effect of genetic heterogeneity on identifying ASD genetic risk factors. To examine the hypothesis that common variation is important in ASD, we performed a genome-wide association study (GWAS) in 124 ASP families in a discovery data set and 110 ASP families in a validation data set. We prioritized the top 100 association results from both cohorts by employing a ranking strategy. Novel regions on 5q21.1 (P = 9.7 × 10(-7)) and 15q22.1–q22.2 (P = 7.3 × 10(-6)) were our most significant findings in the combined data set. Three chromosomal regions showing association, 3p14.2 (P = 3.6 × 10(-6)), 3q25–26 (P = 6.0 × 10(-5)) and 3p23 (P = 3.3 × 10(-4)), overlapped linkage regions reported in Finnish ASP families, and eight association regions overlapped ASD linkage areas. Our findings suggest that ASP both shares ASD-related genetic risk factors and has genetic risk factors unique to the ASP phenotype. PMID:21182207

  14. High-Throughput resequencing of maize landraces at genomic regions associated with flowering time

    USDA-ARS's Scientific Manuscript database

    Despite the reduction in the price of sequencing, it remains expensive to sequence and assemble whole, complex genomes of multiple samples for population studies, particularly for large genomes like those of many crop species. Enrichment of target genome regions coupled with next generation sequenci...

  15. Genome-wide oxidative bisulfite sequencing identifies sex-specific methylation differences in the human placenta

    PubMed Central

    Johnson, Michelle D; Dopierala, Justyna

    2018-01-01

    DNA methylation is an important regulator of gene function. Fetal sex is associated with the risk of several specific pregnancy complications related to placental function. However, the association between fetal sex and placental DNA methylation remains poorly understood. We carried out whole-genome oxidative bisulfite sequencing in the placentas of two healthy female and two healthy male pregnancies generating an average genome depth of coverage of 25x. Most highly ranked differentially methylated regions (DMRs) were located on the X chromosome but we identified a 225 kb sex-specific DMR in the body of the CUB and Sushi Multiple Domains 1 (CSMD1) gene on chromosome 8. The sex-specific differential methylation pattern observed in this region was validated in additional placentas using in-solution target capture. In a new RNA-seq data set from 64 female and 67 male placentas, CSMD1 mRNA was 1.8-fold higher in male than in female placentas (P value = 8.5 × 10(-7), Mann-Whitney test). Exon-level quantification of CSMD1 mRNA from these 131 placentas suggested a likely placenta-specific CSMD1 isoform not detected in the 21 somatic tissues analyzed. We show that the gene body of an autosomal gene, CSMD1, is differentially methylated in a sex- and placental-specific manner, displaying sex-specific differences in placental transcript abundance. PMID:29376485

  16. Parent-Of-Origin Effects in Autism Identified through Genome-Wide Linkage Analysis of 16,000 SNPs

    PubMed Central

    Fradin, Delphine; Cheslack-Postava, Keely; Ladd-Acosta, Christine; Newschaffer, Craig; Chakravarti, Aravinda; Arking, Dan E.; Feinberg, Andrew; Fallin, M. Daniele

    2010-01-01

    Background Autism is a common heritable neurodevelopmental disorder with complex etiology. Several genome-wide linkage and association scans have been carried out to identify regions harboring genes related to autism or autism spectrum disorders, with mixed results. Given the overlap in autism features with genetic abnormalities known to be associated with imprinting, one possible reason for lack of consistency would be the influence of parent-of-origin effects that may mask the ability to detect linkage and association. Methods and Findings We have performed a genome-wide linkage scan that accounts for potential parent-of-origin effects using 16,311 SNPs among families from the Autism Genetic Resource Exchange (AGRE) and the National Institute of Mental Health (NIMH) autism repository. We report parametric (GH, Genehunter) and allele-sharing linkage (Aspex) results using a broad spectrum disorder case definition. Paternal-origin genome-wide statistically significant linkage was observed on chromosomes 4 (LOD_GH = 3.79, empirical p < 0.005 and LOD_Aspex = 2.96, p = 0.008), 15 (LOD_GH = 3.09, empirical p < 0.005 and LOD_Aspex = 3.62, empirical p = 0.003) and 20 (LOD_GH = 3.36, empirical p < 0.005 and LOD_Aspex = 3.38, empirical p = 0.006). Conclusions These regions may harbor imprinted sites associated with the development of autism and offer fruitful domains for molecular investigation into the role of epigenetic mechanisms in autism. PMID:20824079

  17. Genome-wide association study and accuracy of genomic prediction for teat number in Duroc pigs using genotyping-by-sequencing.

    PubMed

    Tan, Cheng; Wu, Zhenfang; Ren, Jiangli; Huang, Zhuolin; Liu, Dewu; He, Xiaoyan; Prakapenka, Dzianis; Zhang, Ran; Li, Ning; Da, Yang; Hu, Xiaoxiang

    2017-03-29

    The number of teats in pigs is related to a sow's ability to rear piglets to weaning age. Several studies have identified genes and genomic regions that affect teat number in swine but few common results were reported. The objective of this study was to identify genetic factors that affect teat number in pigs, evaluate the accuracy of genomic prediction, and evaluate the contribution of significant genes and genomic regions to genomic broad-sense heritability and prediction accuracy using 41,108 autosomal single nucleotide polymorphisms (SNPs) from genotyping-by-sequencing on 2936 Duroc boars. Narrow-sense heritability and dominance heritability of teat number estimated by genomic restricted maximum likelihood were 0.365 ± 0.030 and 0.035 ± 0.019, respectively. The accuracy of genomic predictions, calculated as the average correlation between the genomic best linear unbiased prediction and phenotype in a tenfold validation study, was 0.437 ± 0.064 for the model with additive and dominance effects and 0.435 ± 0.064 for the model with additive effects only. Genome-wide association studies (GWAS) using three methods of analysis identified 85 significant SNP effects for teat number on chromosomes 1, 6, 7, 10, 11, 12 and 14. The region between 102.9 and 106.0 Mb on chromosome 7, which was reported in several studies, had the most significant SNP effects in or near the PTGR2, FAM161B, LIN52, VRTN, FCF1, AREL1 and LRRC74A genes. This region accounted for 10.0% of the genomic additive heritability and 8.0% of the accuracy of prediction. The second most significant chromosome region not reported by previous GWAS was the region between 77.7 and 79.7 Mb on chromosome 11, where SNPs in the FGF14 gene had the most significant effect and accounted for 5.1% of the genomic additive heritability and 5.2% of the accuracy of prediction. The 85 significant SNPs accounted for 28.5 to 28.8% of the genomic additive heritability and 35.8 to 36.8% of the accuracy of

  18. Comparative transgenic analysis of enhancers from the human SHOX and mouse Shox2 genomic regions.

    PubMed

    Rosin, Jessica M; Abassah-Oppong, Samuel; Cobb, John

    2013-08-01

    Disruption of presumptive enhancers downstream of the human SHOX gene (hSHOX) is a frequent cause of the zeugopodal limb defects characteristic of Léri-Weill dyschondrosteosis (LWD). The closely related mouse Shox2 gene (mShox2) is also required for limb development, but in the more proximal stylopodium. In this study, we used transgenic mice in a comparative approach to characterize enhancer sequences in the hSHOX and mShox2 genomic regions. Among conserved noncoding elements (CNEs) that function as enhancers in vertebrate genomes, those that are maintained near paralogous genes are of particular interest given their ancient origins. Therefore, we first analyzed the regulatory potential of a genomic region containing one such duplicated CNE (dCNE) downstream of mShox2 and hSHOX. We identified a strong limb enhancer directly adjacent to the mShox2 dCNE that recapitulates the expression pattern of the endogenous gene. Interestingly, this enhancer requires sequences only conserved in the mammalian lineage in order to drive strong limb expression, whereas the more deeply conserved sequences of the dCNE function as a neural enhancer. Similarly, we found that a conserved element downstream of hSHOX (CNE9) also functions as a neural enhancer in transgenic mice. However, when the CNE9 transgenic construct was enlarged to include adjacent, non-conserved sequences frequently deleted in LWD patients, the transgene drove expression in the zeugopodium of the limbs. Therefore, both hSHOX and mShox2 limb enhancers are coupled to distinct neural enhancers. This is the first report demonstrating the activity of cis-regulatory elements from the hSHOX and mShox2 genomic regions in mammalian embryos.

  19. Genome-wide association analysis identifies three new breast cancer susceptibility loci.

    PubMed

    Ghoussaini, Maya; Fletcher, Olivia; Michailidou, Kyriaki; Turnbull, Clare; Schmidt, Marjanka K; Dicks, Ed; Dennis, Joe; Wang, Qin; Humphreys, Manjeet K; Luccarini, Craig; Baynes, Caroline; Conroy, Don; Maranian, Melanie; Ahmed, Shahana; Driver, Kristy; Johnson, Nichola; Orr, Nicholas; dos Santos Silva, Isabel; Waisfisz, Quinten; Meijers-Heijboer, Hanne; Uitterlinden, Andre G; Rivadeneira, Fernando; Hall, Per; Czene, Kamila; Irwanto, Astrid; Liu, Jianjun; Nevanlinna, Heli; Aittomäki, Kristiina; Blomqvist, Carl; Meindl, Alfons; Schmutzler, Rita K; Müller-Myhsok, Bertram; Lichtner, Peter; Chang-Claude, Jenny; Hein, Rebecca; Nickels, Stefan; Flesch-Janys, Dieter; Tsimiklis, Helen; Makalic, Enes; Schmidt, Daniel; Bui, Minh; Hopper, John L; Apicella, Carmel; Park, Daniel J; Southey, Melissa; Hunter, David J; Chanock, Stephen J; Broeks, Annegien; Verhoef, Senno; Hogervorst, Frans B L; Fasching, Peter A; Lux, Michael P; Beckmann, Matthias W; Ekici, Arif B; Sawyer, Elinor; Tomlinson, Ian; Kerin, Michael; Marme, Frederik; Schneeweiss, Andreas; Sohn, Christof; Burwinkel, Barbara; Guénel, Pascal; Truong, Thérèse; Cordina-Duverger, Emilie; Menegaux, Florence; Bojesen, Stig E; Nordestgaard, Børge G; Nielsen, Sune F; Flyger, Henrik; Milne, Roger L; Alonso, M Rosario; González-Neira, Anna; Benítez, Javier; Anton-Culver, Hoda; Ziogas, Argyrios; Bernstein, Leslie; Dur, Christina Clarke; Brenner, Hermann; Müller, Heiko; Arndt, Volker; Stegmaier, Christa; Justenhoven, Christina; Brauch, Hiltrud; Brüning, Thomas; Wang-Gohrke, Shan; Eilber, Ursula; Dörk, Thilo; Schürmann, Peter; Bremer, Michael; Hillemanns, Peter; Bogdanova, Natalia V; Antonenkova, Natalia N; Rogov, Yuri I; Karstens, Johann H; Bermisheva, Marina; Prokofieva, Darya; Khusnutdinova, Elza; Lindblom, Annika; Margolin, Sara; Mannermaa, Arto; Kataja, Vesa; Kosma, Veli-Matti; Hartikainen, Jaana M; Lambrechts, Diether; Yesilyurt, Betul T; Floris, Giuseppe; Leunen, Karin; Manoukian, Siranoush; Bonanni, Bernardo; Fortuzzi, Stefano; Peterlongo, Paolo; Couch, Fergus J; Wang, Xianshu; Stevens, Kristen; Lee, Adam; Giles, Graham G; Baglietto, Laura; Severi, Gianluca; McLean, Catriona; Alnaes, Grethe Grenaker; Kristensen, Vessela; Børrensen-Dale, Anne-Lise; John, Esther M; Miron, Alexander; Winqvist, Robert; Pylkäs, Katri; Jukkola-Vuorinen, Arja; Kauppila, Saila; Andrulis, Irene L; Glendon, Gord; Mulligan, Anna Marie; Devilee, Peter; van Asperen, Christie J; Tollenaar, Rob A E M; Seynaeve, Caroline; Figueroa, Jonine D; Garcia-Closas, Montserrat; Brinton, Louise; Lissowska, Jolanta; Hooning, Maartje J; Hollestelle, Antoinette; Oldenburg, Rogier A; van den Ouweland, Ans M W; Cox, Angela; Reed, Malcolm W R; Shah, Mitul; Jakubowska, Ania; Lubinski, Jan; Jaworska, Katarzyna; Durda, Katarzyna; Jones, Michael; Schoemaker, Minouk; Ashworth, Alan; Swerdlow, Anthony; Beesley, Jonathan; Chen, Xiaoqing; Muir, Kenneth R; Lophatananon, Artitaya; Rattanamongkongul, Suthee; Chaiwerawattana, Arkom; Kang, Daehee; Yoo, Keun-Young; Noh, Dong-Young; Shen, Chen-Yang; Yu, Jyh-Cherng; Wu, Pei-Ei; Hsiung, Chia-Ni; Perkins, Annie; Swann, Ruth; Velentzis, Louiza; Eccles, Diana M; Tapper, Will J; Gerty, Susan M; Graham, Nikki J; Ponder, Bruce A J; Chenevix-Trench, Georgia; Pharoah, Paul D P; Lathrop, Mark; Dunning, Alison M; Rahman, Nazneen; Peto, Julian; Easton, Douglas F

    2012-01-22

    Breast cancer is the most common cancer among women. To date, 22 common breast cancer susceptibility loci have been identified accounting for ∼8% of the heritability of the disease. We attempted to replicate 72 promising associations from two independent genome-wide association studies (GWAS) in ∼70,000 cases and ∼68,000 controls from 41 case-control studies and 9 breast cancer GWAS. We identified three new breast cancer risk loci at 12p11 (rs10771399; P = 2.7 × 10^-35), 12q24 (rs1292011; P = 4.3 × 10^-19) and 21q21 (rs2823093; P = 1.1 × 10^-12). rs10771399 was associated with similar relative risks for both estrogen receptor (ER)-negative and ER-positive breast cancer, whereas the other two loci were associated only with ER-positive disease. Two of the loci lie in regions that contain strong plausible candidate genes: PTHLH (12p11) has a crucial role in mammary gland development and the establishment of bone metastasis in breast cancer, and NRIP1 (21q21) encodes an ER cofactor and has a role in the regulation of breast cancer cell growth.

  20. Genome-wide association analysis identifies three new breast cancer susceptibility loci

    PubMed Central

    Ghoussaini, Maya; Fletcher, Olivia; Michailidou, Kyriaki; Turnbull, Clare; Schmidt, Marjanka K; Dicks, Ed; Dennis, Joe; Wang, Qin; Humphreys, Manjeet K; Luccarini, Craig; Baynes, Caroline; Conroy, Don; Maranian, Melanie; Ahmed, Shahana; Driver, Kristy; Johnson, Nichola; Orr, Nicholas; Silva, Isabel dos Santos; Waisfisz, Quinten; Meijers-Heijboer, Hanne; Uitterlinden, Andre G.; Rivadeneira, Fernando; Hall, Per; Czene, Kamila; Irwanto, Astrid; Liu, Jianjun; Nevanlinna, Heli; Aittomäki, Kristiina; Blomqvist, Carl; Meindl, Alfons; Schmutzler, Rita K; Müller-Myhsok, Bertram; Lichtner, Peter; Chang-Claude, Jenny; Hein, Rebecca; Nickels, Stefan; Flesch-Janys, Dieter; Tsimiklis, Helen; Makalic, Enes; Schmidt, Daniel; Bui, Minh; Hopper, John L; Apicella, Carmel; Park, Daniel J; Southey, Melissa; Hunter, David J; Chanock, Stephen J; Broeks, Annegien; Verhoef, Senno; Hogervorst, Frans BL; Fasching, Peter A.; Lux, Michael P.; Beckmann, Matthias W.; Ekici, Arif B.; Sawyer, Elinor; Tomlinson, Ian; Kerin, Michael; Marme, Frederik; Schneeweiss, Andreas; Sohn, Christof; Burwinkel, Barbara; Guénel, Pascal; Truong, Thérèse; Cordina-Duverger, Emilie; Menegaux, Florence; Bojesen, Stig E; Nordestgaard, Børge G; Nielsen, Sune F; Flyger, Henrik; Milne, Roger L.; Alonso, M. Rosario; González-Neira, Anna; Benítez, Javier; Anton-Culver, Hoda; Ziogas, Argyrios; Bernstein, Leslie; Dur, Christina Clarke; Brenner, Hermann; Müller, Heiko; Arndt, Volker; Stegmaier, Christa; Justenhoven, Christina; Brauch, Hiltrud; Brüning, Thomas; Wang-Gohrke, Shan; Eilber, Ursula; Dörk, Thilo; Schürmann, Peter; Bremer, Michael; Hillemanns, Peter; Bogdanova, Natalia V.; Antonenkova, Natalia N.; Rogov, Yuri I.; Karstens, Johann H.; Bermisheva, Marina; Prokofieva, Darya; Khusnutdinova, Elza; Lindblom, Annika; Margolin, Sara; Mannermaa, Arto; Kataja, Vesa; Kosma, Veli-Matti; Hartikainen, Jaana M; Lambrechts, Diether; Yesilyurt, Betul T.; Floris, Giuseppe; Leunen, Karin; Manoukian, Siranoush; Bonanni, Bernardo; Fortuzzi, Stefano; Peterlongo, Paolo; Couch, Fergus J; Wang, Xianshu; Stevens, Kristen; Lee, Adam; Giles, Graham G.; Baglietto, Laura; Severi, Gianluca; McLean, Catriona; Alnæs, Grethe Grenaker; Kristensen, Vessela; Børrensen-Dale, Anne-Lise; John, Esther M.; Miron, Alexander; Winqvist, Robert; Pylkäs, Katri; Jukkola-Vuorinen, Arja; Kauppila, Saila; Andrulis, Irene L.; Glendon, Gord; Mulligan, Anna Marie; Devilee, Peter; van Asperen, Christie J.; Tollenaar, Rob A.E.M.; Seynaeve, Caroline; Figueroa, Jonine D; Garcia-Closas, Montserrat; Brinton, Louise; Lissowska, Jolanta; Hooning, Maartje J.; Hollestelle, Antoinette; Oldenburg, Rogier A.; van den Ouweland, Ans M.W.; Cox, Angela; Reed, Malcolm WR; Shah, Mitul; Jakubowska, Ania; Lubinski, Jan; Jaworska, Katarzyna; Durda, Katarzyna; Jones, Michael; Schoemaker, Minouk; Ashworth, Alan; Swerdlow, Anthony; Beesley, Jonathan; Chen, Xiaoqing; Muir, Kenneth R; Lophatananon, Artitaya; Rattanamongkongul, Suthee; Chaiwerawattana, Arkom; Kang, Daehee; Yoo, Keun-Young; Noh, Dong-Young; Shen, Chen-Yang; Yu, Jyh-Cherng; Wu, Pei-Ei; Hsiung, Chia-Ni; Perkins, Annie; Swann, Ruth; Velentzis, Louiza; Eccles, Diana M; Tapper, Will J; Gerty, Susan M; Graham, Nikki J; Ponder, Bruce A. J.; Chenevix-Trench, Georgia; Pharoah, Paul D.P.; Lathrop, Mark; Dunning, Alison M.; Rahman, Nazneen; Peto, Julian; Easton, Douglas F

    2013-01-01

    Breast cancer is the most common cancer among women. To date, 22 common breast cancer susceptibility loci have been identified accounting for ~8% of the heritability of the disease. We followed up 72 promising associations from two independent Genome Wide Association Studies (GWAS) in ~70,000 cases and ~68,000 controls from 41 case-control studies and nine breast cancer GWAS. We identified three new breast cancer risk loci on 12p11 (rs10771399; P = 2.7 × 10^-35), 12q24 (rs1292011; P = 4.3 × 10^-19) and 21q21 (rs2823093; P = 1.1 × 10^-12). SNP rs10771399 was associated with similar relative risks for both estrogen receptor (ER)-negative and ER-positive breast cancer, whereas the other two loci were associated only with ER-positive disease. Two of the loci lie in regions that contain strong plausible candidate genes: PTHLH (12p11) plays a crucial role in mammary gland development and the establishment of bone metastasis in breast cancer, while NRIP1 (21q21) encodes an ER co-factor and has a role in the regulation of breast cancer cell growth. PMID:22267197

  1. Genome scans on experimentally evolved populations reveal candidate regions for adaptation to plant resistance in the potato cyst nematode Globodera pallida.

    PubMed

    Eoche-Bosy, D; Gautier, M; Esquibet, M; Legeai, F; Bretaudeau, A; Bouchez, O; Fournet, S; Grenier, E; Montarry, J

    2017-09-01

    Improving resistance durability requires being able to predict the adaptation speed of pathogen populations. Identifying the genetic bases of pathogen adaptation to plant resistances is a useful step to better understand and anticipate this phenomenon. Globodera pallida is a major pest of potato crops for which a resistance QTL, GpaV vrn, has been identified in Solanum vernei. However, its durability is threatened as G. pallida populations are able to adapt to the resistance in a few generations. The aim of this study was to investigate the genomic regions involved in the resistance breakdown by coupling experimental evolution and a high-density genome scan. We performed whole-genome resequencing of pools of individuals (Pool-Seq) belonging to G. pallida lineages derived from two independent populations having experimentally evolved on susceptible and resistant potato cultivars. About 1.6 million SNPs were used to perform the genome scan using a recent model testing for adaptive differentiation and association to population-specific covariables. We identified 275 outliers, and 31 of them, which also showed a significant reduction in diversity in adapted lineages, were investigated for their genic environment. Some candidate genomic regions contained genes putatively encoding effectors and were enriched in SPRYSECs, known in cyst nematodes to be involved in pathogenicity and in (a)virulence. Validated candidate SNPs will provide a useful molecular tool to follow frequencies of virulence alleles in natural G. pallida populations and define efficient strategies of use of potato resistances maximizing their durability. © 2017 John Wiley & Sons Ltd.

  2. Comparative Genomics of 12 Strains of Erwinia amylovora Identifies a Pan-Genome with a Large Conserved Core

    PubMed Central

    Mann, Rachel A.; Smits, Theo H. M.; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E.; Plummer, Kim M.; Beer, Steven V.; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings; strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1Ea and a putative secondary metabolite pathway only present in Rubus-infecting strains. PMID:23409014

  3. Comparative genomics of 12 strains of Erwinia amylovora identifies a pan-genome with a large conserved core.

    PubMed

    Mann, Rachel A; Smits, Theo H M; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E; Plummer, Kim M; Beer, Steven V; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings; strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1(Ea) and a putative secondary metabolite pathway only present in Rubus-infecting strains.

  4. Genome-wide analysis identifies 12 loci influencing human reproductive behavior

    PubMed Central

    Barban, Nicola; Jansen, Rick; de Vlaming, Ronald; Vaez, Ahmad; Mandemakers, Jornt J.; Tropf, Felix C.; Shen, Xia; Wilson, James F.; Chasman, Daniel I.; Nolte, Ilja M.; Tragante, Vinicius; van der Laan, Sander W.; Perry, John R. B.; Kong, Augustine; Ahluwalia, Tarunveer; Albrecht, Eva; Yerges-Armstrong, Laura; Atzmon, Gil; Auro, Kirsi; Ayers, Kristin; Bakshi, Andrew; Ben-Avraham, Danny; Berger, Klaus; Bergman, Aviv; Bertram, Lars; Bielak, Lawrence F.; Bjornsdottir, Gyda; Bonder, Marc Jan; Broer, Linda; Bui, Minh; Barbieri, Caterina; Cavadino, Alana; Chavarro, Jorge E; Turman, Constance; Concas, Maria Pina; Cordell, Heather J.; Davies, Gail; Eibich, Peter; Eriksson, Nicholas; Esko, Tõnu; Eriksson, Joel; Falahi, Fahimeh; Felix, Janine F.; Fontana, Mark Alan; Franke, Lude; Gandin, Ilaria; Gaskins, Audrey J.; Gieger, Christian; Gunderson, Erica P.; Guo, Xiuqing; Hayward, Caroline; He, Chunyan; Hofer, Edith; Huang, Hongyan; Joshi, Peter K.; Kanoni, Stavroula; Karlsson, Robert; Kiechl, Stefan; Kifley, Annette; Kluttig, Alexander; Kraft, Peter; Lagou, Vasiliki; Lecoeur, Cecile; Lahti, Jari; Li-Gao, Ruifang; Lind, Penelope A.; Liu, Tian; Makalic, Enes; Mamasoula, Crysovalanto; Matteson, Lindsay; Mbarek, Hamdi; McArdle, Patrick F.; McMahon, George; Meddens, S. Fleur W.; Mihailov, Evelin; Miller, Mike; Missmer, Stacey A.; Monnereau, Claire; van der Most, Peter J.; Myhre, Ronny; Nalls, Mike A.; Nutile, Teresa; Panagiota, Kalafati Ioanna; Porcu, Eleonora; Prokopenko, Inga; Rajan, Kumar B.; Rich-Edwards, Janet; Rietveld, Cornelius A.; Robino, Antonietta; Rose, Lynda M.; Rueedi, Rico; Ryan, Kathy; Saba, Yasaman; Schmidt, Daniel; Smith, Jennifer A.; Stolk, Lisette; Streeten, Elizabeth; Tonjes, Anke; Thorleifsson, Gudmar; Ulivi, Sheila; Wedenoja, Juho; Wellmann, Juergen; Willeit, Peter; Yao, Jie; Yengo, Loic; Zhao, Jing Hua; Zhao, Wei; Zhernakova, Daria V.; Amin, Najaf; Andrews, Howard; Balkau, Beverley; Barzilai, Nir; Bergmann, Sven; Biino, Ginevra; Bisgaard, Hans; Bønnelykke, Klaus; Boomsma, Dorret I.; Buring, Julie E.; Campbell, Harry; Cappellani, Stefania; Ciullo, Marina; Cox, Simon R.; Cucca, Francesco; Daniela, Toniolo; Davey-Smith, George; Deary, Ian J.; Dedoussis, George; Deloukas, Panos; van Duijn, Cornelia M.; de Geus, Eco JC.; Eriksson, Johan G.; Evans, Denis A.; Faul, Jessica D.; Felicita, Sala Cinzia; Froguel, Philippe; Gasparini, Paolo; Girotto, Giorgia; Grabe, Hans-Jörgen; Greiser, Karin Halina; Groenen, Patrick J.F.; de Haan, Hugoline G.; Haerting, Johannes; Harris, Tamara B.; Heath, Andrew C.; Heikkilä, Kauko; Hofman, Albert; Homuth, Georg; Holliday, Elizabeth G; Hopper, John; Hypponen, Elina; Jacobsson, Bo; Jaddoe, Vincent W. V.; Johannesson, Magnus; Jugessur, Astanand; Kähönen, Mika; Kajantie, Eero; Kardia, Sharon L.R.; Keavney, Bernard; Kolcic, Ivana; Koponen, Päivikki; Kovacs, Peter; Kronenberg, Florian; Kutalik, Zoltan; La Bianca, Martina; Lachance, Genevieve; Iacono, William; Lai, Sandra; Lehtimäki, Terho; Liewald, David C; Lindgren, Cecilia; Liu, Yongmei; Luben, Robert; Lucht, Michael; Luoto, Riitta; Magnus, Per; Magnusson, Patrik K.E.; Martin, Nicholas G.; McGue, Matt; McQuillan, Ruth; Medland, Sarah E.; Meisinger, Christa; Mellström, Dan; Metspalu, Andres; Michela, Traglia; Milani, Lili; Mitchell, Paul; Montgomery, Grant W.; Mook-Kanamori, Dennis; de Mutsert, Renée; Nohr, Ellen A; Ohlsson, Claes; Olsen, Jørn; Ong, Ken K.; Paternoster, Lavinia; Pattie, Alison; Penninx, Brenda WJH; Perola, Markus; Peyser, Patricia A.; Pirastu, Mario; Polasek, Ozren; Power, Chris; Kaprio, Jaakko; Raffel, Leslie J.; Räikkönen, Katri; Raitakari, Olli; Ridker, Paul M.; Ring, Susan M.; Roll, Kathryn; Rudan, Igor; Ruggiero, Daniela; Rujescu, Dan; Salomaa, Veikko; Schlessinger, David; Schmidt, Helena; Schmidt, Reinhold; Schupf, Nicole; Smit, Johannes; Sorice, Rossella; Spector, Tim D.; Starr, John M.; Stöckl, Doris; Strauch, Konstantin; Stumvoll, Michael; Swertz, Morris A.; Thorsteinsdottir, Unnur; Thurik, A. Roy; Timpson, Nicholas J.; Tönjes, Anke; Tung, Joyce Y.; Uitterlinden, André G.; Vaccargiu, Simona; Viikari, Jorma; Vitart, Veronique; Völzke, Henry; Vollenweider, Peter; Vuckovic, Dragana; Waage, Johannes; Wagner, Gert G.; Wang, Jie Jin; Wareham, Nicholas J.; Weir, David R.; Willemsen, Gonneke; Willeit, Johann; Wright, Alan F.; Zondervan, Krina T.; Stefansson, Kari; Krueger, Robert F.; Lee, James J.; Benjamin, Daniel J.; Cesarini, David; Koellinger, Philipp D.; den Hoed, Marcel; Snieder, Harold; Mills, Melinda C.

    2017-01-01

    The genetic architecture of human reproductive behavior – age at first birth (AFB) and number of children ever born (NEB) – has a strong relationship with fitness, human development, infertility and risk of neuropsychiatric disorders. However, very few genetic loci have been identified and the underlying mechanisms of AFB and NEB are poorly understood. We report the largest genome-wide association study to date of both sexes including 251,151 individuals for AFB and 343,072 for NEB. We identified 12 independent loci that are significantly associated with AFB and/or NEB in a SNP-based genome-wide association study, and four additional loci in a gene-based effort. These loci harbor genes that are likely to play a role – either directly or by affecting non-local gene expression – in human reproduction and infertility, thereby increasing our understanding of these complex traits. PMID:27798627

  5. Benchmarking database performance for genomic data.

    PubMed

    Khushi, Matloob

    2015-06-01

    Genomic regions represent features such as gene annotations, transcription factor binding sites and epigenetic modifications. Performing genomic operations such as identifying overlapping/non-overlapping regions or nearest gene annotations is a common research need. The data can be saved in a database system for easy management; however, there is no comprehensive database built-in algorithm at present to identify overlapping regions. Therefore I have developed a novel region-mapping (RegMap) SQL-based algorithm to perform genomic operations and have benchmarked the performance of different databases. Benchmarking identified that PostgreSQL extracts overlapping regions much faster than MySQL. Insertion and data uploads in PostgreSQL were also better, although general searching capability of both databases was almost equivalent. In addition, using the algorithm, pair-wise overlaps of >1000 datasets of transcription factor binding sites and histone marks, collected from previous publications, were reported and it was found that HNF4G significantly co-locates with cohesin subunit STAG1 (SA1). © 2015 Wiley Periodicals, Inc.
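
    The overlap operation this abstract refers to can be illustrated with a minimal sketch. This is not the RegMap algorithm, whose SQL is not given in the abstract; it only shows the standard closed-interval overlap test in plain Python. The names Region, overlaps and find_overlaps are hypothetical, and 1-based inclusive coordinates are an assumption.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """A genomic region; coordinates assumed 1-based and inclusive."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """Two closed intervals on the same chromosome overlap when each
    one starts no later than the other one ends."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_overlaps(queries: List[Region],
                  targets: List[Region]) -> List[Tuple[Region, Region]]:
    """Naive all-pairs comparison (quadratic); a database would normally
    use an index on (chrom, start, end) instead."""
    return [(q, t) for q in queries for t in targets if overlaps(q, t)]
```

    The same predicate (same chromosome, a.start <= b.end, b.start <= a.end) is what a join condition would express in SQL; how a particular database engine indexes and executes it is what a benchmark of this kind would measure.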

  6. Identifying Potential Regions of Copy Number Variation for Bipolar Disorder

    PubMed Central

    Chen, Yi-Hsuan; Lu, Ru-Band; Hung, Hung; Kuo, Po-Hsiu

    2014-01-01

    Bipolar disorder is a complex psychiatric disorder with high heritability, but its genetic determinants are still largely unknown. Copy number variation (CNV) is one of the sources that may explain part of the heritability. However, it is a challenge to estimate discrete copy number values from continuous signals called from a set of markers, and to simultaneously perform association testing between CNVs and phenotypic outcomes. The goal of the present study is to perform a series of data filtering and analysis procedures using a DNA pooling strategy to identify potential CNV regions that are related to bipolar disorder. A total of 200 normal controls and 200 clinically diagnosed bipolar patients were recruited in this study, and were randomly divided into eight control and eight case pools. Genome-wide genotyping was employed using the Illumina Human Omni1-Quad array with approximately one million markers for CNV calling. We aimed at setting a series of criteria to filter out the signal noise of marker data and to reduce the chance of false-positive findings for CNV regions. We first defined CNV regions for each pool. Potential CNV regions were reported based on the different patterns of CNV status between cases and controls. Genes that mapped into the potential CNV regions were examined with association testing and Gene Ontology enrichment analysis, and checked against existing literature for their associations with bipolar disorder. We reported several CNV regions that are related to bipolar disorder. Two CNV regions on chromosomes 11 and 22 showed significant signal differences between cases and controls (p < 0.05). Another five CNV regions on chromosomes 6, 9, and 19 overlapped with results in previous CNV studies. Experimental validation of two CNV regions lent some support to our reported findings. Further experimental and replication studies could be designed for these selected regions. PMID:27605030

  7. A genome-wide association study for somatic cell score using the Illumina high-density bovine beadchip identifies several novel QTL potentially related to mastitis susceptibility

    PubMed Central

    Meredith, Brian K.; Berry, Donagh P.; Kearney, Francis; Finlay, Emma K.; Fahey, Alan G.; Bradley, Daniel G.; Lynn, David J.

    2013-01-01

    Mastitis is an inflammation-driven disease of the bovine mammary gland that occurs in response to physical damage or infection and is one of the most costly production-related diseases in the dairy industry worldwide. We performed a genome-wide association study (GWAS) to identify genetic loci associated with somatic cell score (SCS), an indicator trait of mammary gland inflammation. A total of 702 Holstein-Friesian bulls were genotyped for 777,962 single nucleotide polymorphisms (SNPs) and associated with SCS phenotypes. The SCS phenotypes were expressed as daughter yield deviations (DYD) based on a large number of progeny performance records. A total of 138 SNPs on 15 different chromosomes reached genome-wide significance (corrected p-value ≤ 0.05) for association with SCS (after correction for multiple testing). We defined 28 distinct QTL regions and a number of candidate genes located in these QTL regions were identified. The most significant association (p-value = 1.70 × 10^-7) was observed on chromosome 6. This QTL had no known genes annotated within it; however, the Ensembl Genome Browser predicted the presence of a small non-coding RNA (a Y RNA gene) in this genomic region. This Y RNA gene was 99% identical to human RNY4. Y RNAs are a rare type of non-coding RNA that were originally discovered due to their association with the autoimmune disease, systemic lupus erythematosus. Examining small-RNA sequencing (RNAseq) data being generated by us in multiple different mastitis-pathogen challenged cell-types has revealed that this Y RNA is expressed (but not differentially expressed) in these cells. Other QTL regions identified in this study also encoded strong candidate genes for mastitis susceptibility. A QTL region on chromosome 13, for example, was found to contain a cluster of β-defensin genes, a gene family with known roles in innate immunity. Due to the increased SNP density, this study also refined the boundaries for several known QTL for SCS and

  8. Genomic Alterations in Biliary Atresia Suggests Region of Potential Disease Susceptibility in 2q37.3

    PubMed Central

    Leyva-Vega, Melissa; Gerfen, Jennifer; Thiel, Brian D.; Jurkiewicz, Dorota; Rand, Elizabeth B.; Pawlowska, Joanna; Kaminska, Diana; Russo, Pierre; Gai, Xiaowu; Krantz, Ian D.; Kamath, Binita M.; Hakonarson, Hakon; Haber, Barbara A.; Spinner, Nancy B.

    2010-01-01

    Biliary atresia (BA) is a progressive, idiopathic obliteration of the extrahepatic biliary system occurring exclusively in the neonatal period. It is the most common disease leading to liver transplantation in children. The etiology of BA is unknown, although infectious, immune and genetic causes have been suggested. While the recurrence of BA in families is not common, there are more than 30 multiplex families reported and an underlying genetic susceptibility has been hypothesized. We screened a cohort of 35 BA patients for genomic alterations that might confer susceptibility to BA. DNA was genotyped on the Illumina Quad550 platform, which analyzes over 550,000 single nucleotide polymorphisms (SNPs) for genomic deletions and duplications. Areas of increased and decreased copy number were compared to those found in control populations. In order to identify regions that could serve as susceptibility factors for BA, we searched for regions that were found in BA patients, but not in controls. We identified two unrelated BA patients with overlapping heterozygous deletions of 2q37.3. Patient 1 had a 1.76 Mb (280 SNP), heterozygous deletion containing thirty genes. Patient 2 had a 5.87 Mb (1,346 SNP) heterozygous deletion containing fifty-five genes. The overlapping 1.76 Mb deletion on chromosome 2q37.3 from 240,936,900 to 242,692,820 constitutes the critical region and the genes within this region could be candidates for susceptibility to BA. PMID:20358598
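
    The critical region in this abstract is the intersection of two deletions, and its reported size can be checked from the stated coordinates. The sketch below is only an illustration: the function name intersect is hypothetical, and the abstract gives neither patient's individual coordinates. Patient 1's interval is assumed to coincide with the reported overlap (both are stated as 1.76 Mb), and Patient 2's interval is an invented 5.87 Mb placeholder.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in base pairs on one chromosome


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared span of two intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Coordinates of the overlapping deletion as reported in the abstract.
reported_overlap: Interval = (240_936_900, 242_692_820)

# ASSUMPTION: Patient 1's deletion equals the reported overlap.
patient_1: Interval = reported_overlap
# HYPOTHETICAL placeholder: a 5.87 Mb deletion ending at the same position.
patient_2: Interval = (236_822_820, 242_692_820)

critical = intersect(patient_1, patient_2)
length_bp = critical[1] - critical[0]   # 1,755,920 bp
length_mb = round(length_bp / 1e6, 2)   # 1.76 Mb, matching the abstract
```

    The difference 242,692,820 - 240,936,900 = 1,755,920 bp rounds to the 1.76 Mb quoted for the critical region.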

  9. Identification of genomic regions associated with feed efficiency in Nelore cattle.

    PubMed

    de Oliveira, Priscila S N; Cesar, Aline S M; do Nascimento, Michele L; Chaves, Amália S; Tizioto, Polyana C; Tullio, Rymer R; Lanna, Dante P D; Rosa, Antonio N; Sonstegard, Tad S; Mourao, Gerson B; Reecy, James M; Garrick, Dorian J; Mudadu, Maurício A; Coutinho, Luiz L; Regitano, Luciana C A

    2014-09-26

    Feed efficiency is jointly determined by productivity and feed requirements, both of which are economically relevant traits in beef cattle production systems. The objective of this study was to identify genes/QTLs associated with components of feed efficiency in Nelore cattle using Illumina BovineHD BeadChip (770 k SNP) genotypes from 593 Nelore steers. The traits analyzed included: average daily gain (ADG), dry matter intake (DMI), feed-conversion ratio (FCR), feed efficiency (FE), residual feed intake (RFI), maintenance efficiency (ME), efficiency of gain (EG), partial efficiency of growth (PEG) and relative growth rate (RGR). The Bayes B analysis was completed with Gensel software parameterized to fit fewer markers than animals. Genomic windows containing all the SNP loci in each 1 Mb that accounted for more than 1.0% of genetic variance were considered as QTL region. Candidate genes within windows that explained more than 1% of genetic variance were selected by putative function based on DAVID and Gene Ontology. Thirty-six QTL (1-Mb SNP window) were identified on chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 18, 19, 20, 21, 22, 24, 25 and 26 (UMD 3.1). The amount of genetic variance explained by individual QTL windows for feed efficiency traits ranged from 0.5% to 9.07%. Some of these QTL minimally overlapped with previously reported feed efficiency QTL for Bos taurus. The QTL regions described in this study harbor genes with biological functions related to metabolic processes, lipid and protein metabolism, generation of energy and growth. Among the positional candidate genes selected for feed efficiency are: HRH4, ALDH7A1, APOA2, LIN7C, CXADR, ADAM12 and MAP7. Some genomic regions and some positional candidate genes reported in this study have not been previously reported for feed efficiency traits in Bos indicus. 
    Comparison with published results indicates that different QTLs and genes may be involved in the control of feed efficiency traits in this

  10. A survey of genome-wide single nucleotide polymorphisms through genome resequencing in the Périgord black truffle (Tuber melanosporum Vittad.).

    PubMed

    Payen, Thibaut; Murat, Claude; Gigant, Anaïs; Morin, Emmanuelle; De Mita, Stéphane; Martin, Francis

    2015-09-01

    The Périgord black truffle (Tuber melanosporum Vittad.), considered a gastronomic delicacy worldwide, is an ectomycorrhizal filamentous fungus that is ecologically important in Mediterranean French, Italian and Spanish woodlands. In this study, we developed a novel resource of single nucleotide polymorphisms (SNPs) for T. melanosporum using Illumina high-throughput resequencing. The genome from six T. melanosporum geographical accessions was sequenced to a depth of approximately 20×. These geographical accessions were selected from different populations within the northern and southern regions of the geographical species distribution. Approximately 80% of the reads for each of the six resequenced geographical accessions mapped against the reference T. melanosporum genome assembly, estimating the core genome size of this organism to be approximately 110 Mbp. A total of 442 326 SNPs corresponding to 3540 SNPs/Mbps were identified as being included in all seven genomes. The SNPs occurred more frequently in repeated sequences (85%), although 4501 SNPs were also identified in the coding regions of 2587 genes. Using the ratio of nonsynonymous mutations per nonsynonymous site (pN) to synonymous mutations per synonymous site (pS) and Tajima's D index scanning the whole genome, we were able to identify genomic regions and genes potentially subjected to positive or purifying selection. The SNPs identified represent a valuable resource for future population genetics and genomics studies. © 2015 John Wiley & Sons Ltd.
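
    The abstract above mentions Tajima's D as one of the statistics scanned across the genome. As a rough illustration, the sketch below implements the standard Tajima (1989) formula from three summary values. It is not the authors' code, and the example inputs (10 sequences, 16 segregating sites, mean pairwise difference of about 3.89) are hypothetical, not values from the truffle data.

```python
import math


def tajimas_d(pi: float, seg_sites: int, n: int) -> float:
    """Tajima's D from mean pairwise differences (pi), the number of
    segregating sites (seg_sites) and the sample size (n)."""
    if n < 4:
        # For n < 4 the variance term is zero, so D is undefined.
        raise ValueError("Tajima's D needs at least 4 sequences")
    if seg_sites < 1:
        raise ValueError("no segregating sites: D is undefined")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: pi minus Watterson's estimator (S / a1).
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical window: 10 sequences, 16 segregating sites, pi = 3.888889.
print(round(tajimas_d(3.888889, 16, 10), 3))
```

    A negative value, as in this example, indicates an excess of rare variants relative to neutral expectation. In a genome scan the function would be applied per window, using pi and S computed from the SNP calls in that window.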

  11. Discovering genetic variants in Crohn's disease by exploring genomic regions enriched of weak association signals.

    PubMed

    D'Addabbo, Annarita; Palmieri, Orazio; Maglietta, Rosalia; Latiano, Anna; Mukherjee, Sayan; Annese, Vito; Ancona, Nicola

    2011-08-01

    A meta-analysis has re-analysed previous genome-wide association scans, definitively confirming eleven genes and further identifying 21 new loci. However, the identified genes/loci still explain only a minority of the genetic predisposition to Crohn's disease. The aim was to identify genes weakly involved in disease predisposition by analysing chromosomal regions enriched in single nucleotide polymorphisms with modest statistical association. We utilized the WTCCC data set, evaluating 1748 CD cases and 2938 controls. The identification of candidate genes/loci was performed by a two-step procedure: first, chromosomal regions enriched in weak association signals were localized; subsequently, weak signals clustered in gene regions were identified. The statistical significance was assessed by non-parametric permutation tests. The cytoband enrichment analysis highlighted 44 regions (P≤0.05) enriched with single nucleotide polymorphisms significantly associated with the trait, including 23 out of 31 previously confirmed and replicated genes. Importantly, we highlight a further 20 novel chromosomal regions carrying approximately one hundred genes/loci with modest association. Amongst these we find compelling functional candidate genes such as MAPT, GRB2 and CREM, LCT, and IL12RB2. Our study suggests a different statistical perspective to discover genes weakly associated with a given trait, although further confirmatory functional studies are needed. Copyright © 2011 Editrice Gastroenterologica Italiana S.r.l. All rights reserved.
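
    The abstract above assesses region-level enrichment of weak signals with non-parametric permutation tests. The sketch below shows one simple way such a test could look. It is an assumption about the general technique, not the authors' procedure: the statistic (count of SNPs with p at or below a threshold inside the region), the function name and the parameters are all illustrative, and the sketch ignores linkage disequilibrium between neighbouring SNPs.

```python
import random
from typing import Sequence


def region_enrichment_pvalue(
    pvalues: Sequence[float],
    in_region: Sequence[bool],
    threshold: float = 0.05,
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value for an excess of weak signals in one region.

    Statistic: number of SNPs in the region with p <= threshold. The null
    distribution is built by drawing random SNP sets of the same size.
    """
    rng = random.Random(seed)
    weak = [p <= threshold for p in pvalues]
    region_size = sum(in_region)
    observed = sum(w for w, r in zip(weak, in_region) if r)
    hits = 0
    for _ in range(n_perm):
        # Random SNP set of the same size as the region, without replacement.
        if sum(rng.sample(weak, region_size)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 200 SNPs, the first 10 form a region full of weak signals.
pvals = [0.01] * 10 + [0.5] * 190
region = [True] * 10 + [False] * 190
print(region_enrichment_pvalue(pvals, region))
```

    Regions whose empirical p-value falls at or below 0.05 would then be carried forward, mirroring the P≤0.05 cutoff quoted in the abstract.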

  12. Gross rearrangements within the 5'-untranslated region of the picornaviral genomes.

    PubMed

    Pilipenko, E V; Blinov, V M; Agol, V I

    1990-06-11

    An analysis of reported nucleotide sequences revealed several cases of gross rearrangements in the 5'-untranslated region (5'-UTR) of picornaviral genomes. A large (greater than 100 nt) duplication was discovered in a downstream region of the poliovirus 5'-UTR involved in translational control. Properties of the poliovirus mutants with large deletions [Kuge and Nomoto (1987) J. Virol. 61, 1478-1487] show that a single copy of the appropriate repeating unit is compatible with a wild type phenotype of the virus. In contrast to poliovirus and other enterovirus genomes, human rhinovirus RNAs contain only a single copy of this repeating unit. Another similarly large repeat was found in an upstream segment of the bovine enterovirus 5'-UTR. A comparison of the primary and secondary structures of cardio- and aphthovirus 5'-UTRs demonstrated the existence of a large (ca. 250 nucleotides) insertion/deletion in a region preceding the poly(C) tract. The two latter rearrangements appear to involve elements of the viral genome replication machinery. Possible origin as well as evolutionary and functional implications of these structural peculiarities are discussed.

  13. Interpreting Mammalian Evolution using Fugu Genome Comparisons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stubbs, L; Ovcharenko, I; Loots, G G

    2004-04-02

    Comparative sequence analysis of the human and the pufferfish Fugu rubripes (fugu) genomes has revealed several novel functional coding and noncoding regions in the human genome. In particular, the fugu genome has been extremely valuable for identifying transcriptional regulatory elements in human loci harboring unusually high levels of evolutionary conservation to rodent genomes. In such regions, the large evolutionary distance between human and fishes provides an additional filter through which functional noncoding elements can be detected with high efficiency.

  14. A combinatorial approach of comprehensive QTL-based comparative genome mapping and transcript profiling identified a seed weight-regulating candidate gene in chickpea

    PubMed Central

    Bajaj, Deepak; Upadhyaya, Hari D.; Khan, Yusuf; Das, Shouvik; Badoni, Saurabh; Shree, Tanima; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. L.; Singh, Sube; Sharma, Shivali; Tyagi, Akhilesh K.; Chattopdhyay, Debasis; Parida, Swarup K.

    2015-01-01

    High experimental validation/genotyping success rate (94–96%) and intra-specific polymorphic potential (82–96%) of 1536 SNP and 472 SSR markers showing in silico polymorphism between desi ICC 4958 and kabuli ICC 12968 chickpea was obtained in a 190 mapping population (ICC 4958 × ICC 12968) and 92 diverse desi and kabuli genotypes. A high-density 2001 marker-based intra-specific genetic linkage map comprising of eight LGs constructed is comparatively much saturated (mean map-density: 0.94 cM) in contrast to existing intra-specific genetic maps in chickpea. Fifteen robust QTLs (PVE: 8.8–25.8% with LOD: 7.0–13.8) associated with pod and seed number/plant (PN and SN) and 100 seed weight (SW) were identified and mapped on 10 major genomic regions of eight LGs. One of 126.8 kb major genomic region harbouring a strong SW-associated robust QTL (Caq'SW1.1: 169.1–171.3 cM) has been delineated by integrating high-resolution QTL mapping with comprehensive marker-based comparative genome mapping and differential expression profiling. This identified one potential regulatory SNP (G/A) in the cis-acting element of candidate ERF (ethylene responsive factor) TF (transcription factor) gene governing seed weight in chickpea. The functionally relevant molecular tags identified have potential to be utilized for marker-assisted genetic improvement of chickpea. PMID:25786576

  15. Indel-seq: a fast-forward genetics approach for identification of trait-associated putative candidate genomic regions and its application in pigeonpea (Cajanus cajan).

    PubMed

    Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Sinha, Pallavi; Kale, Sandip M; Parupalli, Swathi; Kumar, Vinay; Chitikineni, Annapurna; Vechalapu, Suryanarayana; Sameer Kumar, Chanda Venkata; Sharma, Mamta; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Muniswamy, Sonnappa; Varshney, Rajeev K

    2017-07-01

    Identification of candidate genomic regions associated with target traits using conventional mapping methods is challenging and time-consuming. In recent years, a number of single nucleotide polymorphism (SNP)-based mapping approaches have been developed and used for identification of candidate/putative genomic regions. However, in the majority of these studies, insertion-deletions (Indels) were largely ignored. For efficient use of Indels in mapping target traits, we propose the Indel-seq approach, which combines whole-genome resequencing (WGRS) and bulked segregant analysis (BSA) and relies on the Indel frequencies in extreme bulks. Deployment of the Indel-seq approach for identification of candidate genomic regions associated with fusarium wilt (FW) and sterility mosaic disease (SMD) resistance in pigeonpea identified 16 Indels affecting 26 putative candidate genes. Of these 26 affected putative candidate genes, 24 were affected in the regions upstream/downstream of the gene and two were affected within the gene itself. Validation of these 16 candidate Indels in other FW- and SMD-resistant and FW- and SMD-susceptible genotypes revealed a significant association of five Indels (three for FW and two for SMD resistance). Comparative analysis of Indel-seq with other genetic mapping approaches highlighted the importance of the approach in identification of significant genomic regions associated with target traits. Therefore, the Indel-seq approach can be used for quick and precise identification of candidate genomic regions for any target trait in any crop species. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
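
    The abstract above states that Indel-seq relies on Indel frequencies in the two extreme bulks. The sketch below illustrates that idea with a simple frequency difference between bulks. The exact statistic and thresholds used by the authors are not given in the abstract, so the names, the read counts and the 0.7 cutoff are all hypothetical.

```python
from typing import NamedTuple


class BulkCounts(NamedTuple):
    """Read support for one Indel in one bulk (hypothetical structure)."""
    indel_reads: int  # reads carrying the Indel allele
    total_reads: int  # all reads covering the site


def indel_frequency(counts: BulkCounts) -> float:
    """Fraction of covering reads that carry the Indel allele."""
    if counts.total_reads <= 0:
        raise ValueError("site has no read coverage")
    return counts.indel_reads / counts.total_reads


def delta_indel_frequency(resistant: BulkCounts, susceptible: BulkCounts) -> float:
    """Difference in Indel frequency between the two extreme bulks."""
    return indel_frequency(resistant) - indel_frequency(susceptible)


def is_candidate(resistant: BulkCounts, susceptible: BulkCounts,
                 min_delta: float = 0.7) -> bool:
    """Flag an Indel whose frequency differs strongly between bulks.

    min_delta is an assumed cutoff for illustration only.
    """
    return abs(delta_indel_frequency(resistant, susceptible)) >= min_delta


# Hypothetical read counts for a single Indel site.
res, sus = BulkCounts(45, 50), BulkCounts(5, 50)
print(round(delta_indel_frequency(res, sus), 2), is_candidate(res, sus))
```

    An Indel that is nearly fixed in one bulk and nearly absent in the other gives a delta close to 1 or -1, which is the pattern expected near a locus controlling the trait.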

  16. Burkholderia pseudomallei sequencing identifies genomic clades with distinct recombination, accessory, and epigenetic profiles

    PubMed Central

    Nandi, Tannistha; Holden, Matthew T.G.; Didelot, Xavier; Mehershahi, Kurosh; Boddey, Justin A.; Beacham, Ifor; Peak, Ian; Harting, John; Baybayan, Primo; Guo, Yan; Wang, Susana; How, Lee Chee; Sim, Bernice; Essex-Lopresti, Angela; Sarkar-Tyson, Mitali; Nelson, Michelle; Smither, Sophie; Ong, Catherine; Aw, Lay Tin; Hoon, Chua Hui; Michell, Stephen; Studholme, David J.; Titball, Richard; Chen, Swaine L.; Parkhill, Julian

    2015-01-01

    Burkholderia pseudomallei (Bp) is the causative agent of the infectious disease melioidosis. To investigate population diversity, recombination, and horizontal gene transfer in closely related Bp isolates, we performed whole-genome sequencing (WGS) on 106 clinical, animal, and environmental strains from a restricted Asian locale. Whole-genome phylogenies resolved multiple genomic clades of Bp, largely congruent with multilocus sequence typing (MLST). We discovered widespread recombination in the Bp core genome, involving hundreds of regions associated with multiple haplotypes. Highly recombinant regions exhibited functional enrichments that may contribute to virulence. We observed clade-specific patterns of recombination and accessory gene exchange, and provide evidence that this is likely due to ongoing recombination between clade members. Reciprocally, interclade exchanges were rarely observed, suggesting mechanisms restricting gene flow between clades. Interrogation of accessory elements revealed that each clade harbored a distinct complement of restriction-modification (RM) systems, predicted to cause clade-specific patterns of DNA methylation. Using methylome sequencing, we confirmed that representative strains from separate clades indeed exhibit distinct methylation profiles. Finally, using an E. coli system, we demonstrate that Bp RM systems can inhibit uptake of non-self DNA. Our data suggest that RM systems borne on mobile elements, besides preventing foreign DNA invasion, may also contribute to limiting exchanges of genetic material between individuals of the same species. Genomic clades may thus represent functional units of genetic isolation in Bp, modulating intraspecies genetic diversity. PMID:25236617

  17. A Genome-Wide Association Study Identifies Risk Loci to Equine Recurrent Uveitis in German Warmblood Horses

    PubMed Central

    Kulbrock, Maike; Lehner, Stefanie; Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar

    2013-01-01

    Equine recurrent uveitis (ERU) is a common eye disease affecting 3–15% of the horse population. A genome-wide association study (GWAS) using the Illumina equine SNP50 bead chip was performed to identify loci conferring risk of ERU. The sample included a total of 144 German warmblood horses. The GWAS showed a significant single nucleotide polymorphism (SNP) on horse chromosome (ECA) 20 at 49.3 Mb, with IL-17A and IL-17F being the closest genes. This locus explained 23% of the phenotypic variance for ERU. A GWAS taking into account the severity of ERU revealed a SNP on ECA18 near the crystallin gene cluster CRYGA-CRYGF. For both genomic regions on ECA18 and 20, significantly associated haplotypes containing the genome-wide significant SNPs could be demonstrated. In conclusion, our results are indicative of a genetic component regulating the possible critical role of IL-17A and IL-17F in the pathogenesis of ERU. The associated SNP on ECA18 may be indicative of cataract formation in the course of ERU. PMID:23977091

  18. Combining Genome-Scale Experimental and Computational Methods To Identify Essential Genes in Rhodobacter sphaeroides

    DOE PAGES

    Burger, Brian T.; Imam, Saheed; Scarborough, Matthew J.; ...

    2017-06-06

    Rhodobacter sphaeroides is one of the best-studied alphaproteobacteria from biochemical, genetic, and genomic perspectives. To gain a better systems-level understanding of this organism, we generated a large transposon mutant library and used transposon sequencing (Tn-seq) to identify genes that are essential under several growth conditions. Using newly developed Tn-seq analysis software (TSAS), we identified 493 genes as essential for aerobic growth on a rich medium. We then used the mutant library to identify conditionally essential genes under two laboratory growth conditions, identifying 85 additional genes required for aerobic growth in a minimal medium and 31 additional genes required for photosynthetic growth. In all instances, our analyses confirmed essentiality for many known genes and identified genes not previously considered to be essential. We used the resulting Tn-seq data to refine and improve a genome-scale metabolic network model (GEM) for R. sphaeroides. Together, we demonstrate how genetic, genomic, and computational approaches can be combined to obtain a systems-level understanding of the genetic framework underlying metabolic diversity in bacterial species.

  19. A segment of the apospory-specific genomic region is highly microsyntenic not only between the apomicts Pennisetum squamulatum and buffelgrass, but also with a rice chromosome 11 centromeric-proximal genomic region.

    PubMed

    Gualtieri, Gustavo; Conner, Joann A; Morishige, Daryl T; Moore, L David; Mullet, John E; Ozias-Akins, Peggy

    2006-03-01

    Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory.

  20. Fine organization of genomic regions tagged to the 5S rDNA locus of the bread wheat 5B chromosome.

    PubMed

    Sergeeva, Ekaterina M; Shcherban, Andrey B; Adonina, Irina G; Nesterov, Michail A; Beletsky, Alexey V; Rakitin, Andrey L; Mardanov, Andrey V; Ravin, Nikolai V; Salina, Elena A

    2017-11-14

    The multigene family encoding the 5S rRNA, one of the most important structurally-functional part of the large ribosomal subunit, is an obligate component of all eukaryotic genomes. 5S rDNA has long been a favored target for cytological and phylogenetic studies due to the inherent peculiarities of its structural organization, such as the tandem arrays of repetitive units and their high interspecific divergence. The complex polyploid nature of the genome of bread wheat, Triticum aestivum, and the technically difficult task of sequencing clusters of tandem repeats mean that the detailed organization of extended genomic regions containing 5S rRNA genes remains unclear. This is despite the recent progress made in wheat genomic sequencing. Using pyrosequencing of BAC clones, in this work we studied the organization of two distinct 5S rDNA-tagged regions of the 5BS chromosome of bread wheat. Three BAC-clones containing 5S rDNA were identified in the 5BS chromosome-specific BAC-library of Triticum aestivum. Using the results of pyrosequencing and assembling, we obtained six 5S rDNA- containing contigs with a total length of 140,417 bp, and two sets (pools) of individual 5S rDNA sequences belonging to separate, but closely located genomic regions on the 5BS chromosome. Both regions are characterized by the presence of approximately 70-80 copies of 5S rDNA, however, they are completely different in their structural organization. The first region contained highly diverged short-type 5S rDNA units that were disrupted by multiple insertions of transposable elements. The second region contained the more conserved long-type 5S rDNA, organized as a single tandem array. FISH using probes specific to both 5S rDNA unit types showed differences in the distribution and intensity of signals on the chromosomes of polyploid wheat species and their diploid progenitors. 
    A detailed structural organization of two closely located 5S rDNA-tagged genomic regions on the 5BS chromosome of bread

  1. Genome-wide study of resistant hypertension identified from electronic health records.

    PubMed

    Dumitrescu, Logan; Ritchie, Marylyn D; Denny, Joshua C; El Rouby, Nihal M; McDonough, Caitrin W; Bradford, Yuki; Ramirez, Andrea H; Bielinski, Suzette J; Basford, Melissa A; Chai, High Seng; Peissig, Peggy; Carrell, David; Pathak, Jyotishman; Rasmussen, Luke V; Wang, Xiaoming; Pacheco, Jennifer A; Kho, Abel N; Hayes, M Geoffrey; Matsumoto, Martha; Smith, Maureen E; Li, Rongling; Cooper-DeHoff, Rhonda M; Kullo, Iftikhar J; Chute, Christopher G; Chisholm, Rex L; Jarvik, Gail P; Larson, Eric B; Carey, David; McCarty, Catherine A; Williams, Marc S; Roden, Dan M; Bottinger, Erwin; Johnson, Julie A; de Andrade, Mariza; Crawford, Dana C

    2017-01-01

    Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study for resistant hypertension, we defined and identified cases of resistant hypertension and hypertensives with treated, controlled hypertension among >47,500 adults residing in the US linked to electronic health records (EHRs) and genotyped as part of the electronic MEdical Records & GEnomics (eMERGE) Network. Electronic selection logic using billing codes, laboratory values, text queries, and medication records was used to identify resistant hypertension cases and controls at each site, and a total of 3,006 cases of resistant hypertension and 876 controlled hypertensives were identified among eMERGE Phase I and II sites. After imputation and quality control, a total of 2,530,150 SNPs were tested for an association among 2,830 multi-ethnic cases of resistant hypertension and 876 controlled hypertensives. No test of association was genome-wide significant in the full dataset or in the dataset limited to European American cases (n = 1,719) and controls (n = 708). The most significant finding was CLNK rs13144136 at p = 1.00 × 10⁻⁶ (odds ratio = 0.68; 95% CI = 0.58-0.80) in the full dataset with similar results in the European American only dataset. We also examined whether SNPs known to influence blood pressure or hypertension also influenced resistant hypertension. None was significant after correction for multiple testing. These data highlight both the difficulties and the potential utility of EHR-linked genomic data to study clinically-relevant traits such as resistant hypertension.
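
    The abstract above reports that no association reached genome-wide significance among 2,530,150 tested SNPs. The short worked example below shows why the top hit falls short under a Bonferroni correction. The abstract does not say which threshold the authors applied, so Bonferroni at alpha = 0.05 is an assumption made here for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


n_snps = 2_530_150   # SNPs tested, from the abstract
top_p = 1.00e-6      # p-value of CLNK rs13144136, from the abstract

# ASSUMPTION: alpha = 0.05 with Bonferroni; the study's criterion is not stated.
threshold = bonferroni_threshold(0.05, n_snps)
is_significant = top_p <= threshold
print(f"threshold = {threshold:.2e}, significant = {is_significant}")
```

    The resulting threshold is close to 2 × 10⁻⁸, roughly fifty times smaller than the best p-value observed, which is consistent with the abstract's statement that no test was genome-wide significant.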

  2. Phylogenetic shadowing of primate sequences to find functional regions of the human genome.

    PubMed

    Boffelli, Dario; McAuliffe, Jon; Ovcharenko, Dmitriy; Lewis, Keith D; Ovcharenko, Ivan; Pachter, Lior; Rubin, Edward M

    2003-02-28

    Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome. Analysis of these data enabled the discovery of primate-specific gene regulatory elements and the demarcation of the exons of multiple genes. Much of the information content of the comprehensive primate sequence comparisons could be captured with a small subset of phylogenetically close primates. These results demonstrate the utility of intraprimate sequence comparisons to discover common mammalian as well as primate-specific functional elements in the human genome, which are unattainable through the evaluation of more evolutionarily distant species.

  3. A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

    PubMed

    Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A

    2012-01-01

    High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.
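
    The abstract above describes flagging samples whose genome-wide summary statistics are atypical. The sketch below is a deliberately simpler stand-in, not the published algorithm (which is written in R and whose details are not given in the abstract): it flags values with a large robust z-score based on the median and the median absolute deviation (MAD). The 3.5 cutoff and the example heterozygosity values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def flag_atypical(values: Sequence[float], cutoff: float = 3.5) -> List[int]:
    """Return indices of samples whose summary statistic is atypical.

    Uses a robust z-score: |value - median| / (1.4826 * MAD). The factor
    1.4826 makes the MAD comparable to a standard deviation for normal data.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        # All values (or more than half) are identical: nothing to flag.
        return []
    return [i for i, v in enumerate(values) if abs(v - center) / scale > cutoff]


# Hypothetical per-sample heterozygosity; the last sample looks contaminated.
heterozygosity = [0.30, 0.31, 0.29, 0.30, 0.32, 0.45]
print(flag_atypical(heterozygosity))
```

    Because the median and MAD are barely affected by the outlying sample itself, the one extreme value is flagged without distorting the scale used to judge the others. The same function could be run separately on each summary statistic, such as missingness or heterozygosity.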

  4. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  5. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia

    PubMed Central

    Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M.

    2016-01-01

    Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints. PMID:28175287

  6. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia.

    PubMed

    Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M; Ruiz-Herrera, Aurora

    2016-12-01

    Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints.

  7. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    PubMed Central

    Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

    2008-01-01

    Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client

  8. Genome-wide association study identifies three novel loci for type 2 diabetes.

    PubMed

    Hara, Kazuo; Fujita, Hayato; Johnson, Todd A; Yamauchi, Toshimasa; Yasuda, Kazuki; Horikoshi, Momoko; Peng, Chen; Hu, Cheng; Ma, Ronald C W; Imamura, Minako; Iwata, Minoru; Tsunoda, Tatsuhiko; Morizono, Takashi; Shojima, Nobuhiro; So, Wing Yee; Leung, Ting Fan; Kwan, Patrick; Zhang, Rong; Wang, Jie; Yu, Weihui; Maegawa, Hiroshi; Hirose, Hiroshi; Kaku, Kohei; Ito, Chikako; Watada, Hirotaka; Tanaka, Yasushi; Tobe, Kazuyuki; Kashiwagi, Atsunori; Kawamori, Ryuzo; Jia, Weiping; Chan, Juliana C N; Teo, Yik Ying; Shyong, Tai E; Kamatani, Naoyuki; Kubo, Michiaki; Maeda, Shiro; Kadowaki, Takashi

    2014-01-01

    Although over 60 loci for type 2 diabetes (T2D) have been identified, there still remains a large genetic component to be clarified. To explore unidentified loci for T2D, we performed a genome-wide association study (GWAS) of 6 209 637 single-nucleotide polymorphisms (SNPs), which were directly genotyped or imputed using East Asian references from the 1000 Genomes Project (June 2011 release) in 5976 Japanese patients with T2D and 20 829 nondiabetic individuals. Nineteen unreported loci were selected and taken forward to follow-up analyses. Combined discovery and follow-up analyses (30 392 cases and 34 814 controls) identified three new loci with genome-wide significance, which were MIR129-LEP [rs791595; risk allele = A; risk allele frequency (RAF) = 0.080; P = 2.55 × 10(-13); odds ratio (OR) = 1.17], GPSM1 [rs11787792; risk allele = A; RAF = 0.874; P = 1.74 × 10(-10); OR = 1.15] and SLC16A13 (rs312457; risk allele = G; RAF = 0.078; P = 7.69 × 10(-13); OR = 1.20). This study demonstrates that GWASs based on the imputation of genotypes using modern reference haplotypes such as that from the 1000 Genomes Project data can assist in identification of new loci for common diseases.

  9. Variability among the Most Rapidly Evolving Plastid Genomic Regions is Lineage-Specific: Implications of Pairwise Genome Comparisons in Pyrus (Rosaceae) and Other Angiosperms for Marker Choice

    PubMed Central

    Ter-Voskanyan, Hasmik; Allgaier, Martin; Borsch, Thomas

    2014-01-01

    Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae)—a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations
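
    The overall p-distance quoted above is the proportion of aligned sites at which two sequences differ. A minimal sketch of that calculation follows; the toy sequences and the choice to skip gap columns (pairwise deletion) are assumptions made for illustration and are not taken from the study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of compared aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored (pairwise deletion)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical toy alignment: one mismatch in ten sites.
toy_a = "ACGTACGTAC"
toy_b = "ACGTTCGTAC"
print(p_distance(toy_a, toy_b))  # 0.1
```

    On this scale, a p-distance of 0.00145 corresponds to roughly 1.45 differing sites per 1,000 compared sites.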

  10. ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements.

    PubMed

    Taylor, James; Tyekucheva, Svitlana; King, David C; Hardison, Ross C; Miller, Webb; Chiaromonte, Francesca

    2006-12-01

    Genomic sequence signals - such as base composition, presence of particular motifs, or evolutionary constraint - have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy ( approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).

  11. Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci.

    PubMed

    Amaral, Paulo P; Leonardi, Tommaso; Han, Namshik; Viré, Emmanuelle; Gascoigne, Dennis K; Arias-Carrasco, Raúl; Büscher, Magdalena; Pandolfini, Luca; Zhang, Anda; Pluchino, Stefano; Maracaja-Coutinho, Vinicius; Nakaya, Helder I; Hemberg, Martin; Shiekhattar, Ramin; Enright, Anton J; Kouzarides, Tony

    2018-03-15

    The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider promoter conservation and positional conservation as indicators of functional commonality. We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other's expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers. This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation.

  12. Genome-wide association study with 1000 genomes imputation identifies signals for nine sex hormone-related phenotypes.

    PubMed

    Ruth, Katherine S; Campbell, Purdey J; Chew, Shelby; Lim, Ee Mun; Hadlow, Narelle; Stuckey, Bronwyn G A; Brown, Suzanne J; Feenstra, Bjarke; Joseph, John; Surdulescu, Gabriela L; Zheng, Hou Feng; Richards, J Brent; Murray, Anna; Spector, Tim D; Wilson, Scott G; Perry, John R B

    2016-02-01

    Genetic factors contribute strongly to sex hormone levels, yet knowledge of the regulatory mechanisms remains incomplete. Genome-wide association studies (GWAS) have identified only a small number of loci associated with sex hormone levels, with several reproductive hormones yet to be assessed. The aim of the study was to identify novel genetic variants contributing to the regulation of sex hormones. We performed GWAS using genotypes imputed from the 1000 Genomes reference panel. The study used genotype and phenotype data from a UK twin register. We included 2913 individuals (up to 294 males) from the Twins UK study, excluding individuals receiving hormone treatment. Phenotypes were standardised for age, sex, BMI, stage of menstrual cycle and menopausal status. We tested 7,879,351 autosomal SNPs for association with levels of dehydroepiandrosterone sulphate (DHEAS), oestradiol, free androgen index (FAI), follicle-stimulating hormone (FSH), luteinizing hormone (LH), prolactin, progesterone, sex hormone-binding globulin and testosterone. Eight independent genetic variants reached genome-wide significance (P<5 × 10(-8)), with minor allele frequencies of 1.3-23.9%. Novel signals included variants for progesterone (P=7.68 × 10(-12)), oestradiol (P=1.63 × 10(-8)) and FAI (P=1.50 × 10(-8)). A genetic variant near the FSHB gene was identified which influenced both FSH (P=1.74 × 10(-8)) and LH (P=3.94 × 10(-9)) levels. A separate locus on chromosome 7 was associated with both DHEAS (P=1.82 × 10(-14)) and progesterone (P=6.09 × 10(-14)). This study highlights loci that are relevant to reproductive function and suggests overlap in the genetic basis of hormone regulation.
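
    The genome-wide significance threshold used above (P < 5 × 10(-8)) is commonly explained as a Bonferroni correction of 0.05 for about one million independent tests. The sketch below checks the P-values quoted in the abstract against that threshold; the dictionary labels are illustrative names, not identifiers from the study.

```python
import math

# Conventional rationale (an assumption about motivation, not stated in the
# abstract): 0.05 divided by ~1,000,000 independent common-variant tests.
GENOME_WIDE_ALPHA = 5e-8
assert math.isclose(0.05 / 1_000_000, GENOME_WIDE_ALPHA)

# P-values as quoted in the abstract; labels are illustrative.
reported_p = {
    "progesterone (novel signal)": 7.68e-12,
    "oestradiol": 1.63e-8,
    "FAI": 1.50e-8,
    "FSH (near FSHB)": 1.74e-8,
    "LH (near FSHB)": 3.94e-9,
    "DHEAS (chromosome 7 locus)": 1.82e-14,
    "progesterone (chromosome 7 locus)": 6.09e-14,
}

# Keep only associations below the genome-wide threshold.
genome_wide_hits = {
    name: p for name, p in reported_p.items() if p < GENOME_WIDE_ALPHA
}

for name, p in sorted(genome_wide_hits.items(), key=lambda item: item[1]):
    print(f"{name}: P = {p:.2e}")
```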

  13. A genome-wide association study for body weight in Japanese Thoroughbred racehorses clarifies candidate regions on chromosomes 3, 9, 15, and 18

    PubMed Central

    TOZAKI, Teruaki; KIKUCHI, Mio; KAKOI, Hironaga; HIROTA, Kei-ichi; NAGATA, Shun-ichi

    2017-01-01

    Body weight is an important trait to confirm growth and development in humans and animals. In Thoroughbred racehorses, it is measured in the postnatal, training, and racing periods to evaluate growth and training degrees. The body weight of mature Thoroughbred racehorses generally ranges from 400 to 600 kg, and this broad range is likely influenced by environmental and genetic factors. Therefore, a genome-wide association study (GWAS) using the Equine SNP70 BeadChip was performed to identify the genomic regions associated with body weight in Japanese Thoroughbred racehorses using 851 individuals. The average body weight of these horses was 473.9 kg (standard deviation: 28.0) at the age of 3, and GWAS identified statistically significant SNPs on chromosomes 3 (BIEC2_808466, P=2.32E-14), 9 (BIEC2_1105503, P=1.03E-7), 15 (BIEC2_322669, P=9.50E-6), and 18 (BIEC2_417274, P=1.44E-14), which were associated with body weight as a quantitative trait. The genomic regions on chromosomes 3, 9, 15, and 18 included ligand-dependent nuclear receptor corepressor-like protein (LCORL), zinc finger and AT hook domain containing (ZFAT), tribbles pseudokinase 2 (TRIB2), and myostatin (MSTN), respectively, as candidate genes. LCORL and ZFAT are associated with withers height in horses, whereas MSTN affects muscle mass. Thus, the genomic regions identified in this study seem to affect the body weight of Thoroughbred racehorses. Although this information is useful for breeding and growth management of the horses, the production of genetically modified animals and gene doping (abuse/misuse of gene therapy) should be prohibited to maintain horse racing integrity. PMID:29270069

  14. Genomic comparison of multi-drug resistant invasive and colonizing Acinetobacter baumannii isolated from diverse human body sites reveals genomic plasticity.

    PubMed

    Sahl, Jason W; Johnson, J Kristie; Harris, Anthony D; Phillippy, Adam M; Hsiao, William W; Thom, Kerri A; Rasko, David A

    2011-06-04

    Acinetobacter baumannii has recently emerged as a significant global pathogen, with a surprisingly rapid acquisition of antibiotic resistance and spread within hospitals and health care institutions. This study examines the genomic content of three A. baumannii strains isolated from distinct body sites. Isolates from blood, peri-anal, and wound sources were examined in an attempt to identify genetic features that could be correlated to each isolation source. Pulsed-field gel electrophoresis, multi-locus sequence typing and antibiotic resistance profiles demonstrated genotypic and phenotypic variation. Each isolate was sequenced to high-quality draft status, which allowed for comparative genomic analyses with existing A. baumannii genomes. A high resolution, whole genome alignment method detailed the phylogenetic relationships of sequenced A. baumannii and found no correlation between phylogeny and body site of isolation. This method identified genomic regions unique to both those isolates found on the surface of the skin or in wounds, termed colonization isolates, and those identified from body fluids, termed invasive isolates; these regions may play a role in the pathogenesis and spread of this important pathogen. A PCR-based screen of 74 A. baumannii isolates demonstrated that these unique genes are not exclusive to either phenotype or isolation source; however, a conserved genomic region exclusive to all sequenced A. baumannii was identified and verified. The results of the comparative genome analysis and PCR assay show that A. baumannii is a diverse and genomically variable pathogen that appears to have the potential to cause a range of human disease regardless of the isolation source.

  15. A fungal avirulence factor encoded in a highly plastic genomic region triggers partial resistance to septoria tritici blotch.

    PubMed

    Meile, Lukas; Croll, Daniel; Brunner, Patrick C; Plissonneau, Clémence; Hartmann, Fanny E; McDonald, Bruce A; Sánchez-Vallet, Andrea

    2018-04-25

    Cultivar-strain specificity in the wheat-Zymoseptoria tritici pathosystem determines the infection outcome and is controlled by resistance genes on the host side, many of which have been identified. On the pathogen side, however, the molecular determinants of specificity remain largely unknown. We used genetic mapping, targeted gene disruption and allele swapping to characterise the recognition of the new avirulence factor Avr3D1. We then combined population genetic and comparative genomic analyses to characterise the evolutionary trajectory of Avr3D1. Avr3D1 is specifically recognised by wheat cultivars harbouring the Stb7 resistance gene, triggering a strong defence response without preventing pathogen infection and reproduction. Avr3D1 resides in a cluster of putative effector genes located in a genome region populated by independent transposable element insertions. The gene was present in all 132 investigated strains and is highly polymorphic, with 30 different protein variants identified. We demonstrated that specific amino acid substitutions in Avr3D1 led to evasion of recognition. These results demonstrate that quantitative resistance and gene-for-gene interactions are not mutually exclusive. Localising avirulence genes in highly plastic genomic regions probably facilitates accelerated evolution that enables escape from recognition by resistance proteins. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.

  16. Genome-wide association study identifies multiple loci associated with bladder cancer risk

    PubMed Central

    Figueroa, Jonine D.; Ye, Yuanqing; Siddiq, Afshan; Garcia-Closas, Montserrat; Chatterjee, Nilanjan; Prokunina-Olsson, Ludmila; Cortessis, Victoria K.; Kooperberg, Charles; Cussenot, Olivier; Benhamou, Simone; Prescott, Jennifer; Porru, Stefano; Dinney, Colin P.; Malats, Núria; Baris, Dalsu; Purdue, Mark; Jacobs, Eric J.; Albanes, Demetrius; Wang, Zhaoming; Deng, Xiang; Chung, Charles C.; Tang, Wei; Bas Bueno-de-Mesquita, H.; Trichopoulos, Dimitrios; Ljungberg, Börje; Clavel-Chapelon, Françoise; Weiderpass, Elisabete; Krogh, Vittorio; Dorronsoro, Miren; Travis, Ruth; Tjønneland, Anne; Brenan, Paul; Chang-Claude, Jenny; Riboli, Elio; Conti, David; Gago-Dominguez, Manuela; Stern, Mariana C.; Pike, Malcolm C.; Van Den Berg, David; Yuan, Jian-Min; Hohensee, Chancellor; Rodabough, Rebecca; Cancel-Tassin, Geraldine; Roupret, Morgan; Comperat, Eva; Chen, Constance; De Vivo, Immaculata; Giovannucci, Edward; Hunter, David J.; Kraft, Peter; Lindstrom, Sara; Carta, Angela; Pavanello, Sofia; Arici, Cecilia; Mastrangelo, Giuseppe; Kamat, Ashish M.; Lerner, Seth P.; Barton Grossman, H.; Lin, Jie; Gu, Jian; Pu, Xia; Hutchinson, Amy; Burdette, Laurie; Wheeler, William; Kogevinas, Manolis; Tardón, Adonina; Serra, Consol; Carrato, Alfredo; García-Closas, Reina; Lloreta, Josep; Schwenn, Molly; Karagas, Margaret R.; Johnson, Alison; Schned, Alan; Armenti, Karla R.; Hosain, G.M.; Andriole, Gerald; Grubb, Robert; Black, Amanda; Ryan Diver, W.; Gapstur, Susan M.; Weinstein, Stephanie J.; Virtamo, Jarmo; Haiman, Chris A.; Landi, Maria T.; Caporaso, Neil; Fraumeni, Joseph F.; Vineis, Paolo; Wu, Xifeng; Silverman, Debra T.; Chanock, Stephen; Rothman, Nathaniel

    2014-01-01

    Candidate gene and genome-wide association studies (GWAS) have identified 11 independent susceptibility loci associated with bladder cancer risk. To discover additional risk variants, we conducted a new GWAS of 2422 bladder cancer cases and 5751 controls, followed by a meta-analysis with two independently published bladder cancer GWAS, resulting in a combined analysis of 6911 cases and 11 814 controls of European descent. TaqMan genotyping of 13 promising single nucleotide polymorphisms with P < 1 × 10(-5) was pursued in a follow-up set of 801 cases and 1307 controls. Two new loci achieved genome-wide statistical significance: rs10936599 on 3q26.2 (P = 4.53 × 10(-9)) and rs907611 on 11p15.5 (P = 4.11 × 10(-8)). Two notable loci were also identified that approached genome-wide statistical significance: rs6104690 on 20p12.2 (P = 7.13 × 10(-7)) and rs4510656 on 6p22.3 (P = 6.98 × 10(-7)); these require further studies for confirmation. In conclusion, our study has identified new susceptibility alleles for bladder cancer risk that require fine-mapping and laboratory investigation, which could further understanding into the biological underpinnings of bladder carcinogenesis. PMID:24163127

  17. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    PubMed

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is in seeking a suitable method to take a sequence of scores corresponding to properties of positions in virus genomes, and find outlying regions of low scores. Suitable statistical methods without using complex models or making many assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.
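
    To make the idea of locating a low-scoring region without fixing its length in advance more concrete, here is a minimal sketch. It is not the method described in the abstract: it substitutes the classic minimum-sum contiguous segment scan (a Kadane-style pass) over mean-centred scores and attaches no statistical score to the result. The function name and example scores are hypothetical.

```python
from statistics import mean


def lowest_scoring_segment(scores):
    """Return (start, end, total) for the contiguous run whose mean-centred
    scores have the smallest sum; `end` is exclusive."""
    if not scores:
        raise ValueError("scores must be non-empty")
    centre = mean(scores)
    best_total = float("inf")
    best_start = best_end = 0
    run_total = 0.0
    run_start = 0
    for i, score in enumerate(scores):
        deviation = score - centre
        if run_total > 0:
            # A run with a positive sum can only hurt; restart at this position.
            run_total = deviation
            run_start = i
        else:
            run_total += deviation
        if run_total < best_total:
            best_total = run_total
            best_start, best_end = run_start, i + 1
    return best_start, best_end, best_total


# Hypothetical per-position scores with a dip at positions 3 to 5.
example_scores = [5, 5, 5, 1, 1, 1, 5, 5, 5, 5]
start, end, total = lowest_scoring_segment(example_scores)
print(start, end)
```

    Centring on the mean is one simple design choice for making "low" relative to the rest of the sequence; other baselines would change which segment is reported.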

  18. MinGenome: An In Silico Top-Down Approach for the Synthesis of Minimized Genomes.

    PubMed

    Wang, Lin; Maranas, Costas D

    2018-02-16

    Genome minimized strains offer advantages as production chassis by reducing transcriptional cost, eliminating competing functions and limiting unwanted regulatory interactions. Existing approaches for identifying stretches of DNA to remove are largely ad hoc based on information on presumably dispensable regions through experimentally determined nonessential genes and comparative genomics. Here we introduce a versatile genome reduction algorithm MinGenome that implements a mixed-integer linear programming (MILP) algorithm to identify in size descending order all dispensable contiguous sequences without affecting the organism's growth or other desirable traits. Known essential genes or genes that cause significant fitness or performance loss can be flagged and their deletion can be prohibited. MinGenome also preserves needed transcription factors and promoter regions ensuring that retained genes will be properly transcribed while also avoiding the simultaneous deletion of synthetic lethal pairs. The potential benefit of removing even larger contiguous stretches of DNA if only one or two essential genes (to be reinserted elsewhere) are within the deleted sequence is explored. We applied the algorithm to design a minimized E. coli strain and found that we were able to recapitulate the long deletions identified in previous experimental studies and discover alternative combinations of deletions that have not yet been explored in vivo.

  19. Transposon Mutagenesis of the Zika Virus Genome Highlights Regions Essential for RNA Replication and Restricted for Immune Evasion.

    PubMed

    Fulton, Benjamin O; Sachs, David; Schwarz, Megan C; Palese, Peter; Evans, Matthew J

    2017-08-01

    The molecular constraints affecting Zika virus (ZIKV) evolution are not well understood. To investigate ZIKV genetic flexibility, we used transposon mutagenesis to add 15-nucleotide insertions throughout the ZIKV MR766 genome and subsequently deep sequenced the viable mutants. Few ZIKV insertion mutants replicated, which likely reflects a high degree of functional constraints on the genome. The NS1 gene exhibited distinct mutational tolerances at different stages of the screen. This result may define regions of the NS1 protein that are required for the different stages of the viral life cycle. The ZIKV structural genes showed the highest degree of insertional tolerance. Although the envelope (E) protein exhibited particular flexibility, the highly conserved envelope domain II (EDII) fusion loop of the E protein was intolerant of transposon insertions. The fusion loop is also a target of pan-flavivirus antibodies that are generated against other flaviviruses and neutralize a broad range of dengue virus and ZIKV isolates. The genetic restrictions identified within the epitopes in the EDII fusion loop likely explain the sequence and antigenic conservation of these regions in ZIKV and among multiple flaviviruses. Thus, our results provide insights into the genetic restrictions on ZIKV that may affect the evolution of this virus. IMPORTANCE Zika virus recently emerged as a significant human pathogen. Determining the genetic constraints on Zika virus is important for understanding the factors affecting viral evolution. We used a genome-wide transposon mutagenesis screen to identify where mutations were tolerated in replicating viruses. We found that the genetic regions involved in RNA replication were mostly intolerant of mutations. The genes coding for structural proteins were more permissive to mutations. Despite the flexibility observed in these regions, we found that epitopes bound by broadly reactive antibodies were genetically constrained. This finding may explain

  20. The FUN of identifying gene function in bacterial pathogens; insights from Salmonella functional genomics.

    PubMed

    Hammarlöf, Disa L; Canals, Rocío; Hinton, Jay C D

    2013-10-01

    The availability of thousands of genome sequences of bacterial pathogens poses a particular challenge because each genome contains hundreds of genes of unknown function (FUN). How can we easily discover which FUN genes encode important virulence factors? One solution is to combine two different functional genomic approaches. First, transcriptomics identifies bacterial FUN genes that show differential expression during the process of mammalian infection. Second, global mutagenesis identifies individual FUN genes that the pathogen requires to cause disease. The intersection of these datasets can reveal a small set of candidate genes most likely to encode novel virulence attributes. We demonstrate this approach with the Salmonella infection model, and propose that a similar strategy could be used for other bacterial pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
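
    The two-step strategy described above amounts to intersecting two gene lists. A minimal sketch, assuming each functional genomic screen has already been reduced to a collection of gene identifiers; the identifiers below are placeholders, not real Salmonella genes or results.

```python
def candidate_virulence_genes(differentially_expressed, required_for_infection):
    """Genes flagged by both screens, returned in sorted order."""
    return sorted(set(differentially_expressed) & set(required_for_infection))


# Placeholder identifiers standing in for genes of unknown function (FUN).
transcriptomics_hits = ["FUN_0001", "FUN_0007", "FUN_0012", "FUN_0020"]
mutagenesis_hits = ["FUN_0007", "FUN_0015", "FUN_0020"]

shortlist = candidate_virulence_genes(transcriptomics_hits, mutagenesis_hits)
print(shortlist)
```

    Using sets keeps the comparison independent of input order and of duplicate identifiers in either list.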

  1. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    PubMed

    Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

    2016-01-01

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  2. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis

    PubMed Central

    Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

    2016-01-01

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros ‘Jinzaoshi’ were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. ‘Jinzaoshi’, support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales. PMID:27442423

  3. Genome-Wide Linkage and Association Analysis Identifies Major Gene Loci for Guttural Pouch Tympany in Arabian and German Warmblood Horses

    PubMed Central

    Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar

    2012-01-01

    Equine guttural pouch tympany (GPT) is a hereditary condition affecting foals in their first months of life. Complex segregation analyses in Arabian and German warmblood horses showed the involvement of a major gene as very likely. Genome-wide linkage and association analyses including a high density marker set of single nucleotide polymorphisms (SNPs) were performed to map the genomic region harbouring the potential major gene for GPT. A total of 85 Arabian and 373 German warmblood horses were genotyped on the Illumina equine SNP50 beadchip. Non-parametric multipoint linkage analyses showed genome-wide significance on horse chromosomes (ECA) 3 for German warmblood at 16–26 Mb and 34–55 Mb and for Arabian on ECA15 at 64–65 Mb. Genome-wide association analyses confirmed the linked regions for both breeds. In Arabian, genome-wide association was detected at 64 Mb within the region with the highest linkage peak on ECA15. For German warmblood, signals for genome-wide association were close to the peak region of linkage at 52 Mb on ECA3. The odds ratio for the SNP with the highest genome-wide association was 0.12 for the Arabian. In conclusion, the refinement of the regions with the Illumina equine SNP50 beadchip is an important step to unravel the responsible mutations for GPT. PMID:22848553

  4. Identification of genomic region controlling resistance to aflatoxin contamination in a peanut recombinant inbred line population (Tifrunner x GT-C20)

    USDA-ARS's Scientific Manuscript database

    Aflatoxin contamination of peanut is a significant threat to global food safety. In this study we performed quantitative trait loci (QTL) analysis to identify peanut genomic regions contributing to aflatoxin contamination resistance in a recombinant inbred line (RIL) population derived from the Tifr...

  5. Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria.

    PubMed

    Thorpe, Harry A; Bayliss, Sion C; Sheppard, Samuel K; Feil, Edward J

    2018-04-01

    The concept of the "pan-genome," which refers to the total complement of genes within a given sample or species, is well established in bacterial genomics. Rapid and scalable pipelines are available for managing and interpreting pan-genomes from large batches of annotated assemblies. However, despite overwhelming evidence that variation in intergenic regions in bacteria can directly influence phenotypes, most current approaches for analyzing pan-genomes focus exclusively on protein-coding sequences. To address this we present Piggy, a novel pipeline that emulates Roary except that it is based only on intergenic regions. A key utility provided by Piggy is the detection of highly divergent ("switched") intergenic regions (IGRs) upstream of genes. We demonstrate the use of Piggy on large datasets of clinically important lineages of Staphylococcus aureus and Escherichia coli. For S. aureus, we show that highly divergent (switched) IGRs are associated with differences in gene expression and we establish a multilocus reference database of IGR alleles (igMLST; implemented in BIGSdb).

  6. Genomic region operation kit for flexible processing of deep sequencing data.

    PubMed

    Ovaska, Kristian; Lyly, Lauri; Sahu, Biswajyoti; Jänne, Olli A; Hautaniemi, Sampsa

    2013-01-01

    Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions to language amenable for computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/.
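
    The set-algebra view of genomic regions described above can be illustrated with a minimal sketch. This is not GROK's API: the function names `merge` and `intersect`, the tuple layout `(chrom, start, end)`, and the half-open coordinate convention are all assumptions made for this example.

```python
from collections import defaultdict


def merge(regions):
    """Union: collapse overlapping or adjacent half-open intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlapping or touching: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def intersect(a, b):
    """Intersection of two region sets, computed by a sorted sweep."""
    a, b = merge(a), merge(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in chromosome order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(start_a, start_b), min(end_a, end_b)
        if lo < hi:
            out.append((chrom_a, lo, hi))
        # Drop the interval that ends first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out


# Hypothetical peak sets for two transcription factors.
peaks_tf1 = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
peaks_tf2 = [("chr1", 250, 400), ("chr2", 10, 60)]
shared = intersect(peaks_tf1, peaks_tf2)
```

    Here `shared` holds the regions bound by both hypothetical factors. A production tool would typically use an indexed structure (the abstract mentions red-black trees and SQL databases) rather than sorted lists.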

  7. Genome-wide association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants.

    PubMed

    Iso-Touru, T; Sahana, G; Guldbrandtsen, B; Lund, M S; Vilkki, J

    2016-03-22

    The Nordic Red Cattle, consisting of three different populations from Finland, Sweden and Denmark, are under a joint breeding value estimation system. The long history of recording of production and health traits offers a great opportunity to study production traits and identify causal variants behind them. In this study, we used whole genome sequence level data from 4280 progeny tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. Using a genome-wide significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were associated with fat yield. Regions on chromosomes 5, 14, 16, 19, 20 and 25 were associated with milk yield and chromosomes 5, 14 and 25 had regions associated with protein yield. Significantly associated variations were found in 227 genes for fat yield, 72 genes for milk yield and 30 genes for protein yield. Ingenuity Pathway Analysis was used to identify networks connecting these genes displaying significant hits. When compared to previously mapped genomic regions associated with fertility, significantly associated variations were found in 5 genes common for fat yield and fertility, thus linking these two traits via biological networks. This is the first time that whole genome sequence data have been used to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence level data offers the possibility to study quantitative traits in detail but still cannot unambiguously reveal which of the associated variations is causative. Linkage disequilibrium makes it difficult to pinpoint the causative genes and variations. One solution to overcome these difficulties is the identification of the functional gene networks and pathways to reveal important interacting genes as candidates for the observed effects. This information on target genomic regions may be exploited to improve genomic prediction.

  8. A Genome-Wide Association Study to Identify Genomic Modulators of Rate Control Therapy in Patients with Atrial Fibrillation

    PubMed Central

    Kolek, Matthew J.; Edwards, Todd L.; Muhammad, Raafia; Balouch, Adnan; Shoemaker, M. Benjamin; Blair, Marcia A.; Kor, Kaylen C.; Takahashi, Atsushi; Kubo, Michiaki; Roden, Dan M.; Tanaka, Toshihiro; Darbar, Dawood

    2014-01-01

    For many patients with atrial fibrillation (AF), ventricular rate control with atrioventricular (AV) nodal blockers is considered first-line therapy, though response to treatment is highly variable. Using an extreme phenotype of failure of rate control necessitating AV nodal ablation and pacemaker implantation, we conducted a genome wide association study (GWAS) to identify genomic modulators of rate control therapy. Cases included 95 patients who failed rate control therapy. Controls (N=190) achieved adequate rate control therapy with ≤2 AV nodal blockers using a conventional clinical definition. Genotyping was performed on the Illumina 610-Quad platform, and results were imputed to the 1000 Genomes reference haplotypes. 554,041 single nucleotide polymorphisms (SNPs) met criteria for minor allele frequency (>0.01), call rate (>95%), and quality control, and 6,055,224 SNPs were available after imputation. No SNP reached the canonical threshold for significance for GWAS of P<5 × 10(-8). Sixty-three SNPs with P<10(-5) at 6 genomic loci were genotyped in a validation cohort of 130 cases and 157 controls. These included 6q24.3 (near SAMD5/SASH1, P=9.36 × 10(-8)), 4q12 (IGFBP7, P=1.75 × 10(-7)), 6q22.33 (C6orf174, P=4.86 × 10(-7)), 3p21.31 (CDCP1, P=1.18 × 10(-6)), 12p12.1 (SOX5, P=1.62 × 10(-6)), and 7p11 (LANCL2, P=6.51 × 10(-6)). However, none of these were significant in the replication cohort or in a meta-analysis of both cohorts. In conclusion, we identified several potentially important genomic modulators of rate control therapy in AF, particularly SOX5, which was previously associated with resting heart rate and PR interval. However these failed to reach genome-wide significance. PMID:25015694

  9. A genome-wide association study to identify genomic modulators of rate control therapy in patients with atrial fibrillation.

    PubMed

    Kolek, Matthew J; Edwards, Todd L; Muhammad, Raafia; Balouch, Adnan; Shoemaker, M Benjamin; Blair, Marcia A; Kor, Kaylen C; Takahashi, Atsushi; Kubo, Michiaki; Roden, Dan M; Tanaka, Toshihiro; Darbar, Dawood

    2014-08-15

    For many patients with atrial fibrillation, ventricular rate control with atrioventricular (AV) nodal blockers is considered first-line therapy, although response to treatment is highly variable. Using an extreme phenotype of failure of rate control necessitating AV nodal ablation and pacemaker implantation, we conducted a genome-wide association study (GWAS) to identify genomic modulators of rate control therapy. Cases included 95 patients who failed rate control therapy. Controls (n = 190) achieved adequate rate control therapy with ≤2 AV nodal blockers using a conventional clinical definition. Genotyping was performed on the Illumina 610-Quad platform, and results were imputed to the 1000 Genomes reference haplotypes. A total of 554,041 single-nucleotide polymorphisms (SNPs) met criteria for minor allele frequency (>0.01), call rate (>95%), and quality control, and 6,055,224 SNPs were available after imputation. No SNP reached the canonical threshold for significance for GWAS of p <5 × 10(-8). Sixty-three SNPs with p <10(-5) at 6 genomic loci were genotyped in a validation cohort of 130 cases and 157 controls. These included 6q24.3 (near SAMD5/SASH1, p = 9.36 × 10(-8)), 4q12 (IGFBP7, p = 1.75 × 10(-7)), 6q22.33 (C6orf174, p = 4.86 × 10(-7)), 3p21.31 (CDCP1, p = 1.18 × 10(-6)), 12p12.1 (SOX5, p = 1.62 × 10(-6)), and 7p11 (LANCL2, p = 6.51 × 10(-6)). However, none of these were significant in the replication cohort or in a meta-analysis of both cohorts. In conclusion, we identified several potentially important genomic modulators of rate control therapy in atrial fibrillation, particularly SOX5, which was previously associated with heart rate at rest and PR interval. However, these failed to reach genome-wide significance. Copyright © 2014 Elsevier Inc. All rights reserved.

  10. Characterization of the genomic organization of the region bordering the centromere of chromosome V of Podospora anserina by direct sequencing.

    PubMed

    Silar, Philippe; Barreau, Christian; Debuchy, Robert; Kicka, Sébastien; Turcq, Béatrice; Sainsard-Chanet, Annie; Sellem, Carole H; Billault, Alain; Cattolico, Laurence; Duprat, Simone; Weissenbach, Jean

    2003-08-01

    A Podospora anserina BAC library of 4800 clones has been constructed in the vector pBHYG allowing direct selection in fungi. Screening of the BAC collection for centromeric sequences of chromosome V allowed the recovery of clones localized on either side of the centromere, but no BAC clone was found to contain the centromere. Seven BAC clones containing 322,195 and 156,244 bp from either side of the centromeric region were sequenced and annotated. One 5S rRNA gene, 5 tRNA genes, and 163 putative coding sequences (CDS) were identified. Among these, only six CDS seem specific to P. anserina. The gene density in the centromeric region is approximately one gene every 2.8 kb. Extrapolation of this gene density to the whole genome of P. anserina suggests that the genome contains about 11,000 genes. Synteny analyses between P. anserina and Neurospora crassa show that co-linearity extends at the most to a few genes, suggesting rapid genome rearrangements between these two species.
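
    The gene-density figure can be checked from the numbers given in the abstract. The sketch below assumes that all 169 annotated features (rRNA, tRNA and CDS) count as genes. The genome size it prints is not stated in the abstract; it is only the size implied by combining the 2.8 kb density with the 11,000-gene estimate.

```python
# Sequence length reported for the two flanks of the centromeric region.
sequenced_bp = 322_195 + 156_244

# One 5S rRNA gene, 5 tRNA genes and 163 putative CDS (assumed to all count as genes).
genes = 1 + 5 + 163

# Average spacing between genes, in kilobases.
kb_per_gene = sequenced_bp / genes / 1000

# Genome size implied by the abstract's extrapolation (derived here, not reported).
implied_genome_mb = 11_000 * 2.8 / 1000

print(f"{kb_per_gene:.2f} kb per gene")
print(f"{implied_genome_mb:.1f} Mb implied genome size")
```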

  11. A Segment of the Apospory-Specific Genomic Region Is Highly Microsyntenic Not Only between the Apomicts Pennisetum squamulatum and Buffelgrass, But Also with a Rice Chromosome 11 Centromeric-Proximal Genomic Region

    PubMed Central

    Gualtieri, Gustavo; Conner, Joann A.; Morishige, Daryl T.; Moore, L. David; Mullet, John E.; Ozias-Akins, Peggy

    2006-01-01

    Bacterial artificial chromosome (BAC) clones from apomicts Pennisetum squamulatum and buffelgrass (Cenchrus ciliaris), isolated with the apospory-specific genomic region (ASGR) marker ugt197, were assembled into contigs that were extended by chromosome walking. Gene-like sequences from contigs were identified by shotgun sequencing and BLAST searches, and used to isolate orthologous rice contigs. Additional gene-like sequences in the apomicts' contigs were identified by bioinformatics using fully sequenced BACs from orthologous rice contigs as templates, as well as by interspecies, whole-contig cross-hybridizations. Hierarchical contig orthology was rapidly assessed by constructing detailed long-range contig molecular maps showing the distribution of gene-like sequences and markers, and searching for microsyntenic patterns of sequence identity and spatial distribution within and across species contigs. We found microsynteny between P. squamulatum and buffelgrass contigs. Importantly, this approach also enabled us to isolate from within the rice (Oryza sativa) genome contig Rice A, which shows the highest microsynteny and is most orthologous to the ugt197-containing C1C buffelgrass contig. Contig Rice A belongs to the rice genome database contig 77 (according to the current September 12, 2003, rice fingerprint contig build) that maps proximal to the chromosome 11 centromere, a feature that interestingly correlates with the mapping of ASGR-linked BACs proximal to the centromere or centromere-like sequences. Thus, relatedness between these two orthologous contigs is supported both by their molecular microstructure and by their centromeric-proximal location. Our discoveries promote the use of a microsynteny-based positional-cloning approach using the rice genome as a template to aid in constructing the ASGR toward the isolation of genes underlying apospory. PMID:16415213

  12. Identification of accessory genome regions in poultry Clostridium perfringens isolates carrying the netB plasmid.

    PubMed

    Lepp, D; Gong, J; Songer, J G; Boerlin, P; Parreira, V R; Prescott, J F

    2013-03-01

    Necrotic enteritis (NE) is an economically important disease of poultry caused by certain Clostridium perfringens type A strains. NE pathogenesis involves the NetB toxin, which is encoded on a large conjugative plasmid within a 42-kb pathogenicity locus. Recent multilocus sequence type (MLST) studies have identified two predominant NE-associated clonal groups, suggesting that host genes are also involved in NE pathogenesis. We used microarray comparative genomic hybridization (CGH) to assess the gene content of 54 poultry isolates from birds that were healthy or that suffered from NE. A total of 400 genes were variably present among the poultry isolates and nine nonpoultry strains, many of which had putative functions related to nutrient uptake and metabolism and cell wall and capsule biosynthesis. The variable genes were organized into 142 genomic regions, 49 of which contained genes significantly associated with netB-positive isolates. These regions included three previously identified NE-associated loci as well as several apparent fitness-related loci, such as a carbohydrate ABC transporter, a ferric-iron siderophore uptake system, and an adhesion locus. Additional loci were related to plasmid maintenance. Cluster analysis of the CGH data grouped all of the netB-positive poultry isolates into two major groups, separated according to two prevalent clonal groups based on MLST analysis. This study identifies chromosomal loci associated with netB-positive poultry strains, suggesting that the chromosomal background can confer a selective advantage to NE-causing strains, possibly through mechanisms involving iron acquisition, carbohydrate metabolism, and plasmid maintenance.

  13. Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs

    PubMed Central

    Freedman, Adam H.; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Davis, Brian W.; Gronau, Ilan; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Marques-Bonet, Tomas; Ostrander, Elaine A.; Wayne, Robert K.; Novembre, John

    2016-01-01

    Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers. PMID:26943675

  14. Analysis of genomic regions of Trichoderma harzianum IOC-3844 related to biomass degradation.

    PubMed

    Crucello, Aline; Sforça, Danilo Augusto; Horta, Maria Augusta Crivelente; dos Santos, Clelton Aparecido; Viana, Américo José Carvalho; Beloti, Lilian Luzia; de Toledo, Marcelo Augusto Szymanski; Vincentz, Michel; Kuroshu, Reginaldo Massanobu; de Souza, Anete Pereira

    2015-01-01

    Trichoderma harzianum IOC-3844 secretes high levels of cellulolytic-active enzymes and is therefore a promising strain for use in biotechnological applications in second-generation bioethanol production. However, the T. harzianum biomass degradation mechanism has not been well explored at the genetic level. The present work investigates six genomic regions (~150 kbp each) in this fungus that are enriched with genes related to biomass conversion. A BAC library consisting of 5,760 clones was constructed, with an average insert length of 90 kbp. The assembled BAC sequences revealed 232 predicted genes, 31.5% of which were related to catabolic pathways, including those involved in biomass degradation. An expression profile analysis based on RNA-Seq data demonstrated that putative regulatory elements, such as membrane transport proteins and transcription factors, are located in the same genomic regions as genes related to carbohydrate metabolism and exhibit similar expression profiles. Thus, we demonstrate a rapid and efficient tool that focuses on specific genomic regions by combining a BAC library with transcriptomic data. This is the first BAC-based structural genomic study of the cellulolytic fungus T. harzianum, and its findings provide new perspectives regarding the use of this species in biomass degradation processes.

  15. Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses

    PubMed Central

    Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A.; Janke, Axel

    2015-01-01

    The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. PMID:26019166
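
    The read-depth approach described above can be sketched as follows. The idea is that a Y-linked scaffold has close to zero depth in a female sample but roughly half the autosomal depth in a male. The function name, the input layout (mean depth per scaffold, already normalized to each sample's genome-wide average) and both thresholds are assumptions for illustration, not the values used in the study.

```python
def classify_scaffolds(male_depth, female_depth, y_max_ratio=0.3, x_min_ratio=1.5):
    """Label scaffolds by the female:male ratio of normalized mean read depth.

    Expected ratios under this simple model: about 1 for autosomes,
    about 2 for X-linked scaffolds, and about 0 for Y-linked scaffolds.
    """
    labels = {}
    for scaffold, male in male_depth.items():
        female = female_depth.get(scaffold, 0.0)
        if male <= 0:
            # No male coverage: the ratio is undefined.
            labels[scaffold] = "unclassified"
            continue
        ratio = female / male
        if ratio <= y_max_ratio:
            labels[scaffold] = "Y-candidate"
        elif ratio >= x_min_ratio:
            labels[scaffold] = "X-candidate"
        else:
            labels[scaffold] = "autosomal"
    return labels


# Toy depths, hypothetical values only.
male = {"scaf1": 1.00, "scaf2": 0.50, "scaf3": 0.48, "scaf4": 0.0}
female = {"scaf1": 1.02, "scaf2": 0.98, "scaf3": 0.01}
labels = classify_scaffolds(male, female)
```

    Candidates found this way would still need confirmation, for example by the male-specific in vitro amplification the abstract reports.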

  16. Genomic Analyses Yield Markers for Identifying Agronomically Important Genes in Potato

    USDA-ARS's Scientific Manuscript database

    This study explores the genetic architecture underlying potato evolution through a comprehensive assessment of wild and cultivated potato species based on the re-sequencing of 201 accessions of Solanum section Petota with >12 × genome coverage. We identified 450 domesticated genes, which showed e...

  17. Genome-wide methylation analysis identifies genes silenced in non-seminoma cell lines

    PubMed Central

    Noor, Dzul Azri Mohamed; Jeyapalan, Jennie N; Alhazmi, Safiah; Carr, Matthew; Squibb, Benjamin; Wallace, Claire; Tan, Christopher; Cusack, Martin; Hughes, Jaime; Reader, Tom; Shipley, Janet; Sheer, Denise; Scotting, Paul J

    2016-01-01

    Silencing of genes by DNA methylation is a common phenomenon in many types of cancer. However, the genome-wide effect of DNA methylation on gene expression has been analysed in relatively few cancers. Germ cell tumours (GCTs) are a complex group of malignancies. They are unique in developing from a pluripotent progenitor cell. Previous analyses have suggested that non-seminomas exhibit much higher levels of DNA methylation than seminomas. The genomic targets that are methylated, the extent to which this results in gene silencing and the identity of the silenced genes most likely to play a role in the tumours’ biology have not yet been established. In this study, genome-wide methylation and expression analysis of GCT cell lines was combined with gene expression data from primary tumours to address this question. Genome methylation was analysed using the Illumina infinium HumanMethylome450 bead chip system and gene expression was analysed using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Regulation by methylation was confirmed by demethylation using 5-aza-2-deoxycytidine and reverse transcription–quantitative PCR. Large differences in the level of methylation of the CpG islands of individual genes between tumour cell lines correlated well with differential gene expression. Treatment of non-seminoma cells with 5-aza-2-deoxycytidine verified that methylation of all genes tested played a role in their silencing in yolk sac tumour cells and many of these genes were also differentially expressed in primary tumours. Genes silenced by methylation in the various GCT cell lines were identified. Several pluripotency-associated genes were identified as a major functional group of silenced genes. PMID:29263807

  18. Genome-wide methylation analysis identifies genes silenced in non-seminoma cell lines.

    PubMed

    Noor, Dzul Azri Mohamed; Jeyapalan, Jennie N; Alhazmi, Safiah; Carr, Matthew; Squibb, Benjamin; Wallace, Claire; Tan, Christopher; Cusack, Martin; Hughes, Jaime; Reader, Tom; Shipley, Janet; Sheer, Denise; Scotting, Paul J

    2016-01-01

    Silencing of genes by DNA methylation is a common phenomenon in many types of cancer. However, the genome-wide effect of DNA methylation on gene expression has been analysed in relatively few cancers. Germ cell tumours (GCTs) are a complex group of malignancies. They are unique in developing from a pluripotent progenitor cell. Previous analyses have suggested that non-seminomas exhibit much higher levels of DNA methylation than seminomas. The genomic targets that are methylated, the extent to which this results in gene silencing and the identity of the silenced genes most likely to play a role in the tumours' biology have not yet been established. In this study, genome-wide methylation and expression analysis of GCT cell lines was combined with gene expression data from primary tumours to address this question. Genome methylation was analysed using the Illumina infinium HumanMethylome450 bead chip system and gene expression was analysed using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays. Regulation by methylation was confirmed by demethylation using 5-aza-2-deoxycytidine and reverse transcription-quantitative PCR. Large differences in the level of methylation of the CpG islands of individual genes between tumour cell lines correlated well with differential gene expression. Treatment of non-seminoma cells with 5-aza-2-deoxycytidine verified that methylation of all genes tested played a role in their silencing in yolk sac tumour cells and many of these genes were also differentially expressed in primary tumours. Genes silenced by methylation in the various GCT cell lines were identified. Several pluripotency-associated genes were identified as a major functional group of silenced genes.

  19. Genome-wide association study identifies genes associated with neuropathy in patients with head and neck cancer.

    PubMed

    Reyes-Gibby, Cielito C; Wang, Jian; Yeung, Sai-Ching J; Chaftari, Patrick; Yu, Robert K; Hanna, Ehab Y; Shete, Sanjay

    2018-06-08

    Neuropathic pain (NP), defined as pain initiated or caused by a primary lesion or dysfunction in the nervous system, is a debilitating chronic pain condition often resulting from cancer treatment. Among cancer patients, neuropathy during cancer treatment is a predisposing event for NP. To identify genetic variants influencing the development of NP, we conducted a genome-wide association study in 1,043 patients with squamous cell carcinoma of the head and neck, based on 714,494 tagging single-nucleotide polymorphisms (SNPs) (130 cases, 913 controls). About 12.5% of the patients, who previously had cancer treatment, had neuropathy-associated diagnoses, as defined using the ICD-9/ICD-10 codes. We identified four common SNPs representing four genomic regions: 7q22.3 (rs10950641; SNX8; P = 3.39 × 10(-14)), 19p13.2 (rs4804217; PCP2; P = 2.95 × 10(-9)), 3q27.3 (rs6796803; KNG1; P = 6.42 × 10(-9)) and 15q22.2 (rs4775319; RORA; P = 1.02 × 10(-8)), suggesting SNX8, PCP2, KNG1 and RORA might be novel target genes for NP in patients with head and neck cancer. Future experimental validation to explore physiological effects of the identified SNPs will provide a better understanding of the biological mechanisms underlying NP and may provide insights into novel therapeutic targets for treatment and management of NP.

  20. Genome-wide analysis of regulatory proteases sequences identified through bioinformatics data mining in Taenia solium.

    PubMed

    Yan, Hong-Bin; Lou, Zhong-Zi; Li, Li; Brindley, Paul J; Zheng, Yadong; Luo, Xuenong; Hou, Junling; Guo, Aijiang; Jia, Wan-Zhong; Cai, Xuepeng

    2014-06-04

    Cysticercosis remains a major neglected tropical disease of humanity in many regions, especially in sub-Saharan Africa, Central America and elsewhere. Owing to the emerging drug resistance and the inability of current drugs to prevent re-infection, identification of novel vaccines and chemotherapeutic agents against Taenia solium and related helminth pathogens is a public health priority. The T. solium genome and the predicted proteome were reported recently, providing a wealth of information from which new interventional targets might be identified. In order to characterize and classify the entire repertoire of protease-encoding genes of T. solium, which play fundamental biological roles in all life processes, we analyzed the predicted proteins of this cestode through a combination of bioinformatics tools. Functional annotation was performed to yield insights into the signaling processes relevant to the complex developmental cycle of this tapeworm and to highlight a suite of the proteases as potential intervention targets. Within the genome of this helminth parasite, we identified 200 open reading frames encoding proteases from five clans, which correspond to 1.68% of the 11,902 protein-encoding genes predicted to be present in its genome. These proteases include calpains, cytosolic, mitochondrial signal peptidases, ubiquitylation related proteins, and others. Many not only show significant similarity to proteases in the Conserved Domain Database but have conserved active sites and catalytic domains. KEGG Automatic Annotation Server (KAAS) analysis indicated that ~60% of these proteases share strong sequence identities with proteins of the KEGG database, which are involved in human disease, metabolic pathways, genetic information processes, cellular processes, environmental information processes and organismal systems. Also, we identified signal peptides and transmembrane helices through comparative analysis with classes of important regulatory proteases

  1. Comparative mitochondrial genomics of snakes: extraordinary substitution rate dynamics and functionality of the duplicate control region

    PubMed Central

    Jiang, Zhi J; Castoe, Todd A; Austin, Christopher C; Burbrink, Frank T; Herron, Matthew D; McGuire, Jimmy A; Parkinson, Christopher L; Pollock, David D

    2007-01-01

    Background The mitochondrial genomes of snakes are characterized by an overall evolutionary rate that appears to be one of the most accelerated among vertebrates. They also possess other unusual features, including short tRNAs and other genes, and a duplicated control region that has been stably maintained since it originated more than 70 million years ago. Here, we provide a detailed analysis of evolutionary dynamics in snake mitochondrial genomes to better understand the basis of these extreme characteristics, and to explore the relationship between mitochondrial genome molecular evolution, genome architecture, and molecular function. We sequenced complete mitochondrial genomes from Slowinski's corn snake (Pantherophis slowinskii) and two cottonmouths (Agkistrodon piscivorus) to complement previously existing mitochondrial genomes, and to provide an improved comparative view of how genome architecture affects molecular evolution at contrasting levels of divergence. Results We present a Bayesian genetic approach that suggests that the duplicated control region can function as an additional origin of heavy strand replication. The two control regions also appear to have different intra-specific versus inter-specific evolutionary dynamics that may be associated with complex modes of concerted evolution. We find that different genomic regions have experienced substantial accelerated evolution along early branches in snakes, with different genes having experienced dramatic accelerations along specific branches. Some of these accelerations appear to coincide with, or subsequent to, the shortening of various mitochondrial genes and the duplication of the control region and flanking tRNAs. Conclusion Fluctuations in the strength and pattern of selection during snake evolution have had widely varying gene-specific effects on substitution rates, and these rate accelerations may have been functionally related to unusual changes in genomic architecture. 
The among-lineage and

  2. Genomic regions associated with kyphosis in swine

    PubMed Central

    2010-01-01

    Background A back curvature defect similar to kyphosis in humans has been observed in swine herds. The defect ranges from mild to severe curvature of the thoracic vertebrae in split carcasses and has an estimated heritability of 0.3. The objective of this study was to identify genomic regions that affect this trait. Results Single nucleotide polymorphism (SNP) associations performed with 198 SNPs and microsatellite markers in a Duroc-Landrace-Yorkshire resource population (U.S. Meat Animal Research Center, USMARC resource population) of swine provided regions of association with this trait on 15 chromosomes. Positional candidate genes, especially those involved in human skeletal development pathways, were selected for SNP identification. SNPs in 16 candidate genes were genotyped in an F2 population (n = 371) and the USMARC resource herd (n = 1,257) with kyphosis scores. SNPs in KCNN2 on SSC2, RYR1 and PLOD1 on SSC6 and MYST4 on SSC14 were significantly associated with kyphosis in the resource population of swine (P ≤ 0.05). SNPs in CER1 and CDH7 on SSC1, PSMA5 on SSC4, HOXC6 and HOXC8 on SSC5, ADAMTS18 on SSC6 and SOX9 on SSC12 were significantly associated with the kyphosis trait in the F2 population of swine (P ≤ 0.05). Conclusions These data suggest that this kyphosis trait may be affected by several loci and that these may differ by population. Carcass value could be improved by effectively removing this undesirable trait from pig populations. PMID:21176156

  3. Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.

    PubMed

    Pehkonen, Petri; Wong, Garry; Törönen, Petri

    2010-01-01

    Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
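
    The abstract notes that the number of segments is usually chosen with an asymptotic criterion such as the Bayesian information criterion (BIC). As a rough illustration of that baseline only, and not of the authors' heuristic search or their Bayesian priors, the sketch below finds the least-squares segmentation of a numeric sequence for each candidate segment count by exact dynamic programming and keeps the count with the lowest BIC. The piecewise-constant Gaussian model, the parameter count of 2k, and all function names are assumptions made for this example.

```python
import math


def make_sse(x):
    """Return sse(i, j): squared error of x[i:j] around its own mean."""
    n = len(x)
    s1 = [0.0] * (n + 1)  # prefix sums of values
    s2 = [0.0] * (n + 1)  # prefix sums of squared values
    for idx, v in enumerate(x):
        s1[idx + 1] = s1[idx] + v
        s2[idx + 1] = s2[idx] + v * v

    def sse(i, j):
        total = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    return sse


def best_segmentation(x, k):
    """Split x into exactly k contiguous segments with minimal total SSE."""
    n = len(x)
    sse = make_sse(x)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = cost[seg - 1][i] + sse(i, j)
                if c < cost[seg][j]:
                    cost[seg][j] = c
                    back[seg][j] = i
    # Walk the back-pointers to recover (start, end) pairs, end exclusive.
    bounds, j = [], n
    for seg in range(k, 0, -1):
        i = back[seg][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return cost[k][n], bounds


def bic(rss, n, k):
    """Gaussian BIC; assumes 2k free parameters (k means, k-1 breaks, 1 variance)."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k * math.log(n)


def select_segments(x, max_k):
    """Return (bic, k, bounds) for the segment count with the lowest BIC."""
    n = len(x)
    scored = []
    for k in range(1, max_k + 1):
        rss, bounds = best_segmentation(x, k)
        scored.append((bic(rss, n, k), k, bounds))
    return min(scored)


if __name__ == "__main__":
    # Hypothetical expression-like values with one obvious change point.
    values = [0.1, -0.2, 0.0, 0.2, -0.1, 5.1, 4.9, 5.0, 5.2, 4.8]
    score, k, bounds = select_segments(values, max_k=4)
    print(k, bounds)
```

    The exact dynamic program is cubic in the sequence length for a fixed maximum segment count, which is one reason heuristic searches are attractive for genome-scale data.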

  4. Fourteen-Genome Comparison Identifies DNA Markers for Severe-Disease-Associated Strains of Clostridium difficile

    PubMed Central

    Forgetta, Vincenzo; Oughton, Matthew T.; Marquis, Pascale; Brukner, Ivan; Blanchette, Ruth; Haub, Kevin; Magrini, Vince; Mardis, Elaine R.; Gerding, Dale N.; Loo, Vivian G.; Miller, Mark A.; Mulvey, Michael R.; Rupnik, Maja; Dascal, Andre; Dewar, Ken

    2011-01-01

    Clostridium difficile is a common cause of infectious diarrhea in hospitalized patients. A severe and increased incidence of C. difficile infection (CDI) is associated predominantly with the NAP1 strain; however, the existence of other severe-disease-associated (SDA) strains and the extensive genetic diversity across C. difficile complicate reliable detection and diagnosis. Comparative genome analysis of 14 sequenced genomes, including those of a subset of NAP1 isolates, allowed the assessment of genetic diversity within and between strain types to identify DNA markers that are associated with severe disease. Comparative genome analysis of 14 isolates, including five publicly available strains, revealed that C. difficile has a core genome of 3.4 Mb, comprising ∼3,000 genes. Analysis of the core genome identified candidate DNA markers that were subsequently evaluated using a multistrain panel of 177 isolates, representing more than 50 pulsovars and 8 toxinotypes. A subset of 117 isolates from the panel had associated patient data that allowed assessment of an association between the DNA markers and severe CDI. We identified 20 candidate DNA markers for species-wide detection and 10,683 single nucleotide polymorphisms (SNPs) associated with the predominant SDA strain (NAP1). A species-wide detection candidate marker, the sspA gene, was found to be the same across 177 sequenced isolates and lacked significant similarity to those of other species. Candidate SNPs in genes CD1269 and CD1265 were found to associate more closely with disease severity than currently used diagnostic markers, as they were also present in the toxin A-negative and B-positive (A-B+) strain types. The genetic markers identified illustrate the potential of comparative genomics for the discovery of diagnostic DNA-based targets that are species specific or associated with multiple SDA strains. PMID:21508155

  5. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    PubMed Central

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  6. Use of whole-genome sequencing to trace, control and characterize the regional expansion of extended-spectrum β-lactamase producing ST15 Klebsiella pneumoniae.

    PubMed

    Zhou, Kai; Lokate, Mariette; Deurenberg, Ruud H; Tepper, Marga; Arends, Jan P; Raangs, Erwin G C; Lo-Ten-Foe, Jerome; Grundmann, Hajo; Rossen, John W A; Friedrich, Alexander W

    2016-02-11

    The study describes the transmission of a CTX-M-15-producing ST15 Klebsiella pneumoniae between patients treated in a single center and the subsequent inter-institutional spread by patient referral occurring between May 2012 and September 2013. A suspected epidemiological link between clinical K. pneumoniae isolates was supported by patient contact tracing and genomic phylogenetic analysis from May to November 2012. By May 2013, a patient treated in three institutions in two cities was involved in an expanding cluster caused by this high-risk clone (HiRiC) (local expansion, CTX-M-15 producing, and containing hypervirulence factors). A clone-specific multiplex PCR was developed for patient screening by which another patient was identified in September 2013. Genomic phylogenetic analysis including published ST15 genomes revealed a close homology with isolates previously found in the USA. Environmental contamination and lack of consistent patient screening were identified as being responsible for the clone dissemination. The investigation addresses the advantages of whole-genome sequencing in the early detection of HiRiC with a high propensity of nosocomial transmission and prolonged circulation in the regional patient population. Our study suggests the necessity for inter-institutional/regional collaboration for infection/outbreak management of K. pneumoniae HiRiCs.

  7. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.

    PubMed

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-02-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.

  8. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes

    PubMed Central

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-01-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information. PMID:22384404

  9. Genomic locus modulating corneal thickness in the mouse identifies POU6F2 as a potential risk of developing glaucoma

    PubMed Central

    Li, Ying; Wang, Jiaxing; Allingham, R. Rand; Hauser, Michael A.; Wiggs, Janey L.; Geisert, Eldon E.

    2018-01-01

    Central corneal thickness (CCT) is one of the most heritable ocular traits and it is also a phenotypic risk factor for primary open angle glaucoma (POAG). The present study uses the BXD Recombinant Inbred (RI) strains to identify novel quantitative trait loci (QTLs) modulating CCT in the mouse with the potential of identifying a molecular link between CCT and risk of developing POAG. The BXD RI strain set was used to define mammalian genomic loci modulating CCT, with a total of 818 corneas measured from 61 BXD RI strains (60–100 days of age). The mice were anesthetized and the eyes were positioned in front of the lens of the Phoenix Micron IV Image-Guided OCT system or the Bioptigen OCT system. CCT data for each strain were averaged and used to map QTLs modulating this phenotype using the bioinformatics tools on GeneNetwork (www.genenetwork.org). The candidate genes and genomic loci identified in the mouse were then directly compared with the summary data from a human POAG genome wide association study (NEIGHBORHOOD) to determine if any genomic elements modulating mouse CCT are also risk factors for POAG. This analysis revealed one significant QTL on Chr 13 and a suggestive QTL on Chr 7. The significant locus on Chr 13 (13 to 19 Mb) was examined further to define candidate genes modulating this eye phenotype. For the Chr 13 QTL in the mouse, only one gene in the region (Pou6f2) contained nonsynonymous SNPs. Of these five nonsynonymous SNPs in Pou6f2, two resulted in changes in the amino acid proline which could result in altered secondary structure affecting protein function. The 7 Mb region under the mouse Chr 13 peak distributes over 2 chromosomes in the human: Chr 1 and Chr 7. These genomic loci were examined in the NEIGHBORHOOD database to determine if they are potential risk factors for human glaucoma identified using meta-data from human GWAS. The top 50 hits all resided within one gene (POU6F2), with the highest significance level of p = 10⁻⁶ for SNP

  10. Cancer Genomic Resources and Present Needs in the Latin American Region.

    PubMed

    Torres, Ángela; Oliver, Javier; Frecha, Cecilia; Montealegre, Ana Lorena; Quezada-Urbán, Rosalía; Díaz-Velásquez, Clara Estela; Vaca-Paniagua, Felipe; Perdomo, Sandra

    2017-01-01

    In Latin America (LA), cancer is the second leading cause of death, and little is known about the capacities and needs for the development of research in the field of cancer genomics. In order to evaluate the current capacity for and development of cancer genomics in LA, we collected the available information on genomics, including the number of next-generation sequencing (NGS) platforms, the number of cancer research institutions and research groups, publications in the last 10 years, educational programs, and related national cancer control policies. Currently, there are 221 NGS platforms and 118 research groups in LA developing cancer genomics projects. A total of 272 articles in the field of cancer genetics/genomics were published by authors affiliated with Latin American institutions. Educational programs in genomics are scarce and almost exclusively graduate programs, and only a few concern cancer. Only 14 countries have national cancer control plans, but all of them consider secondary prevention strategies for early diagnosis, opportune treatment, and decreasing mortality, where genomic analyses could be implemented. Despite recent advances in introducing knowledge about cancer genomics and its application to LA, the region lacks development of integrated genomic research projects, improved use of NGS platforms, implementation of associated educational programs, and health policies that could have an impact on cancer care. © 2017 S. Karger AG, Basel.

  11. Genome-wide association study identified three major QTL for carcass weight including the PLAG1-CHCHD7 QTN for stature in Japanese Black cattle

    PubMed Central

    2012-01-01

    Background Significant quantitative trait loci (QTL) for carcass weight were previously mapped on several chromosomes in Japanese Black half-sib families. Two QTL, CW-1 and CW-2, were narrowed down to 1.1-Mb and 591-kb regions, respectively. Recent advances in genomic tools allowed us to perform a genome-wide association study (GWAS) in cattle to detect associations in a general population and estimate their effect size. Here, we performed a GWAS for carcass weight using 1156 Japanese Black steers. Results Bonferroni-corrected genome-wide significant associations were detected in three chromosomal regions on bovine chromosomes (BTA) 6, 8, and 14. The associated single nucleotide polymorphisms (SNP) on BTA 6 were in linkage disequilibrium with the SNP encoding NCAPG Ile442Met, which was previously identified as a candidate quantitative trait nucleotide for CW-2. In contrast, the most highly associated SNP on BTA 14 was located 2.3-Mb centromeric from the previously identified CW-1 region. Linkage disequilibrium mapping led to a revision of the CW-1 region within a 0.9-Mb interval around the associated SNP, and targeted resequencing followed by association analysis highlighted the quantitative trait nucleotides for bovine stature in the PLAG1-CHCHD7 intergenic region. The association on BTA 8 was accounted for by two SNP on the BovineSNP50 BeadChip and corresponded to CW-3, which was simultaneously detected by linkage analyses using half-sib families. The allele substitution effects of CW-1, CW-2, and CW-3 were 28.4, 35.3, and 35.0 kg per allele, respectively. Conclusion The GWAS revealed the genetic architecture underlying carcass weight variation in Japanese Black cattle in which three major QTL accounted for approximately one-third of the genetic variance. PMID:22607022
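
    The Bonferroni correction mentioned in this abstract divides the family-wise error rate by the number of tests, so a SNP is declared genome-wide significant only if its P value falls below alpha / m. The sketch below shows that arithmetic; the abstract does not state how many SNPs were tested, so the SNP names and P values here are hypothetical placeholders, and the function names are invented for the example.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_snps(pvalues, alpha=0.05):
    """Return SNP ids whose P value is below the Bonferroni threshold, most significant first."""
    threshold = bonferroni_threshold(alpha, len(pvalues))
    hits = [snp for snp, p in pvalues.items() if p < threshold]
    return sorted(hits, key=lambda snp: pvalues[snp])


if __name__ == "__main__":
    # Hypothetical P values for five SNPs; threshold is 0.05 / 5 = 0.01.
    example = {"snp_a": 0.2, "snp_b": 0.004, "snp_c": 0.03, "snp_d": 1e-9, "snp_e": 0.5}
    print(significant_snps(example))
```

    With tens of thousands of markers on a genotyping array, the same formula gives thresholds on the order of 1e-6, which is why only a few regions reach genome-wide significance.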

  12. Sparse whole genome sequencing identifies two loci for major depressive disorder

    PubMed Central

    2015-01-01

    Major depressive disorder (MDD), one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide, poses a major challenge to genetic analysis. To date no robustly replicated genetic loci have been identified, despite analysis of more than 9,000 cases. Using low coverage genome sequence of 5,303 Chinese women with recurrent MDD selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD, we identified and replicated two genome-wide significant loci contributing to risk of MDD on chromosome 10: one near the SIRT1 gene (P-value = 2.53×10⁻¹⁰), the other in an intron of the LHPP gene (P = 6.45×10⁻¹²). Analysis of 4,509 cases with a severe subtype of MDD, melancholia, yielded an increased genetic signal at the SIRT1 locus. We attribute our success to the recruitment of relatively homogeneous cases with severe illness. PMID:26176920

  13. A genome-wide association study reveals novel genomic regions and positional candidate genes for fat deposition in broiler chickens.

    PubMed

    Moreira, Gabriel Costa Monteiro; Boschiero, Clarissa; Cesar, Aline Silva Mello; Reecy, James M; Godoy, Thaís Fernanda; Trevisoli, Priscila Anchieta; Cantão, Maurício E; Ledur, Mônica Corrêa; Ibelli, Adriana Mércia Guaratini; Peixoto, Jane de Oliveira; Moura, Ana Silvia Alves Meira Tavares; Garrick, Dorian; Coutinho, Luiz Lehmann

    2018-05-21

    Excess fat content in chickens has a negative impact on poultry production. The discovery of QTL associated with fat deposition in the carcass allows the identification of positional candidate genes (PCGs) that might regulate fat deposition and be useful for selection against excess fat content in chicken carcasses. This study aimed to estimate genomic heritability coefficients and to identify QTLs and PCGs for abdominal fat (ABF) and skin (SKIN) traits in a broiler chicken population originating from the White Plymouth Rock and White Cornish breeds. ABF and SKIN are moderately heritable traits in our broiler population with estimates ranging from 0.23 to 0.33. Using a high density SNP panel (355,027 informative SNPs), we detected nine unique QTLs that were associated with these fat traits. Among these, four QTL were novel, while five have been previously reported in the literature. Thirteen PCGs were identified that might regulate fat deposition in these QTL regions: JDP2, PLCG1, HNF4A, FITM2, ADIPOR1, PTPN11, MVK, APOA1, APOA4, APOA5, ENSGALG00000000477, ENSGALG00000000483, and ENSGALG00000005043. We used sequence information from founder animals to detect 4843 SNPs in the 13 PCGs. Among those, two were classified as potentially deleterious and two as high impact SNPs. This study generated novel results that can contribute to a better understanding of fat deposition in chickens. The use of a high density SNP array increases genome coverage and improves QTL resolution beyond what would have been achieved with a low density array. The identified PCGs were involved in many biological processes that regulate lipid storage. The SNPs identified in the PCGs, especially those predicted as potentially deleterious and high impact, may affect fat deposition. Validation should be undertaken before using these SNPs for selection against carcass fat accumulation and to improve feed efficiency in broiler chicken production.

  14. Genomes2Drugs: Identifies Target Proteins and Lead Drugs from Proteome Data

    PubMed Central

    Toomey, David; Hoppe, Heinrich C.; Brennan, Marian P.; Nolan, Kevin B.; Chubb, Anthony J.

    2009-01-01

    Background Genome sequencing and bioinformatics have provided the full hypothetical proteome of many pathogenic organisms. Advances in microarray and mass spectrometry have also yielded large output datasets of possible target proteins/genes. However, the challenge remains to identify new targets for drug discovery from this wealth of information. Further analysis includes bioinformatics and/or molecular biology tools to validate the findings. This is time consuming and expensive, and could fail to yield novel drugs if protein purification and crystallography are impossible. To pre-empt this, a researcher may want to rapidly filter the output datasets for proteins that show good homology to proteins that have already been structurally characterised or proteins that are already targets for known drugs. Critically, those researchers developing novel antibiotics need to exclude proteins that show close homology to any human proteins, as future inhibitors are likely to cross-react with the host protein, causing off-target toxicity effects later in clinical trials. Methodology/Principal Findings To solve many of these issues, we have developed a free online resource called Genomes2Drugs which ranks sequences to identify proteins that are (i) homologous to previously crystallized proteins or (ii) targets of known drugs, but are (iii) not homologous to human proteins. When tested using the Plasmodium falciparum malarial genome the program correctly enriched the ranked list of proteins with known drug target proteins. Conclusions/Significance Genomes2Drugs rapidly identifies proteins that are likely to succeed in drug discovery pipelines. This free online resource helps in the identification of potential drug targets. Importantly, the program further highlights proteins that are likely to be inhibited by FDA-approved drugs. These drugs can then be rapidly moved into Phase IV clinical studies under ‘change-of-application’ patents. PMID:19593435
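
    The three ranking criteria listed in this abstract can be pictured as a filter followed by a sort. The sketch below is not the Genomes2Drugs implementation, whose scoring is not described here; it assumes each pathogen protein already has a best-hit percent identity against three hypothetical databases (crystallized proteins, known drug targets, human proteins), and the thresholds, field names, and sort order are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hits:
    """Best-hit percent identities for one pathogen protein (hypothetical inputs)."""
    protein_id: str
    structure_identity: float    # vs. previously crystallized proteins
    drug_target_identity: float  # vs. targets of known drugs
    human_identity: float        # vs. the human proteome


def rank_candidates(hits, min_hit=40.0, max_human=30.0):
    """Keep proteins matching criterion (i) or (ii) but not (iii), then rank them.

    min_hit and max_human are assumed cut-offs, not values from the paper.
    """
    kept = [
        h for h in hits
        if max(h.structure_identity, h.drug_target_identity) >= min_hit
        and h.human_identity < max_human
    ]
    # Prefer strong drug-target matches, then structural matches, then low human similarity.
    return sorted(
        kept,
        key=lambda h: (-h.drug_target_identity, -h.structure_identity, h.human_identity),
    )


if __name__ == "__main__":
    proteome = [
        Hits("P1", 80.0, 0.0, 10.0),
        Hits("P2", 55.0, 70.0, 12.0),
        Hits("P3", 90.0, 90.0, 65.0),  # too similar to a human protein
        Hits("P4", 20.0, 10.0, 5.0),   # no useful structural or drug-target match
    ]
    print([h.protein_id for h in rank_candidates(proteome)])
```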

  15. Whole-genome sequencing of a quarter-century melioidosis outbreak in temperate Australia uncovers a region of low-prevalence endemicity

    PubMed Central

    Chapple, Stephanie N. J.; Sarovich, Derek S.; Holden, Matthew T. G.; Peacock, Sharon J.; Buller, Nicky; Golledge, Clayton; Mayo, Mark; Currie, Bart J.

    2016-01-01

    Melioidosis, caused by the highly recombinogenic bacterium Burkholderia pseudomallei, is a disease with high mortality. Tracing the origin of melioidosis outbreaks and understanding how the bacterium spreads and persists in the environment are essential to protecting public and veterinary health and reducing mortality associated with outbreaks. We used whole-genome sequencing to compare isolates from a historical quarter-century outbreak that occurred between 1966 and 1991 in the Avon Valley, Western Australia, a region far outside the known range of B. pseudomallei endemicity. All Avon Valley outbreak isolates shared the same multilocus sequence type (ST-284), which has not been identified outside this region. We found substantial genetic diversity among isolates based on a comparison of genome-wide variants, with no clear correlation between genotypes and temporal, geographical or source data. We observed little evidence of recombination in the outbreak strains, indicating that genetic diversity among these isolates has primarily accrued by mutation. Phylogenomic analysis demonstrated that the isolates confidently grouped within the Australian B. pseudomallei clade, thereby ruling out introduction from a melioidosis-endemic region outside Australia. Collectively, our results point to B. pseudomallei ST-284 being present in the Avon Valley for longer than previously recognized, with its persistence and genomic diversity suggesting long-term, low-prevalence endemicity in this temperate region. Our findings provide a concerning demonstration of the potential for environmental persistence of B. pseudomallei far outside the conventional endemic regions. An expected increase in extreme weather events may reactivate latent B. pseudomallei populations in this region. PMID:28348862

  16. Genome-wide DNA methylation profile identified a unique set of differentially methylated immune genes in oral squamous cell carcinoma patients in India.

    PubMed

    Basu, Baidehi; Chakraborty, Joyeeta; Chandra, Aditi; Katarkar, Atul; Baldevbhai, Jadav Ritesh Kumar; Dhar Chowdhury, Debjit; Ray, Jay Gopal; Chaudhuri, Keya; Chatterjee, Raghunath

    2017-01-01

    Oral squamous cell carcinoma (OSCC) is one of the common malignancies in Southeast Asia. Epigenetic changes, mainly the altered DNA methylation, have been implicated in many cancers. Considering the varied environmental and genotoxic exposures among the Indian population, we conducted a genome-wide DNA methylation study on paired tumor and adjacent normal tissues of ten well-differentiated OSCC patients and validated in an additional 53 well-differentiated OSCC and adjacent normal samples. Genome-wide DNA methylation analysis identified several novel differentially methylated regions associated with OSCC. Hypermethylation is primarily enriched in the CpG-rich regions, while hypomethylation is mainly in the open sea. Distinct epigenetic drifts for hypo- and hypermethylation across CpG islands suggested independent mechanisms of hypo- and hypermethylation in OSCC development. Aberrant DNA methylation in the promoter regions is concomitant with gene expression. Hypomethylation of immune genes reflects the lymphocyte infiltration into the tumor microenvironment. Comparison of methylome data with 312 TCGA HNSCC samples identified a unique set of hypomethylated promoters among the OSCC patients in India. Pathway analysis of unique hypomethylated promoters indicated that the OSCC patients in India induce an anti-tumor T cell response, with mobilization of T lymphocytes in the neoplastic environment. Survival analysis of these epigenetically regulated immune genes suggested their prominent role in OSCC progression. Our study identified a unique set of hypomethylated regions, enriched in the promoters of immune response genes, and indicated the presence of a strong immune component in the tumor microenvironment. These methylation changes may serve as potential molecular markers to define risk and to monitor the prognosis of OSCC patients in India.

  17. Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses.

    PubMed

    Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A; Janke, Axel

    2015-05-27

    The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
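
    The read-depth approach described in this abstract rests on a simple expectation: a Y-linked scaffold should attract almost no reads from a female sample while still being covered in a male sample. The sketch below illustrates that idea under stated assumptions and is not the authors' pipeline. It assumes mean per-scaffold depths are already available for one male and one female, normalises each sample by its median scaffold depth, and flags scaffolds with a low female-to-male ratio; the thresholds and scaffold names are hypothetical.

```python
from statistics import median


def y_linked_candidates(male_depth, female_depth, max_ratio=0.3, min_male=0.1):
    """Return scaffold ids whose normalised female/male depth ratio suggests Y linkage.

    male_depth, female_depth: dicts mapping scaffold id -> mean read depth.
    max_ratio and min_male are assumed cut-offs chosen for illustration only.
    """
    shared = sorted(set(male_depth) & set(female_depth))
    male_median = median(male_depth[s] for s in shared)
    female_median = median(female_depth[s] for s in shared)
    candidates = []
    for scaffold in shared:
        male_norm = male_depth[scaffold] / male_median
        female_norm = female_depth[scaffold] / female_median
        if male_norm < min_male:
            continue  # too little male coverage to judge
        if female_norm / male_norm <= max_ratio:
            candidates.append(scaffold)
    return candidates


if __name__ == "__main__":
    # Hypothetical depths: three autosomal-like scaffolds, one Y-like, one X-like.
    male = {"scaf_a": 30.0, "scaf_b": 31.0, "scaf_c": 29.0, "scaf_y": 15.0, "scaf_x": 15.0}
    female = {"scaf_a": 30.0, "scaf_b": 29.0, "scaf_c": 31.0, "scaf_y": 0.5, "scaf_x": 30.0}
    print(y_linked_candidates(male, female))
```

    An X-linked scaffold shows the opposite pattern (roughly twice the normalised depth in the female), so the same ratio separates the two classes; as the abstract notes, candidates still benefit from male-specific in vitro verification.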

  18. Invited review: Inbreeding in the genomics era: Inbreeding, inbreeding depression, and management of genomic variability.

    PubMed

    Howard, Jeremy T; Pryce, Jennie E; Baes, Christine; Maltecca, Christian

    2017-08-01

    Traditionally, pedigree-based relationship coefficients have been used to manage the inbreeding and degree of inbreeding depression that exists within a population. The widespread incorporation of genomic information in dairy cattle genetic evaluations allows for the opportunity to develop and implement methods to manage populations at the genomic level. As a result, the realized proportion of the genome that 2 individuals share can be more accurately estimated instead of using pedigree information to estimate the expected proportion of shared alleles. Furthermore, genomic information allows genome-wide relationship or inbreeding estimates to be augmented to characterize relationships for specific regions of the genome. Region-specific stretches can be used to more effectively manage areas of low genetic diversity or areas that, when homozygous, result in reduced performance across economically important traits. The use of region-specific metrics should allow breeders to more precisely manage the trade-off between the genetic value of the progeny and undesirable side effects associated with inbreeding. Methods tailored toward more effectively identifying regions affected by inbreeding and their associated use to manage the genome at the herd level, however, still need to be developed. We have reviewed topics related to inbreeding, measures of relatedness, genetic diversity and methods to manage populations at the genomic level, and we discuss future challenges related to managing populations through implementing genomic methods at the herd and population levels. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  19. Genome-Wide Association Study Identifies Candidate Genes That Affect Plant Height in Chinese Elite Maize (Zea mays L.) Inbred Lines

    PubMed Central

    Wang, Jianjun; Liu, Changlin; Li, Mingshun; Zhang, Degui; Bai, Li; Zhang, Shihuang; Li, Xinhai

    2011-01-01

    Background: The harvest index for many crops can be improved through introduction of dwarf stature to increase lodging resistance, combined with early maturity. The inbred line Shen5003 has been widely used in maize breeding in China as a key donor line for the dwarf trait. Also, one major quantitative trait locus (QTL) controlling plant height has been identified in bin 5.05–5.06, across several maize bi-parental populations. With the progress of publicly available maize genome sequence, the objective of this work was to identify the candidate genes that affect plant height among Chinese maize inbred lines with genome wide association studies (GWAS). Methods and Findings: A total of 284 maize inbred lines were genotyped using over 55,000 evenly spaced SNPs, from which a set of 41,101 SNPs were filtered with stringent quality control for further data analysis. With the population structure controlled in a mixed linear model (MLM) implemented with the software TASSEL, we carried out a genome-wide association study (GWAS) for plant height. A total of 204 SNPs (P≤0.0001) and 105 genomic loci harboring coding regions were identified. Four loci containing genes associated with gibberellin (GA), auxin, and epigenetic pathways may be involved in natural variation that led to a dwarf phenotype in elite maize inbred lines. Among them, a favorable allele for dwarfing on chromosome 5 (SNP PZE-105115518) was also identified in six Shen5003 derivatives. Conclusions: The fact that a large number of previously identified dwarf genes are missing from our study highlights the discovery of the consistently significant association of the gene harboring the SNP PZE-105115518 with plant height (P = 8.91e-10) and its confirmation in the Shen5003 introgression lines. Results from this study suggest that, in the maize breeding schema in China, specific alleles were selected, that have played important roles in maize production. PMID:22216221

  20. Genome-wide association study for longevity with whole-genome sequencing in 3 cattle breeds.

    PubMed

    Zhang, Qianqian; Guldbrandtsen, Bernt; Thomasen, Jørn Rind; Lund, Mogens Sandø; Sahana, Goutam

    2016-09-01

    Longevity is an important economic trait in dairy production. Improvements in longevity could increase the average number of lactations per cow, thereby affecting the profitability of the dairy cattle industry. Improved longevity for cows reduces the replacement cost of stock and enables animals to achieve the highest production period. Moreover, longevity is an indirect indicator of animal welfare. Using whole-genome sequencing variants in 3 dairy cattle breeds, we carried out an association study and identified 7 genomic regions in Holstein and 5 regions in Red Dairy Cattle that were associated with longevity. Meta-analyses of 3 breeds revealed 2 significant genomic regions, located on chromosomes 6 (META-CHR6-88MB) and 18 (META-CHR18-58MB). META-CHR6-88MB overlaps with 2 known genes: neuropeptide G-protein coupled receptor (NPFFR2; 89,052,210-89,059,348 bp) and vitamin D-binding protein precursor (GC; 88,695,940-88,739,180 bp). The NPFFR2 gene was previously identified as a candidate gene for mastitis resistance. META-CHR18-58MB overlaps with zinc finger protein 717 (ZNF717; 58,130,465-58,141,877 bp) and zinc finger protein 613 (ZNF613; 58,115,782-58,117,110 bp), which have been associated with calving difficulties. Information on longevity-associated genomic regions could be used to find causal genes/variants influencing longevity and exploited to improve the reliability of genomic prediction. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
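
    The statement that an associated region "overlaps" known genes comes down to an interval-overlap test on one chromosome. The sketch below uses the gene coordinates quoted in the abstract. The abstract does not give the region boundaries, so the query interval on chromosome 6 is hypothetical and chosen only for illustration, as are the function names.

```python
# Gene coordinates (chromosome, start bp, end bp) as quoted in the abstract.
GENES = {
    "NPFFR2": (6, 89_052_210, 89_059_348),
    "GC": (6, 88_695_940, 88_739_180),
    "ZNF717": (18, 58_130_465, 58_141_877),
    "ZNF613": (18, 58_115_782, 58_117_110),
}


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def genes_in_region(chrom, start, end, genes=GENES):
    """List genes on `chrom` whose interval overlaps [start, end]."""
    return sorted(
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and overlaps(start, end, g_start, g_end)
    )


# Hypothetical region boundaries; the abstract names the region only as
# META-CHR6-88MB and does not report its start and end positions.
hits = genes_in_region(6, 88_600_000, 89_100_000)
print(hits)
```

    For genome-scale annotation sets a sorted index or an interval tree would be a more typical choice than this linear scan.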

  1. Combining functional genomics and chemical biology to identify targets of bioactive compounds.

    PubMed

    Ho, Cheuk Hei; Piotrowski, Jeff; Dixon, Scott J; Baryshnikova, Anastasia; Costanzo, Michael; Boone, Charles

    2011-02-01

    Genome sequencing projects have revealed thousands of suspected genes, challenging researchers to develop efficient large-scale functional analysis methodologies. Determining the function of a gene product generally requires a means to alter its function. Genetically tractable model organisms have been widely exploited for the isolation and characterization of activating and inactivating mutations in genes encoding proteins of interest. Chemical genetics represents a complementary approach involving the use of small molecules capable of either inactivating or activating their targets. Saccharomyces cerevisiae has been an important test bed for the development and application of chemical genomic assays aimed at identifying targets and modes of action of known and uncharacterized compounds. Here we review yeast chemical genomic assay strategies for drug target identification. Copyright © 2010 Elsevier Ltd. All rights reserved.

  2. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    PubMed Central

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  3. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    PubMed

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  4. Full-genome sequences of hepatitis B virus subgenotype D3 isolates from the Brazilian Amazon Region.

    PubMed

    Spitz, Natália; Mello, Francisco C A; Araujo, Natalia Motta

    2015-02-01

    The Brazilian Amazon Region is a highly endemic area for hepatitis B virus (HBV). However, little is known regarding the genetic variability of the strains circulating in this geographical region. Here, we describe the first full-length genomes of HBV isolated in the Brazilian Amazon Region; these genomes are also the first complete HBV subgenotype D3 genomes reported for Brazil. The genomes of the five Brazilian isolates were all 3,182 base pairs in length and the isolates were classified as belonging to subgenotype D3, subtypes ayw2 (n = 3) and ayw3 (n = 2). Phylogenetic analysis suggested that the Brazilian sequences are not likely to be closely related to European D3 sequences. Such results will contribute to further epidemiological and evolutionary studies of HBV.

  5. Non-coding genomic regions possessing enhancer and silencer potential are associated with healthy aging and exceptional survival.

    PubMed

    Kim, Sangkyu; Welsh, David A; Myers, Leann; Cherry, Katie E; Wyckoff, Jennifer; Jazwinski, S Michal

    2015-02-28

    We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13-14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity.

  6. Non-coding genomic regions possessing enhancer and silencer potential are associated with healthy aging and exceptional survival

    PubMed Central

    Kim, Sangkyu; Welsh, David A.; Myers, Leann; Cherry, Katie E.; Wyckoff, Jennifer; Jazwinski, S. Michal

    2015-01-01

    We have completed a genome-wide linkage scan for healthy aging using data collected from a family study, followed by fine-mapping by association in a separate population, the first such attempt reported. The family cohort consisted of parents of age 90 or above and their children ranging in age from 50 to 80. As a quantitative measure of healthy aging, we used a frailty index, called FI34, based on 34 health and function variables. The linkage scan found a single significant linkage peak on chromosome 12. Using an independent cohort of unrelated nonagenarians, we carried out a fine-scale association mapping of the region suggestive of linkage and identified three sites associated with healthy aging. These healthy-aging sites (HASs) are located in intergenic regions at 12q13–14. HAS-1 has been previously associated with multiple diseases, and an enhancer was recently mapped and experimentally validated within the site. HAS-2 is a previously uncharacterized site possessing genomic features suggestive of enhancer activity. HAS-3 contains features associated with Polycomb repression. The HASs also contain variants associated with exceptional longevity, based on a separate analysis. Our results provide insight into functional genomic networks involving non-coding regulatory elements that are involved in healthy aging and longevity. PMID:25682868

  7. Phylogenetic Conflict in Bears Identified by Automated Discovery of Transposable Element Insertions in Low-Coverage Genomes

    PubMed Central

    Gallus, Susanne; Janke, Axel

    2017-01-01

    Phylogenetic reconstruction from transposable elements (TEs) offers an additional perspective to study evolutionary processes. However, detecting phylogenetically informative TE insertions requires tedious experimental work, limiting the power of phylogenetic inference. Here, we analyzed the genomes of seven bear species using high-throughput sequencing data to detect thousands of TE insertions. The newly developed pipeline for TE detection called TeddyPi (TE detection and discovery for Phylogenetic Inference) identified 150,513 high-quality TE insertions in the genomes of ursine and tremarctine bears. By integrating different TE insertion callers and using a stringent filtering approach, the TeddyPi pipeline produced highly reliable TE insertion calls, which were confirmed by extensive in vitro validation experiments. Analysis of single nucleotide substitutions in the flanking regions of the TEs shows that these substitutions correlate with the phylogenetic signal from the TE insertions. Our phylogenomic analyses show that TEs are a major driver of genomic variation in bears and enabled phylogenetic reconstruction of a well-resolved species tree, despite strong signals for incomplete lineage sorting and introgression. The analyses show that the Asiatic black, sun, and sloth bear form a monophyletic clade, in which phylogenetic incongruence originates from incomplete lineage sorting. TeddyPi is open source and can be adapted to various TE and structural variation callers. The pipeline makes it possible to confidently extract thousands of TE insertions even from low-coverage genomes (∼10×) of nonmodel organisms. This opens new possibilities for biologists to study phylogenies and evolutionary processes as well as rates and patterns of (retro-)transposition and structural variation. PMID:28985298

  8. Analysis of Genomic Regions of Trichoderma harzianum IOC-3844 Related to Biomass Degradation

    PubMed Central

    Crucello, Aline; Sforça, Danilo Augusto; Horta, Maria Augusta Crivelente; dos Santos, Clelton Aparecido; Viana, Américo José Carvalho; Beloti, Lilian Luzia; de Toledo, Marcelo Augusto Szymanski; Vincentz, Michel; Kuroshu, Reginaldo Massanobu; de Souza, Anete Pereira

    2015-01-01

    Trichoderma harzianum IOC-3844 secretes high levels of cellulolytic-active enzymes and is therefore a promising strain for use in biotechnological applications in second-generation bioethanol production. However, the T. harzianum biomass degradation mechanism has not been well explored at the genetic level. The present work investigates six genomic regions (~150 kbp each) in this fungus that are enriched with genes related to biomass conversion. A BAC library consisting of 5,760 clones was constructed, with an average insert length of 90 kbp. The assembled BAC sequences revealed 232 predicted genes, 31.5% of which were related to catabolic pathways, including those involved in biomass degradation. An expression profile analysis based on RNA-Seq data demonstrated that putative regulatory elements, such as membrane transport proteins and transcription factors, are located in the same genomic regions as genes related to carbohydrate metabolism and exhibit similar expression profiles. Thus, we demonstrate a rapid and efficient tool that focuses on specific genomic regions by combining a BAC library with transcriptomic data. This is the first BAC-based structural genomic study of the cellulolytic fungus T. harzianum, and its findings provide new perspectives regarding the use of this species in biomass degradation processes. PMID:25836973

  9. Recurrent DNA inversion rearrangements in the human genome

    PubMed Central

    Flores, Margarita; Morales, Lucía; Gonzaga-Jauregui, Claudia; Domínguez-Vidaña, Rocío; Zepeda, Cinthya; Yañez, Omar; Gutiérrez, María; Lemus, Tzitziki; Valle, David; Avila, Ma. Carmen; Blanco, Daniel; Medina-Ruiz, Sofía; Meza, Karla; Ayala, Erandi; García, Delfino; Bustos, Patricia; González, Víctor; Girard, Lourdes; Tusie-Luna, Teresa; Dávila, Guillermo; Palacios, Rafael

    2007-01-01

    Several lines of evidence suggest that reiterated sequences in the human genome are targets for nonallelic homologous recombination (NAHR), which facilitates genomic rearrangements. We have used a PCR-based approach to identify breakpoint regions of rearranged structures in the human genome. In particular, we have identified intrachromosomal identical repeats that are located in reverse orientation, which may lead to chromosomal inversions. A bioinformatic workflow pathway to select appropriate regions for analysis was developed. Three such regions overlapping with known human genes, located on chromosomes 3, 15, and 19, were analyzed. The relative proportion of wild-type to rearranged structures was determined in DNA samples from blood obtained from different, unrelated individuals. The results obtained indicate that recurrent genomic rearrangements occur at relatively high frequency in somatic cells. Interestingly, the rearrangements studied were significantly more abundant in adults than in newborn individuals, suggesting that such DNA rearrangements might start to appear during embryogenesis or fetal life and continue to accumulate after birth. The relevance of our results in regard to human genomic variation is discussed. PMID:17389356

  10. Comparative Physical Mapping of the Apospory-Specific Genomic Region in Two Apomictic Grasses: Pennisetum squamulatum and Cenchrus ciliaris

    PubMed Central

    Goel, Shailendra; Chen, Zhenbang; Akiyama, Yukio; Conner, Joann A.; Basu, Manojit; Gualtieri, Gustavo; Hanna, Wayne W.; Ozias-Akins, Peggy

    2006-01-01

    In gametophytic apomicts of the aposporous type, each cell of the embryo sac is genetically identical to somatic cells of the ovule because they are products of mitosis, not of meiosis. The egg of the aposporous embryo sac follows parthenogenetic development into an embryo; therefore, uniform progeny result even from heterozygous plants, a trait that would be valuable for many crop species. Attempts to introgress apomixis from wild relatives into major crops through traditional breeding have been hindered by low or no recombination within the chromosomal region governing this trait (the apospory-specific genomic region or ASGR). The lack of recombination also has been a major obstacle to positional cloning of key genes. To further delineate and characterize the nonrecombinant ASGR, we have identified eight new ASGR-linked, AFLP-based molecular markers, only one of which showed recombination with the trait for aposporous embryo sac development. Bacterial artificial chromosome (BAC) clones identified with the ASGR-linked AFLPs or previously mapped markers, when mapped by fluorescence in situ hybridization in Pennisetum squamulatum and Cenchrus ciliaris, showed almost complete macrosynteny between the two apomictic grasses throughout the ASGR, although with an inverted order. A BAC identified with the recombinant AFLP marker mapped most proximal to the centromere of the ASGR-carrier chromosome in P. squamulatum but was not located on the ASGR-carrier chromosome in C. ciliaris. Exceptional regions where synteny was disrupted probably are nonessential for expression of the aposporous trait. The ASGR appears to be maintained as a haplotype even though its position in the genome can be variable. PMID:16547108

  11. Genomic prediction and genome-wide association analysis of female longevity in a composite beef cattle breed

    USDA-ARS's Scientific Manuscript database

    Longevity is a highly important trait to the efficiency of beef cattle production. The objective of this study was to evaluate the genomic prediction of longevity and identify genomic regions associated with this trait. The data used in this study consisted of 547 Composite Gene Combination (CGC) c...

  12. Genomic characterization of biliary tract cancers identifies driver genes and predisposing mutations.

    PubMed

    Wardell, Christopher P; Fujita, Masashi; Yamada, Toru; Simbolo, Michele; Fassan, Matteo; Karlic, Rosa; Polak, Paz; Kim, Jaegil; Hatanaka, Yutaka; Maejima, Kazuhiro; Lawlor, Rita T; Nakanishi, Yoshitsugu; Mitsuhashi, Tomoko; Fujimoto, Akihiro; Furuta, Mayuko; Ruzzenente, Andrea; Conci, Simone; Oosawa, Ayako; Sasaki-Oku, Aya; Nakano, Kaoru; Tanaka, Hiroko; Yamamoto, Yujiro; Michiaki, Kubo; Kawakami, Yoshiiku; Aikata, Hiroshi; Ueno, Masaki; Hayami, Shinya; Gotoh, Kunihito; Ariizumi, Shun-Ichi; Yamamoto, Masakazu; Yamaue, Hiroki; Chayama, Kazuaki; Miyano, Satoru; Getz, Gad; Scarpa, Aldo; Hirano, Satoshi; Nakamura, Toru; Nakagawa, Hidewaki

    2018-05-01

    Biliary tract cancers (BTCs) are clinically and pathologically heterogeneous and respond poorly to treatment. Genomic profiling can offer a clearer understanding of their carcinogenesis, classification and treatment strategy. We performed large-scale genome sequencing analyses on BTCs to investigate their somatic and germline driver events and characterize their genomic landscape. We analyzed 412 BTC samples from Japanese and Italian populations, 107 by whole-exome sequencing (WES), 39 by whole-genome sequencing (WGS), and a further 266 samples by targeted sequencing. The subtypes were 136 intrahepatic cholangiocarcinomas (ICCs), 101 distal cholangiocarcinomas (DCCs), 109 peri-hilar type cholangiocarcinomas (PHCs), and 66 gallbladder or cystic duct cancers (GBCs/CDCs). We identified somatic alterations and searched for driver genes in BTCs, finding pathogenic germline variants of cancer-predisposing genes. We predicted cell-of-origin for BTCs by combining somatic mutation patterns and epigenetic features. We identified 32 significantly and commonly mutated genes including TP53, KRAS, SMAD4, NF1, ARID1A, PBRM1, and ATR, some of which negatively affected patient prognosis. A novel deletion of MUC17 at 7q22.1 affected patient prognosis. Cell-of-origin predictions using WGS and epigenetic features suggest hepatocyte-origin of hepatitis-related ICCs. Deleterious germline mutations of cancer-predisposing genes such as BRCA1, BRCA2, RAD51D, MLH1, or MSH2 were detected in 11% (16/146) of BTC patients. BTCs have distinct genetic features including somatic events and germline predisposition. These findings could be useful to establish treatment and diagnostic strategies for BTCs based on genetic information. We here analyzed genomic features of 412 BTC samples from Japanese and Italian populations. A total of 32 significantly and commonly mutated genes were identified, some of which negatively affected patient prognosis, including a novel deletion of MUC17 at 7q22.1. Cell

  13. Genome-wide association analysis identifies six new loci associated with forced vital capacity.

    PubMed

    Loth, Daan W; Soler Artigas, María; Gharib, Sina A; Wain, Louise V; Franceschini, Nora; Koch, Beate; Pottinger, Tess D; Smith, Albert Vernon; Duan, Qing; Oldmeadow, Chris; Lee, Mi Kyeong; Strachan, David P; James, Alan L; Huffman, Jennifer E; Vitart, Veronique; Ramasamy, Adaikalavan; Wareham, Nicholas J; Kaprio, Jaakko; Wang, Xin-Qun; Trochet, Holly; Kähönen, Mika; Flexeder, Claudia; Albrecht, Eva; Lopez, Lorna M; de Jong, Kim; Thyagarajan, Bharat; Alves, Alexessander Couto; Enroth, Stefan; Omenaas, Ernst; Joshi, Peter K; Fall, Tove; Viñuela, Ana; Launer, Lenore J; Loehr, Laura R; Fornage, Myriam; Li, Guo; Wilk, Jemma B; Tang, Wenbo; Manichaikul, Ani; Lahousse, Lies; Harris, Tamara B; North, Kari E; Rudnicka, Alicja R; Hui, Jennie; Gu, Xiangjun; Lumley, Thomas; Wright, Alan F; Hastie, Nicholas D; Campbell, Susan; Kumar, Rajesh; Pin, Isabelle; Scott, Robert A; Pietiläinen, Kirsi H; Surakka, Ida; Liu, Yongmei; Holliday, Elizabeth G; Schulz, Holger; Heinrich, Joachim; Davies, Gail; Vonk, Judith M; Wojczynski, Mary; Pouta, Anneli; Johansson, Asa; Wild, Sarah H; Ingelsson, Erik; Rivadeneira, Fernando; Völzke, Henry; Hysi, Pirro G; Eiriksdottir, Gudny; Morrison, Alanna C; Rotter, Jerome I; Gao, Wei; Postma, Dirkje S; White, Wendy B; Rich, Stephen S; Hofman, Albert; Aspelund, Thor; Couper, David; Smith, Lewis J; Psaty, Bruce M; Lohman, Kurt; Burchard, Esteban G; Uitterlinden, André G; Garcia, Melissa; Joubert, Bonnie R; McArdle, Wendy L; Musk, A Bill; Hansel, Nadia; Heckbert, Susan R; Zgaga, Lina; van Meurs, Joyce B J; Navarro, Pau; Rudan, Igor; Oh, Yeon-Mok; Redline, Susan; Jarvis, Deborah L; Zhao, Jing Hua; Rantanen, Taina; O'Connor, George T; Ripatti, Samuli; Scott, Rodney J; Karrasch, Stefan; Grallert, Harald; Gaddis, Nathan C; Starr, John M; Wijmenga, Cisca; Minster, Ryan L; Lederer, David J; Pekkanen, Juha; Gyllensten, Ulf; Campbell, Harry; Morris, Andrew P; Gläser, Sven; Hammond, Christopher J; Burkart, Kristin M; Beilby, John; Kritchevsky, Stephen B; Gudnason, Vilmundur; Hancock, Dana B; Williams, O Dale; Polasek, Ozren; Zemunik, Tatijana; Kolcic, Ivana; Petrini, Marcy F; Wjst, Matthias; Kim, Woo Jin; Porteous, David J; Scotland, Generation; Smith, Blair H; Viljanen, Anne; Heliövaara, Markku; Attia, John R; Sayers, Ian; Hampel, Regina; Gieger, Christian; Deary, Ian J; Boezen, H Marike; Newman, Anne; Jarvelin, Marjo-Riitta; Wilson, James F; Lind, Lars; Stricker, Bruno H; Teumer, Alexander; Spector, Timothy D; Melén, Erik; Peters, Marjolein J; Lange, Leslie A; Barr, R Graham; Bracke, Ken R; Verhamme, Fien M; Sung, Joohon; Hiemstra, Pieter S; Cassano, Patricia A; Sood, Akshay; Hayward, Caroline; Dupuis, Josée; Hall, Ian P; Brusselle, Guy G; Tobin, Martin D; London, Stephanie J

    2014-07-01

    Forced vital capacity (FVC), a spirometric measure of pulmonary function, reflects lung volume and is used to diagnose and monitor lung diseases. We performed genome-wide association study meta-analysis of FVC in 52,253 individuals from 26 studies and followed up the top associations in 32,917 additional individuals of European ancestry. We found six new regions associated at genome-wide significance (P < 5 × 10^-8) with FVC in or near EFEMP1, BMP6, MIR129-2-HSD17B12, PRDM11, WWOX and KCNJ2. Two loci previously associated with spirometric measures (GSTCD and PTCH1) were related to FVC. Newly implicated regions were followed up in samples from African-American, Korean, Chinese and Hispanic individuals. We detected transcripts for all six newly implicated genes in human lung tissue. The new loci may inform mechanisms involved in lung development and the pathogenesis of restrictive lung disease.

  14. Combining genomic selection and gene identification for crop improvement

    USDA-ARS's Scientific Manuscript database

    The use of genetic information to predict the value of individuals in plant breeding populations began about 40 years ago. The original paradigm was to identify genomic regions with outsize influence on a trait of economic value, then to use markers in that genomic region to select individuals carry...

  15. Ulcerative colitis loci on chromosomes 1p36 and 12q15 identified by genome-wide association study

    PubMed Central

    Silverberg, Mark S.; Cho, Judy H.; Rioux, John D.; McGovern, Dermot P.B.; Wu, Jing; Annese, Vito; Achkar, Jean-Paul; Goyette, Philippe; Scott, Regan; Xu, Wei; Barmada, M. Michael; Klei, Lambertus; Daly, Mark J.; Abraham, Clara; Bayless, Theodore M.; Bossa, Fabrizio; Griffiths, Anne M.; Ippoliti, Andrew F.; Lahaie, Raymond G.; Latiano, Anna; Paré, Pierre; Proctor, Deborah D.; Regueiro, Miguel D.; Steinhart, A. Hillary; Targan, Stephan R.; Schumm, L. Philip; Kistner, Emily O.; Lee, Annette T.; Gregersen, Peter K.; Rotter, Jerome I.; Brant, Steven R.; Taylor, Kent D.; Roeder, Kathryn; Duerr, Richard H.

    2008-01-01

    Ulcerative colitis is a chronic inflammatory disease of the colon that presents as diarrhea and gastrointestinal bleeding. We performed a genome-wide association study using DNA samples from 1,052 individuals with ulcerative colitis and pre-existing data from 2,571 controls, all of European ancestry. In an analysis that controlled for gender and population structure, ulcerative colitis loci attaining genome-wide significance and subsequent replication in two independent populations were identified on chromosomes 1p36 (rs6426833, combined P = 5.1×10^−13, combined OR = 0.73) and 12q15 (rs1558744, combined P = 2.5×10^−12, combined OR = 1.35). In addition, combined genome-wide significant evidence for association was found in a region spanning BTNL2 to HLA-DQB1 on chromosome 6p21 (rs2395185, combined P = 1.0×10^−16, combined OR = 0.66) and at the IL23R locus on chromosome 1p31 (rs11209026, combined P = 1.3×10^−8, combined OR = 0.56; rs10889677, combined P = 1.3×10^−8, combined OR = 1.29). PMID:19122664

  16. Can we use genetic and genomic approaches to identify candidate animals for targeted selective treatment.

    PubMed

    Laurenson, Yan C S M; Kyriazakis, Ilias; Bishop, Stephen C

    2013-10-18

    Estimated breeding values (EBV) for faecal egg count (FEC) and genetic markers for host resistance to nematodes may be used to identify resistant animals for selective breeding programmes. Similarly, targeted selective treatment (TST) requires the ability to identify the animals that will benefit most from anthelmintic treatment. A mathematical model was used to combine the concepts and evaluate the potential of using genetic-based methods to identify animals for a TST regime. EBVs obtained by genomic prediction were predicted to be the best determinant criterion for TST in terms of the impact on average empty body weight and average FEC, whereas pedigree-based EBVs for FEC were predicted to be marginally worse than using phenotypic FEC as a determinant criterion. Whilst each method has financial implications, if the identification of host resistance is incorporated into wider genomic selection indices or selective breeding programmes, then genetic or genomic information may plausibly be included in TST regimes. Copyright © 2013 Elsevier B.V. All rights reserved.

  17. Genome-wide association study of 40,000 individuals identifies two novel loci associated with bipolar disorder

    PubMed Central

    Hou, Liping; Bergen, Sarah E.; Akula, Nirmala; Song, Jie; Hultman, Christina M.; Landén, Mikael; Adli, Mazda; Alda, Martin; Ardau, Raffaella; Arias, Bárbara; Aubry, Jean-Michel; Backlund, Lena; Badner, Judith A.; Barrett, Thomas B.; Bauer, Michael; Baune, Bernhard T.; Bellivier, Frank; Benabarre, Antonio; Bengesser, Susanne; Berrettini, Wade H.; Bhattacharjee, Abesh Kumar; Biernacka, Joanna M.; Birner, Armin; Bloss, Cinnamon S.; Brichant-Petitjean, Clara; Bui, Elise T.; Byerley, William; Cervantes, Pablo; Chillotti, Caterina; Cichon, Sven; Colom, Francesc; Coryell, William; Craig, David W.; Cruceanu, Cristiana; Czerski, Piotr M.; Davis, Tony; Dayer, Alexandre; Degenhardt, Franziska; Del Zompo, Maria; DePaulo, J. Raymond; Edenberg, Howard J.; Étain, Bruno; Falkai, Peter; Foroud, Tatiana; Forstner, Andreas J.; Frisén, Louise; Frye, Mark A.; Fullerton, Janice M.; Gard, Sébastien; Garnham, Julie S.; Gershon, Elliot S.; Goes, Fernando S.; Greenwood, Tiffany A.; Grigoroiu-Serbanescu, Maria; Hauser, Joanna; Heilbronner, Urs; Heilmann-Heimbach, Stefanie; Herms, Stefan; Hipolito, Maria; Hitturlingappa, Shashi; Hoffmann, Per; Hofmann, Andrea; Jamain, Stephane; Jiménez, Esther; Kahn, Jean-Pierre; Kassem, Layla; Kelsoe, John R.; Kittel-Schneider, Sarah; Kliwicki, Sebastian; Koller, Daniel L.; König, Barbara; Lackner, Nina; Laje, Gonzalo; Lang, Maren; Lavebratt, Catharina; Lawson, William B.; Leboyer, Marion; Leckband, Susan G.; Liu, Chunyu; Maaser, Anna; Mahon, Pamela B.; Maier, Wolfgang; Maj, Mario; Manchia, Mirko; Martinsson, Lina; McCarthy, Michael J.; McElroy, Susan L.; McInnis, Melvin G.; McKinney, Rebecca; Mitchell, Philip B.; Mitjans, Marina; Mondimore, Francis M.; Monteleone, Palmiero; Mühleisen, Thomas W.; Nievergelt, Caroline M.; Nöthen, Markus M.; Novák, Tomas; Nurnberger, John I.; Nwulia, Evaristus A.; Ösby, Urban; Pfennig, Andrea; Potash, James B.; Propping, Peter; Reif, Andreas; Reininghaus, Eva; Rice, John; Rietschel, Marcella; Rouleau, Guy A.; Rybakowski, Janusz K.; Schalling, Martin; Scheftner, William A.; Schofield, Peter R.; Schork, Nicholas J.; Schulze, Thomas G.; Schumacher, Johannes; Schweizer, Barbara W.; Severino, Giovanni; Shekhtman, Tatyana; Shilling, Paul D.; Simhandl, Christian; Slaney, Claire M.; Smith, Erin N.; Squassina, Alessio; Stamm, Thomas; Stopkova, Pavla; Streit, Fabian; Strohmaier, Jana; Szelinger, Szabolcs; Tighe, Sarah K.; Tortorella, Alfonso; Turecki, Gustavo; Vieta, Eduard; Volkert, Julia; Witt, Stephanie H.; Wright, Adam; Zandi, Peter P.; Zhang, Peng; Zollner, Sebastian; McMahon, Francis J.

    2016-01-01

    Bipolar disorder (BD) is a genetically complex mental illness characterized by severe oscillations of mood and behaviour. Genome-wide association studies (GWAS) have identified several risk loci that together account for a small portion of the heritability. To identify additional risk loci, we performed a two-stage meta-analysis of >9 million genetic variants in 9,784 bipolar disorder patients and 30,471 controls, the largest GWAS of BD to date. In this study, to increase power we used ∼2,000 lithium-treated cases with a long-term diagnosis of BD from the Consortium on Lithium Genetics, excess controls, and analytic methods optimized for markers on the X-chromosome. In addition to four known loci, results revealed genome-wide significant associations at two novel loci: an intergenic region on 9p21.3 (rs12553324, P = 5.87 × 10(-9); odds ratio (OR) = 1.12) and markers within ERBB2 (rs2517959, P = 4.53 × 10(-9); OR = 1.13). No significant X-chromosome associations were detected and X-linked markers explained very little BD heritability. The results add to a growing list of common autosomal variants involved in BD and illustrate the power of comparing well-characterized cases to an excess of controls in GWAS. PMID:27329760

  18. The database of chromosome imbalance regions and genes resided in lung cancer from Asian and Caucasian identified by array-comparative genomic hybridization

    PubMed Central

    2012-01-01

    Background: Cancer-related genes show racial differences. Therefore, identification and characterization of DNA copy number alteration regions in different racial groups helps to dissect the mechanism of tumorigenesis. Methods: Array-comparative genomic hybridization (array-CGH) was used to profile DNA copy number in 40 Asian and 20 Caucasian lung cancer patients. Three methods, including MetaCore analysis for disease and pathway correlations, concordance analysis between the array-CGH database and the expression array database, and a literature search for copy number variation genes, were used to select novel lung cancer candidate genes. Four candidate oncogenes were validated for DNA copy number and mRNA and protein expression by quantitative polymerase chain reaction (qPCR), chromogenic in situ hybridization (CISH), reverse transcriptase-qPCR (RT-qPCR), and immunohistochemistry (IHC) in more patients. Results: We identified 20 chromosomal imbalance regions harboring 459 genes for Caucasian and 17 regions containing 476 genes for Asian lung cancer patients. Seven common chromosomal imbalance regions harboring 117 genes, including gains on 3p13-14, 6p22.1, 9q21.13, 13q14.1, and 17p13.3 and losses on 3p22.2-22.3 and 13q13.3, were found in both Asian and Caucasian patients. Gene validation for four genes including ARHGAP19 (10q24.1) functioning in Rho activity control, FRAT2 (10q24.1) involved in Wnt signaling, PAFAH1B1 (17p13.3) functioning in motility control, and ZNF322A (6p22.1) involved in MAPK signaling was performed using qPCR and RT-qPCR. Mean gene dosage and mRNA expression level of the four candidate genes in tumor tissues were significantly higher than in the corresponding normal tissues (P < 0.001 to P = 0.06). In addition, CISH analysis of patients indicated that copy number amplification indeed occurred for ARHGAP19 and ZNF322A genes in lung cancer patients. IHC analysis of paraffin blocks from Asian and Caucasian patients demonstrated that the frequency of PAFAH1B1

  19. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics.

    PubMed

    Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H

    2017-12-01

    Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.
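
    The in silico ORF component mentioned in the abstract builds on six-frame translation. The block below is a minimal sketch of plain six-frame translation only, assuming the standard genetic code; it does not reproduce the authors' modification for alternative start codons, and the function names and the example sequence are illustrative rather than taken from the iPtgxDB software.

```python
from itertools import product

# Standard genetic code, enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate all three offsets on both strands, keyed '+1'..'+3', '-1'..'-3'."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence chosen for illustration only.
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

    Candidate ORFs would then be read off each frame between a start codon and the next stop ('*'); that step is left out because the abstract does not specify the start-codon rules used.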

  20. Screen and clean: a tool for identifying interactions in genome-wide association studies.

    PubMed

    Wu, Jing; Devlin, Bernie; Ringquist, Steven; Trucco, Massimo; Roeder, Kathryn

    2010-04-01

    Epistasis could be an important source of risk for disease. How interacting loci might be discovered is an open question for genome-wide association studies (GWAS). Most researchers limit their statistical analyses to testing individual pairwise interactions (i.e., marginal tests for association). A more effective means of identifying important predictors is to fit models that include many predictors simultaneously (i.e., higher-dimensional models). We explore a procedure called screen and clean (SC) for identifying liability loci, including interactions, by using the lasso procedure, which is a model selection tool for high-dimensional regression. We approach the problem by using a varying dictionary consisting of terms to include in the model. In the first step the lasso dictionary includes only main effects. The most promising single-nucleotide polymorphisms (SNPs) are identified using a screening procedure. Next the lasso dictionary is adjusted to include these main effects and the corresponding interaction terms. Again, promising terms are identified using lasso screening. Then significant terms are identified through the cleaning process. Implementation of SC for GWAS requires algorithms to explore the complex model space induced by the many SNPs genotyped and their interactions. We propose and explore a set of algorithms and find that SC successfully controls Type I error while yielding good power to identify risk loci and their interactions. When the method is applied to data obtained from the Wellcome Trust Case Control Consortium study of Type 1 Diabetes it uncovers evidence supporting interaction within the HLA class II region as well as within Chromosome 12q24.
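
    The "varying dictionary" described above can be illustrated with a short sketch. It covers only how the term dictionary might grow between the two lasso stages; the lasso fit and the cleaning step are omitted. The SNP names are hypothetical, and two choices are assumptions rather than details from the abstract: interactions are formed among all pairs of retained SNPs, and an interaction is coded as the product of allele counts.

```python
from itertools import combinations


def main_effect_dictionary(snp_ids):
    """Stage 1 dictionary: one single-SNP term per genotyped SNP."""
    return [(snp,) for snp in snp_ids]


def expanded_dictionary(screened_snps):
    """Stage 2 dictionary: retained main effects plus their pairwise interactions."""
    main_effects = [(snp,) for snp in screened_snps]
    interactions = list(combinations(screened_snps, 2))
    return main_effects + interactions


def design_row(genotypes, dictionary):
    """Build one row of the regression design matrix.

    genotypes maps SNP id -> allele count (0, 1 or 2); an interaction
    term is the product of its SNPs' allele counts (assumed coding).
    """
    row = []
    for term in dictionary:
        value = 1
        for snp in term:
            value *= genotypes[snp]
        row.append(value)
    return row


if __name__ == "__main__":
    # Hypothetical SNPs that survived the first screening pass.
    terms = expanded_dictionary(["snpA", "snpB", "snpC"])
    print(terms)
    print(design_row({"snpA": 2, "snpB": 1, "snpC": 0}, terms))
```

    Restricting interactions to screened SNPs keeps the second dictionary small: k retained SNPs yield k main effects and k(k-1)/2 pairwise terms instead of all pairs across the genome.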

  1. Draft genome sequence of bitter gourd (Momordica charantia), a vegetable and medicinal plant in tropical and subtropical regions.

    PubMed

    Urasaki, Naoya; Takagi, Hiroki; Natsume, Satoshi; Uemura, Aiko; Taniai, Naoki; Miyagi, Norimichi; Fukushima, Mai; Suzuki, Shouta; Tarora, Kazuhiko; Tamaki, Moritoshi; Sakamoto, Moriaki; Terauchi, Ryohei; Matsumura, Hideo

    2017-02-01

    Bitter gourd (Momordica charantia) is an important vegetable and medicinal plant in tropical and subtropical regions globally. In this study, the draft genome sequence of a monoecious bitter gourd inbred line, OHB3-1, was analyzed. Through Illumina sequencing and de novo assembly, scaffolds of 285.5 Mb in length were generated, corresponding to ∼84% of the estimated genome size of bitter gourd (339 Mb). In this draft genome sequence, 45,859 protein-coding gene loci were identified, and transposable elements accounted for 15.3% of the whole genome. According to synteny mapping and phylogenetic analysis of conserved genes, bitter gourd was more related to watermelon (Citrullus lanatus) than to cucumber (Cucumis sativus) or melon (C. melo). Using RAD-seq analysis, 1507 marker loci were genotyped in an F2 progeny of two bitter gourd lines, resulting in an improved linkage map, comprising 11 linkage groups. By anchoring RAD tag markers, 255 scaffolds were assigned to the linkage map. Comparative analysis of genome sequences and predicted genes determined that putative trypsin-inhibitor and ribosome-inactivating genes were distinctive in the bitter gourd genome. These genes could characterize the bitter gourd as a medicinal plant. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  2. Complete mitochondrial genome of the frillneck lizard (Chlamydosaurus kingii, Reptilia; Agamidae), another squamate with two control regions.

    PubMed

    Ujvari, Beata; Madsen, Thomas

    2008-10-01

    Using PCR, the complete mitochondrial genome was sequenced in three frillneck lizards (Chlamydosaurus kingii). The mitochondria spanned over 16,761 bp. As in other vertebrates, two rRNA genes, 22 tRNA genes and 13 protein coding genes were identified. However, similar to some other squamate reptiles, two control regions (CRI and CRII) were identified, spanning 801 and 812 bp, respectively. Our results were compared with another Australian member of the family Agamidae, the bearded dragon (Pogona vitticeps). The overall base composition of the light-strand sequence largely mirrored that observed in P. vitticeps. Furthermore, similar to P. vitticeps, we observed an insertion 801 bp long between the ND5 and ND6 genes. However, in contrast to P. vitticeps we did not observe a conserved sequence block III region. Based on a comparison among the three frillneck lizards, we also present data on the proportion of variable sites within the major mitochondrial regions.

  3. Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes

    PubMed Central

    Cannon, Steven B.; Sterck, Lieven; Rombauts, Stephane; Sato, Shusei; Cheung, Foo; Gouzy, Jérôme; Wang, Xiaohong; Mudge, Joann; Vasdewani, Jayprakash; Schiex, Thomas; Spannagl, Manuel; Monaghan, Erin; Nicholson, Christine; Humphray, Sean J.; Schoof, Heiko; Mayer, Klaus F. X.; Rogers, Jane; Quétier, Francis; Oldroyd, Giles E.; Debellé, Frédéric; Cook, Douglas R.; Retzel, Ernest F.; Roe, Bruce A.; Town, Christopher D.; Tabata, Satoshi; Van de Peer, Yves; Young, Nevin D.

    2006-01-01

    Genome sequencing of the model legumes, Medicago truncatula and Lotus japonicus, provides an opportunity for large-scale sequence-based comparison of two genomes in the same plant family. Here we report synteny comparisons between these species, including details about chromosome relationships, large-scale synteny blocks, microsynteny within blocks, and genome regions lacking clear correspondence. The Lotus and Medicago genomes share a minimum of 10 large-scale synteny blocks, each with substantial collinearity and frequently extending the length of whole chromosome arms. The proportion of genes syntenic and collinear within each synteny block is relatively homogeneous. Medicago–Lotus comparisons also indicate similar and largely homogeneous gene densities, although gene-containing regions in Mt occupy 20–30% more space than Lj counterparts, primarily because of larger numbers of Mt retrotransposons. Because the interpretation of genome comparisons is complicated by large-scale genome duplications, we describe synteny, synonymous substitutions and phylogenetic analyses to identify and date a probable whole-genome duplication event. There is no direct evidence for any recent large-scale genome duplication in either Medicago or Lotus but instead a duplication predating speciation. Phylogenetic comparisons place this duplication within the Rosid I clade, clearly after the split between legumes and Salicaceae (poplar). PMID:17003129

  4. Genome wide profiling in oral squamous cell carcinoma identifies a four genetic marker signature of prognostic significance

    PubMed Central

    Vincent-Chong, Vui King; Salahshourifar, Iman; Woo, Kar Mun; Anwar, Arif; Razali, Rozaimi; Gudimella, Ranganath; Rahman, Zainal Ariff Abdul; Ismail, Siti Mazlipah; Kallarakkal, Thomas George; Ramanathan, Anand; Wan Mustafa, Wan Mahadzir; Abraham, Mannil Thomas; Tay, Keng Kiong; Zain, Rosnah Binti

    2017-01-01

    Background: Cancers of the oral cavity are primarily oral squamous cell carcinomas (OSCCs). Many OSCCs present at late stages with an exceptionally poor prognosis. A probable limitation in the management of patients with OSCC lies in insufficient knowledge of the linkage between copy number alterations in OSCC and oral tumourigenesis, which results in an inability to deliver targeted therapy. Objectives: The current study aimed to identify copy number alterations (CNAs) in OSCC using array comparative genomic hybridization (array CGH) and to correlate the CNAs with clinico-pathologic parameters and clinical outcomes. Materials and methods: Using array CGH, genome-wide profiling was performed on 75 OSCCs. Selected genes harboured in the frequently amplified and deleted regions were validated using quantitative polymerase chain reaction (qPCR). Thereafter, pathway and network functional analyses were carried out using Ingenuity Pathway Analysis (IPA) software. Results: Multiple chromosomal regions including 3q, 5p, 7p, 8q, 9p, 10p, 11q were frequently amplified, while 3p and 8p chromosomal regions were frequently deleted. These findings were consistent with our previous study using ultra-dense array CGH. In addition, amplification of 8q, 11q, 7p and 9p and deletion of 8p chromosomal regions showed a significant correlation with clinico-pathologic parameters such as the size of the tumour, metastatic lymph nodes and pathological staging. Co-amplification of 7p, 8q, 9p and 11q regions that harbored amplified genes namely CCND1, EGFR, TPM2 and LRP12 respectively, when combined, continues to be an independent prognostic factor in OSCC. Conclusion: Amplification of 3q, 5p, 7p, 8q, 9p, 10p, 11q and deletion of 3p and 8p chromosomal regions were recurrent among OSCC patients. Co-alteration of 7p, 8q, 9p and 11q was found to be associated with clinico-pathologic parameters and poor survival. These regions contain genes that play critical roles

  5. Comparative genomics identifies candidate genes for infectious salmon anemia (ISA) resistance in Atlantic salmon (Salmo salar).

    PubMed

    Li, Jieying; Boroevich, Keith A; Koop, Ben F; Davidson, William S

    2011-04-01

    Infectious salmon anemia (ISA) has been described as the hoof and mouth disease of salmon farming. ISA is caused by a lethal and highly communicable virus, which can have a major impact on salmon aquaculture, as demonstrated by an outbreak in Chile in 2007. A quantitative trait locus (QTL) for ISA resistance has been mapped to three microsatellite markers on linkage group (LG) 8 (Chr 15) on the Atlantic salmon genetic map. We identified bacterial artificial chromosome (BAC) clones and three fingerprint contigs from the Atlantic salmon physical map that contain these markers. We made use of the extensive BAC end sequence database to extend these contigs by chromosome walking and identified two additional markers in this region. The BAC end sequences were used to search for conserved synteny between this segment of LG8 and the fish genomes that have been sequenced. An examination of the genes in the syntenic segments of the tetraodon and medaka genomes identified candidates for association with ISA resistance in Atlantic salmon based on differential expression profiles from ISA challenges or on the putative biological functions of the proteins they encode. One gene in particular, HIV-EP2/MBP-2, caught our attention as it may influence the expression of several genes that have been implicated in the response to infection by infectious salmon anemia virus (ISAV). Therefore, we suggest that HIV-EP2/MBP-2 is a very strong candidate for the gene associated with the ISAV resistance QTL in Atlantic salmon and is worthy of further study.

  6. Identification of novel RNA secondary structures within the hepatitis C virus genome reveals a cooperative involvement in genome packaging

    PubMed Central

    Stewart, H.; Bingham, R.J.; White, S. J.; Dykeman, E. C.; Zothner, C.; Tuplin, A. K.; Stockley, P. G.; Twarock, R.; Harris, M.

    2016-01-01

    The specific packaging of the hepatitis C virus (HCV) genome is hypothesised to be driven by Core-RNA interactions. To identify the regions of the viral genome involved in this process, we used SELEX (systematic evolution of ligands by exponential enrichment) to identify RNA aptamers which bind specifically to Core in vitro. Comparison of these aptamers to multiple HCV genomes revealed the presence of a conserved terminal loop motif within short RNA stem-loop structures. We postulated that interactions of these motifs, as well as sub-motifs which were present in HCV genomes at statistically significant levels, with the Core protein may drive virion assembly. We mutated 8 of these predicted motifs within the HCV infectious molecular clone JFH-1, thereby producing a range of mutant viruses predicted to possess altered RNA secondary structures. RNA replication and viral titre were unaltered in viruses possessing only one mutated structure. However, infectivity titres were decreased in viruses possessing a higher number of mutated regions. This work thus identified multiple novel RNA motifs which appear to contribute to genome packaging. We suggest that these structures act as cooperative packaging signals to drive specific RNA encapsidation during HCV assembly. PMID:26972799

  7. Genome-Wide siRNA-Based Functional Genomics of Pigmentation Identifies Novel Genes and Pathways That Impact Melanogenesis in Human Cells

    PubMed Central

    Bodemann, Brian; Petersen, Sean; Aruri, Jayavani; Koshy, Shiney; Richardson, Zachary; Le, Lu Q.; Krasieva, Tatiana; Roth, Michael G.; Farmer, Pat; White, Michael A.

    2008-01-01

    Melanin protects the skin and eyes from the harmful effects of UV irradiation, protects neural cells from toxic insults, and is required for sound conduction in the inner ear. Aberrant regulation of melanogenesis underlies skin disorders (melasma and vitiligo), neurologic disorders (Parkinson's disease), auditory disorders (Waardenburg's syndrome), and ophthalmologic disorders (age related macular degeneration). Much of the core synthetic machinery driving melanin production has been identified; however, the spectrum of gene products participating in melanogenesis in different physiological niches is poorly understood. Functional genomics based on RNA-mediated interference (RNAi) provides the opportunity to derive unbiased comprehensive collections of pharmaceutically tractable single gene targets supporting melanin production. In this study, we have combined a high-throughput, cell-based, one-well/one-gene screening platform with a genome-wide arrayed synthetic library of chemically synthesized, small interfering RNAs to identify novel biological pathways that govern melanin biogenesis in human melanocytes. Ninety-two novel genes that support pigment production were identified with a low false discovery rate. Secondary validation and preliminary mechanistic studies identified a large panel of targets that converge on tyrosinase expression and stability. Small molecule inhibition of a family of gene products in this class was sufficient to impair chronic tyrosinase expression in pigmented melanoma cells and UV-induced tyrosinase expression in primary melanocytes. Isolation of molecular machinery known to support autophagosome biosynthesis from this screen, together with in vitro and in vivo validation, exposed a close functional relationship between melanogenesis and autophagy. In summary, these studies illustrate the power of RNAi-based functional genomics to identify novel genes, pathways, and pharmacologic agents that impact a biological phenotype and operate

  8. Genome-Wide Association Study Identifies Novel Loci Associated With Diisocyanate-Induced Occupational Asthma

    PubMed Central

    Yucesoy, Berran; Kaufman, Kenneth M.; Lummus, Zana L.; Weirauch, Matthew T.; Zhang, Ge; Cartier, André; Boulet, Louis-Philippe; Sastre, Joaquin; Quirce, Santiago; Tarlo, Susan M.; Cruz, Maria-Jesus; Munoz, Xavier; Harley, John B.; Bernstein, David I.

    2015-01-01

    Diisocyanates, reactive chemicals used to produce polyurethane products, are the most common causes of occupational asthma. The aim of this study is to identify susceptibility gene variants that could contribute to the pathogenesis of diisocyanate asthma (DA) using a Genome-Wide Association Study (GWAS) approach. Genome-wide single nucleotide polymorphism (SNP) genotyping was performed in 74 diisocyanate-exposed workers with DA and 824 healthy controls using Omni-2.5 and Omni-5 SNP microarrays. We identified 11 SNPs that exceeded genome-wide significance; the strongest association was for the rs12913832 SNP located on chromosome 15, which has been mapped to the HERC2 gene (p = 6.94 × 10(-14)). Strong associations were also found for SNPs near the ODZ3 and CDH17 genes on chromosomes 4 and 8 (rs908084, p = 8.59 × 10(-9) and rs2514805, p = 1.22 × 10(-8), respectively). We also prioritized 38 SNPs with suggestive genome-wide significance (p < 1 × 10(-6)). Among them, 17 SNPs map to the PITPNC1, ACMSD, ZBTB16, ODZ3, and CDH17 gene loci. Functional genomics data indicate that 2 of the suggestive SNPs (rs2446823 and rs2446824) are located within putative binding sites for the CCAAT/Enhancer Binding Protein (CEBP) and Hepatocyte Nuclear Factor 4, Alpha transcription factors (TFs), respectively. This study identified SNPs mapping to the HERC2, CDH17, and ODZ3 genes as potential susceptibility loci for DA. Pathway analysis indicated that these genes are associated with antigen processing and presentation, and other immune pathways. Overlap of 2 suggestive SNPs with likely TF binding sites suggests possible roles in disruption of gene regulation. These results provide new insights into the genetic architecture of DA and serve as a basis for future functional and mechanistic studies. PMID:25918132

  9. Centromere-Like Regions in the Budding Yeast Genome

    PubMed Central

    Lefrançois, Philippe; Auerbach, Raymond K.; Yellman, Christopher M.; Roeder, G. Shirleen; Snyder, Michael

    2013-01-01

    Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP–Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres. PMID:23349633

  10. Genome-wide association study of intraocular pressure identifies the GLCCI1/ICA1 region as a glaucoma susceptibility locus

    PubMed Central

    Strange, Amy; Bellenguez, Céline; Sim, Xueling; Luben, Robert; Hysi, Pirro G.; Ramdas, Wishal D.; van Koolwijk, Leonieke M.E.; Freeman, Colin; Pirinen, Matti; Su, Zhan; Band, Gavin; Pearson, Richard; Vukcevic, Damjan; Langford, Cordelia; Deloukas, Panos; Hunt, Sarah; Gray, Emma; Dronov, Serge; Potter, Simon C.; Tashakkori-Ghanbaria, Avazeh; Edkins, Sarah; Bumpstead, Suzannah J.; Blackwell, Jenefer M.; Bramon, Elvira; Brown, Matthew A.; Casas, Juan P.; Corvin, Aiden; Duncanson, Audrey; Jankowski, Janusz A.Z.; Markus, Hugh S.; Mathew, Christopher G.; Palmer, Colin N.A.; Plomin, Robert; Rautanen, Anna; Sawcer, Stephen J.; Trembath, Richard C.; Wood, Nicholas W.; Barroso, Ines; Peltonen, Leena; Healey, Paul; McGuffin, Peter; Topouzis, Fotis; Klaver, Caroline C.W.; van Duijn, Cornelia M.; Mackey, David A.; Young, Terri L.; Hammond, Christopher J.; Khaw, Kay-Tee; Wareham, Nick; Wang, Jie Jin; Wong, Tien Y.; Foster, Paul J.; Mitchell, Paul; Spencer, Chris C.A.; Donnelly, Peter; Viswanathan, Ananth C.

    2013-01-01

    To discover quantitative trait loci for intraocular pressure, a major risk factor for glaucoma and the only modifiable one, we performed a genome-wide association study on a discovery cohort of 2175 individuals from Sydney, Australia. We found a novel association between intraocular pressure and a common variant at 7p21 near to GLCCI1 and ICA1. The findings in this region were confirmed through two UK replication cohorts totalling 4866 individuals (rs59072263, Pcombined = 1.10 × 10(-8)). A copy of the G allele at this SNP is associated with an increase in mean IOP of 0.45 mmHg (95%CI = 0.30–0.61 mmHg). These results lend support to the implication of vesicle trafficking and glucocorticoid inducibility pathways in the determination of intraocular pressure and in the pathogenesis of primary open-angle glaucoma. PMID:23836780

  11. Genome-wide association identifies OBFC1 as a locus involved in human leukocyte telomere biology.

    PubMed

    Levy, Daniel; Neuhausen, Susan L; Hunt, Steven C; Kimura, Masayuki; Hwang, Shih-Jen; Chen, Wei; Bis, Joshua C; Fitzpatrick, Annette L; Smith, Erin; Johnson, Andrew D; Gardner, Jeffrey P; Srinivasan, Sathanur R; Schork, Nicholas; Rotter, Jerome I; Herbig, Utz; Psaty, Bruce M; Sastrasinh, Malinee; Murray, Sarah S; Vasan, Ramachandran S; Province, Michael A; Glazer, Nicole L; Lu, Xiaobin; Cao, Xiaojian; Kronmal, Richard; Mangino, Massimo; Soranzo, Nicole; Spector, Tim D; Berenson, Gerald S; Aviv, Abraham

    2010-05-18

    Telomeres are engaged in a host of cellular functions, and their length is regulated by multiple genes. Telomere shortening, in the course of somatic cell replication, ultimately leads to replicative senescence. In humans, rare mutations in genes that regulate telomere length have been identified in monogenic diseases such as dyskeratosis congenita and idiopathic pulmonary fibrosis, which are associated with shortened leukocyte telomere length (LTL) and increased risk for aplastic anemia. Shortened LTL is observed in a host of aging-related complex genetic diseases and is associated with diminished survival in the elderly. We report results of a genome-wide association study of LTL in a consortium of four observational studies (n = 3,417 participants with LTL and genome-wide genotyping). SNPs in the regions of the oligonucleotide/oligosaccharide-binding folds containing one gene (OBFC1; rs4387287; P = 3.9 x 10(-9)) and chemokine (C-X-C motif) receptor 4 gene (CXCR4; rs4452212; P = 2.9 x 10(-8)) were associated with LTL at a genome-wide significance level (P < 5 x 10(-8)). We attempted replication of the top SNPs at these loci through de novo genotyping of 1,893 additional individuals and in silico lookup in another observational study (n = 2,876), and we confirmed the association findings for OBFC1 but not CXCR4. In addition, we confirmed the telomerase RNA component (TERC) as a gene associated with LTL (P = 1.1 x 10(-5)). The identification of OBFC1 through genome-wide association as a locus for interindividual variation in LTL in the general population advances the understanding of telomere biology in humans and may provide insights into aging-related disorders linked to altered LTL dynamics.
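
    The significance rule quoted in this abstract (P < 5 x 10(-8)) can be applied directly to the three P values it reports. The sketch below is illustrative only: the dictionary holds the values stated above, and the function name is hypothetical. It shows why OBFC1 and CXCR4 met the genome-wide threshold at the discovery stage while the TERC result (P = 1.1 x 10(-5)), reported as a confirmation of a known gene, does not.

```python
# Genome-wide significance threshold as stated in the abstract.
GENOME_WIDE_ALPHA = 5e-8

# Discovery-stage P values reported in the abstract.
REPORTED_P_VALUES = {
    "OBFC1 (rs4387287)": 3.9e-9,
    "CXCR4 (rs4452212)": 2.9e-8,
    "TERC": 1.1e-5,
}


def genome_wide_significant(p_values, alpha=GENOME_WIDE_ALPHA):
    """Map each locus to True if its P value is below the threshold."""
    return {locus: p < alpha for locus, p in p_values.items()}


if __name__ == "__main__":
    for locus, passed in genome_wide_significant(REPORTED_P_VALUES).items():
        print(f"{locus}: {'passes' if passed else 'does not pass'} P < 5e-8")
```

    Passing the threshold at discovery is not the same as replication: the abstract reports that only the OBFC1 association was confirmed in the follow-up samples.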

  12. AbsIDconvert: An absolute approach for converting genetic identifiers at different granularities

    PubMed Central

    2012-01-01

    Background: High-throughput molecular biology techniques yield vast amounts of data, often by detecting small portions of ribonucleotides corresponding to specific identifiers. Existing bioinformatic methodologies categorize and compare these elements using inferred descriptive annotation given this sequence information irrespective of the fact that it may not be representative of the identifier as a whole. Results: All annotations, no matter the granularity, can be aligned to genomic sequences and therefore annotated by genomic intervals. We have developed AbsIDconvert, a methodology for converting between genomic identifiers by first mapping them onto a common universal coordinate system using an interval tree which is subsequently queried for overlapping identifiers. AbsIDconvert has many potential uses, including gene identifier conversion, identification of features within a genomic region, and cross-species comparisons. The utility is demonstrated in three case studies: 1) comparative genomic study mapping plasmodium gene sequences to corresponding human and mosquito transcriptional regions; 2) cross-species study of Incyte clone sequences; and 3) analysis of human Ensembl transcripts mapped by Affymetrix® and Agilent microarray probes. AbsIDconvert currently supports ID conversion of 53 species for a given list of input identifiers, genomic sequence, or genome intervals. Conclusion: AbsIDconvert provides an efficient and reliable mechanism for conversion between identifier domains of interest. The flexibility of this tool allows for custom definition of identifier domains contingent upon the availability and determination of a genomic mapping interval. As the genomes and the sequences for genetic elements are further refined, this tool will become increasingly useful and accurate. AbsIDconvert is freely available as a web application or downloadable as a virtual machine at: http://bioinformatics.louisville.edu/abid/. PMID:22967011
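
    The conversion idea described above, mapping identifiers to genomic intervals and then querying for overlaps, can be sketched in a few lines. This is not the AbsIDconvert implementation: a linear scan per chromosome stands in for the interval tree named in the abstract, intervals are assumed to be closed, and every identifier and coordinate below is made up for illustration.

```python
from collections import defaultdict


class IntervalIndex:
    """Per-chromosome store of (start, end, identifier) intervals.

    A linear scan is used for clarity; an interval tree would answer
    the same overlap query faster on large annotation sets.
    """

    def __init__(self):
        self._by_chrom = defaultdict(list)

    def add(self, chrom, start, end, identifier):
        self._by_chrom[chrom].append((start, end, identifier))

    def overlapping(self, chrom, start, end):
        """Return sorted identifiers whose closed interval overlaps [start, end]."""
        return sorted(
            ident
            for s, e, ident in self._by_chrom.get(chrom, ())
            if s <= end and start <= e
        )


def convert(source_intervals, target_index):
    """Map each source identifier to the target identifiers it overlaps."""
    return {
        ident: target_index.overlapping(chrom, start, end)
        for ident, (chrom, start, end) in source_intervals.items()
    }


if __name__ == "__main__":
    # Hypothetical probe coordinates (target domain).
    probes = IntervalIndex()
    probes.add("chr1", 450, 470, "probe1")
    probes.add("chr1", 600, 620, "probe2")
    probes.add("chr2", 450, 470, "probe3")
    # Hypothetical gene coordinates (source domain).
    genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr3", 1, 50)}
    print(convert(genes, probes))
```

    Because both identifier domains are reduced to coordinates on a shared reference, the same query works at any granularity (gene, transcript, probe), which is the point the abstract makes about "absolute" conversion.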

  13. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    PubMed Central

    2011-01-01

    Background Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions The construction of the first switchgrass BAC library and comparative analysis of homoeologous harboring OsBRI1

  14. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits.

    PubMed

    Saski, Christopher A; Li, Zhigang; Feltus, Frank A; Luo, Hong

    2011-07-18

    Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. 
    The construction of the first switchgrass BAC library and comparative analysis of homoeologous harboring OsBRI1 orthologs present a glimpse into
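    The "approximately 10 times" coverage figure in this record can be reproduced from the other numbers it reports. The short calculation below is a sketch under one assumption: that only the 95% of clones found to contain inserts are counted. The abstract does not say whether the authors applied that correction, and the function name is hypothetical.

```python
def library_coverage(n_clones, insert_fraction, mean_insert_kb, genome_gb):
    """Genome equivalents represented by a clone library.

    coverage = (clones with inserts x mean insert size) / genome size
    """
    total_insert_kb = n_clones * insert_fraction * mean_insert_kb
    genome_kb = genome_gb * 1_000_000  # 1 Gb = 1,000,000 kb
    return total_insert_kb / genome_kb


# Figures taken from the abstract: 147,456 clones, 95% with inserts,
# 120 kb average insert, ~1.6 Gb effective genome.
coverage = library_coverage(147_456, 0.95, 120, 1.6)
# Same calculation without the insert-fraction correction.
uncorrected = library_coverage(147_456, 1.0, 120, 1.6)
print(f"{coverage:.1f}x corrected, {uncorrected:.1f}x uncorrected")
```

    Either way the result lands between 10x and about 11x, consistent with the stated coverage of the effective genome.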

  15. Genome wide association identifies common variants at the SERPINA6/SERPINA1 locus influencing plasma cortisol and corticosteroid binding globulin.

    PubMed

    Bolton, Jennifer L; Hayward, Caroline; Direk, Nese; Lewis, John G; Hammond, Geoffrey L; Hill, Lesley A; Anderson, Anna; Huffman, Jennifer; Wilson, James F; Campbell, Harry; Rudan, Igor; Wright, Alan; Hastie, Nicholas; Wild, Sarah H; Velders, Fleur P; Hofman, Albert; Uitterlinden, Andre G; Lahti, Jari; Räikkönen, Katri; Kajantie, Eero; Widen, Elisabeth; Palotie, Aarno; Eriksson, Johan G; Kaakinen, Marika; Järvelin, Marjo-Riitta; Timpson, Nicholas J; Davey Smith, George; Ring, Susan M; Evans, David M; St Pourcain, Beate; Tanaka, Toshiko; Milaneschi, Yuri; Bandinelli, Stefania; Ferrucci, Luigi; van der Harst, Pim; Rosmalen, Judith G M; Bakker, Stephen J L; Verweij, Niek; Dullaart, Robin P F; Mahajan, Anubha; Lindgren, Cecilia M; Morris, Andrew; Lind, Lars; Ingelsson, Erik; Anderson, Laura N; Pennell, Craig E; Lye, Stephen J; Matthews, Stephen G; Eriksson, Joel; Mellstrom, Dan; Ohlsson, Claes; Price, Jackie F; Strachan, Mark W J; Reynolds, Rebecca M; Tiemeier, Henning; Walker, Brian R

    2014-07-01

    Variation in plasma levels of cortisol, an essential hormone in the stress response, is associated in population-based studies with cardio-metabolic, inflammatory and neuro-cognitive traits and diseases. Heritability of plasma cortisol is estimated at 30-60% but no common genetic contribution has been identified. The CORtisol NETwork (CORNET) consortium undertook genome wide association meta-analysis for plasma cortisol in 12,597 Caucasian participants, replicated in 2,795 participants. The results indicate that <1% of variance in plasma cortisol is accounted for by genetic variation in a single region of chromosome 14. This locus spans SERPINA6, encoding corticosteroid binding globulin (CBG, the major cortisol-binding protein in plasma), and SERPINA1, encoding α1-antitrypsin (which inhibits cleavage of the reactive centre loop that releases cortisol from CBG). Three partially independent signals were identified within the region, represented by common SNPs; detailed biochemical investigation in a nested sub-cohort showed all these SNPs were associated with variation in total cortisol binding activity in plasma, but some variants influenced total CBG concentrations while the top hit (rs12589136) influenced the immunoreactivity of the reactive centre loop of CBG. Exome chip and 1000 Genomes imputation analysis of this locus in the CROATIA-Korcula cohort identified missense mutations in SERPINA6 and SERPINA1 that did not account for the effects of common variants. These findings reveal a novel common genetic source of variation in binding of cortisol by CBG, and reinforce the key role of CBG in determining plasma cortisol levels. In turn this genetic variation may contribute to cortisol-associated degenerative diseases.

  16. Genome-wide association analysis identifies six new loci associated with forced vital capacity

    PubMed Central

    Loth, Daan W.; Artigas, María Soler; Gharib, Sina A.; Wain, Louise V.; Franceschini, Nora; Koch, Beate; Pottinger, Tess; Smith, Albert Vernon; Duan, Qing; Oldmeadow, Chris; Lee, Mi Kyeong; Strachan, David P.; James, Alan L.; Huffman, Jennifer E.; Vitart, Veronique; Ramasamy, Adaikalavan; Wareham, Nicholas J.; Kaprio, Jaakko; Wang, Xin-Qun; Trochet, Holly; Kähönen, Mika; Flexeder, Claudia; Albrecht, Eva; Lopez, Lorna M.; de Jong, Kim; Thyagarajan, Bharat; Alves, Alexessander Couto; Enroth, Stefan; Omenaas, Ernst; Joshi, Peter K.; Fall, Tove; Viňuela, Ana; Launer, Lenore J.; Loehr, Laura R.; Fornage, Myriam; Li, Guo; Wilk, Jemma B.; Tang, Wenbo; Manichaikul, Ani; Lahousse, Lies; Harris, Tamara B.; North, Kari E.; Rudnicka, Alicja R.; Hui, Jennie; Gu, Xiangjun; Lumley, Thomas; Wright, Alan F.; Hastie, Nicholas D.; Campbell, Susan; Kumar, Rajesh; Pin, Isabelle; Scott, Robert A.; Pietiläinen, Kirsi H.; Surakka, Ida; Liu, Yongmei; Holliday, Elizabeth G.; Schulz, Holger; Heinrich, Joachim; Davies, Gail; Vonk, Judith M.; Wojczynski, Mary; Pouta, Anneli; Johansson, Åsa; Wild, Sarah H.; Ingelsson, Erik; Rivadeneira, Fernando; Völzke, Henry; Hysi, Pirro G.; Eiriksdottir, Gudny; Morrison, Alanna C.; Rotter, Jerome I.; Gao, Wei; Postma, Dirkje S.; White, Wendy B.; Rich, Stephen S.; Hofman, Albert; Aspelund, Thor; Couper, David; Smith, Lewis J.; Psaty, Bruce M.; Lohman, Kurt; Burchard, Esteban G.; Uitterlinden, André G.; Garcia, Melissa; Joubert, Bonnie R.; McArdle, Wendy L.; Musk, A. Bill; Hansel, Nadia; Heckbert, Susan R.; Zgaga, Lina; van Meurs, Joyce B.J.; Navarro, Pau; Rudan, Igor; Oh, Yeon-Mok; Redline, Susan; Jarvis, Deborah; Zhao, Jing Hua; Rantanen, Taina; O’Connor, George T.; Ripatti, Samuli; Scott, Rodney J.; Karrasch, Stefan; Grallert, Harald; Gaddis, Nathan C.; Starr, John M.; Wijmenga, Cisca; Minster, Ryan L.; Lederer, David J.; Pekkanen, Juha; Gyllensten, Ulf; Campbell, Harry; Morris, Andrew P.; Gläser, Sven; Hammond, Christopher J.; Burkart, Kristin M.; Beilby, John; Kritchevsky, Stephen B.; Gudnason, Vilmundur; Hancock, Dana B.; Williams, O. Dale; Polasek, Ozren; Zemunik, Tatijana; Kolcic, Ivana; Petrini, Marcy F.; Wjst, Matthias; Kim, Woo Jin; Porteous, David J.; Scotland, Generation; Smith, Blair H.; Viljanen, Anne; Heliövaara, Markku; Attia, John R.; Sayers, Ian; Hampel, Regina; Gieger, Christian; Deary, Ian J.; Boezen, H. Marike; Newman, Anne; Jarvelin, Marjo-Riitta; Wilson, James F.; Lind, Lars; Stricker, Bruno H.; Teumer, Alexander; Spector, Timothy D.; Melén, Erik; Peters, Marjolein J.; Lange, Leslie A.; Barr, R. Graham; Bracke, Ken R.; Verhamme, Fien M.; Sung, Joohon; Hiemstra, Pieter S.; Cassano, Patricia A.; Sood, Akshay; Hayward, Caroline; Dupuis, Josée; Hall, Ian P.; Brusselle, Guy G.; Tobin, Martin D.; London, Stephanie J.

    2014-01-01

    Forced vital capacity (FVC), a spirometric measure of pulmonary function, reflects lung volume and is used to diagnose and monitor lung diseases. We performed genome-wide association study meta-analysis of FVC in 52,253 individuals from 26 studies and followed up the top associations in 32,917 additional individuals of European ancestry. We found six new regions associated at genome-wide significance (P < 5 × 10⁻⁸) with FVC in or near EFEMP1, BMP6, MIR-129-2/HSD17B12, PRDM11, WWOX, and KCNJ2. Two (GSTCD and PTCH1) loci previously associated with spirometric measures were related to FVC. Newly implicated regions were followed-up in samples of African American, Korean, Chinese, and Hispanic individuals. We detected transcripts for all six newly implicated genes in human lung tissue. The new loci may inform mechanisms involved in lung development and pathogenesis of restrictive lung disease. PMID:24929828

  17. The complete chloroplast genome of Sinopodophyllum hexandrum (Berberidaceae).

    PubMed

    Li, Huie; Guo, Qiqiang

    2016-07-01

    The complete chloroplast (cp) genome of the Sinopodophyllum hexandrum (Berberidaceae) was determined in this study. The circular genome is 157,940 bp in size, and comprises a pair of inverted repeat (IR) regions of 26,077 bp each, a large single-copy (LSC) region of 86,460 bp and a small single-copy (SSC) region of 19,326 bp. The GC content of the whole cp genome was 38.5%. A total of 133 genes were identified, including 88 protein-coding genes, 37 tRNA genes and eight rRNA genes. The whole cp genome consists of 114 unique genes, and 19 genes are duplicated in the IR regions. The phylogenetic analysis revealed that S. hexandrum is closely related to Nandina domestica within the family Berberidaceae.
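    The region sizes and gene counts reported in this record are internally consistent, which can be confirmed with a few lines of arithmetic. The sketch below uses only figures from the abstract; the variable names are illustrative.

```python
# Region lengths in bp, as reported in the abstract.
regions = {"LSC": 86_460, "SSC": 19_326, "IR": 26_077}

# A quadripartite chloroplast genome has one LSC, one SSC and two IR copies.
genome_size = regions["LSC"] + regions["SSC"] + 2 * regions["IR"]

# Gene tallies from the abstract.
genes_by_type = {"protein_coding": 88, "tRNA": 37, "rRNA": 8}
total_genes = sum(genes_by_type.values())
unique_genes, duplicated_in_ir = 114, 19

print(genome_size, total_genes, unique_genes + duplicated_in_ir)
```

    The regions sum to the stated 157,940 bp, and both ways of counting genes (by type, or unique plus IR duplicates) give the stated total of 133.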

  18. Trends in genome-wide and region-specific genetic diversity in the Dutch-Flemish Holstein-Friesian breeding program from 1986 to 2015.

    PubMed

    Doekes, Harmen P; Veerkamp, Roel F; Bijma, Piter; Hiemstra, Sipke J; Windig, Jack J

    2018-04-11

    In recent decades, Holstein-Friesian (HF) selection schemes have undergone profound changes, including the introduction of optimal contribution selection (OCS; around 2000), a major shift in breeding goal composition (around 2000) and the implementation of genomic selection (GS; around 2010). These changes are expected to have influenced genetic diversity trends. Our aim was to evaluate genome-wide and region-specific diversity in HF artificial insemination (AI) bulls in the Dutch-Flemish breeding program from 1986 to 2015. Pedigree and genotype data (~ 75.5 k) of 6280 AI-bulls were used to estimate rates of genome-wide inbreeding and kinship and corresponding effective population sizes. Region-specific inbreeding trends were evaluated using regions of homozygosity (ROH). Changes in observed allele frequencies were compared to those expected under pure drift to identify putative regions under selection. We also investigated the direction of changes in allele frequency over time. Effective population size estimates for the 1986-2015 period ranged from 69 to 102. Two major breakpoints were observed in genome-wide inbreeding and kinship trends. Around 2000, inbreeding and kinship levels temporarily dropped. From 2010 onwards, they steeply increased, with pedigree-based, ROH-based and marker-based inbreeding rates as high as 1.8, 2.1 and 2.8% per generation, respectively. Accumulation of inbreeding varied substantially across the genome. A considerable fraction of markers showed changes in allele frequency that were greater than expected under pure drift. Putative selected regions harboured many quantitative trait loci (QTL) associated to a wide range of traits. In consecutive 5-year periods, allele frequencies changed more often in the same direction than in opposite directions, except when comparing the 1996-2000 and 2001-2005 periods. Genome-wide and region-specific diversity trends reflect major changes in the Dutch-Flemish HF breeding program. Introduction of

  19. Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes.

    PubMed

    Law, MeiYee; Childs, Kevin L; Campbell, Michael S; Stein, Joshua C; Olson, Andrew J; Holt, Carson; Panchy, Nicholas; Lei, Jikai; Jiao, Dian; Andorf, Carson M; Lawrence, Carolyn J; Ware, Doreen; Shiu, Shin-Han; Sun, Yanni; Jiang, Ning; Yandell, Mark

    2015-01-01

    The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes. © 2015 American Society of Plant Biologists. All Rights Reserved.

  20. High-resolution single-nucleotide polymorphism array-profiling in myeloproliferative neoplasms identifies novel genomic aberrations

    PubMed Central

    Stegelmann, Frank; Bullinger, Lars; Griesshammer, Martin; Holzmann, Karlheinz; Habdank, Marianne; Kuhn, Susanne; Maile, Carmen; Schauer, Stefanie; Döhner, Hartmut; Döhner, Konstanze

    2010-01-01

    Single-nucleotide polymorphism arrays allow for genome-wide profiling of copy-number alterations and copy-neutral runs of homozygosity at high resolution. To identify novel genetic lesions in myeloproliferative neoplasms, a large series of 151 clinically well characterized patients was analyzed in our study. Copy-number alterations were rare in essential thrombocythemia and polycythemia vera. In contrast, approximately one third of myelofibrosis patients exhibited small genomic losses (less than 5 Mb). In 2 secondary myelofibrosis cases the tumor suppressor gene NF1 in 17q11.2 was affected. Sequencing analyses revealed a mutation in the remaining NF1 allele of one patient. In terms of copy-neutral aberrations, no chromosomes other than 9p were recurrently affected. In conclusion, novel genomic aberrations were identified in our study, in particular in patients with myelofibrosis. Further analyses on single-gene level are necessary to uncover the mechanisms that are involved in the pathogenesis of myeloproliferative neoplasms. PMID:20015882

  1. A genomic approach to identify hybrid incompatibility genes.

    PubMed

    Cooper, Jacob C; Phadnis, Nitin

    2016-07-02

    Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids.

  2. A genomic approach to identify hybrid incompatibility genes

    PubMed Central

    Cooper, Jacob C.; Phadnis, Nitin

    2016-01-01

    ABSTRACT Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids. PMID:27230814

  3. Evolutionary history of the ABCB2 genomic region in teleosts

    USGS Publications Warehouse

    Palti, Y.; Rodriguez, M.F.; Gahr, S.A.; Hansen, J.D.

    2007-01-01

    Gene duplication, silencing and translocation have all been implicated in shaping the unique genomic architecture of the teleost MH regions. Previously, we demonstrated that trout possess five unlinked regions encoding MH genes. One of these regions harbors ABCB2 which in all other vertebrate classes is found in the MHC class II region. In this study, we sequenced a BAC contig for the trout ABCB2 region. Analysis of this region revealed the presence of genes homologous to those located in the human class II (ABCB2, BRD2, ψDAA), extended class II (RGL2, PHF1, SYGP1) and class III (PBX2, Notch-L) regions. The organization and syntenic relationships of this region were then compared to similar regions in humans, Tetraodon and zebrafish to learn more about the evolutionary history of this region. Our analysis indicates that this region was generated during the teleost-specific duplication event while also providing insight about potential MH paralogous regions in teleosts. © 2006 Elsevier Ltd. All rights reserved.

  4. Genome organization of epidemic Acinetobacter baumannii strains.

    PubMed

    Di Nocera, Pier Paolo; Rocco, Francesco; Giannouli, Maria; Triassi, Maria; Zarrilli, Raffaele

    2011-10-10

    Acinetobacter baumannii is an opportunistic pathogen responsible for hospital-acquired infections. A. baumannii epidemics described world-wide were caused by few genotypic clusters of strains. The occurrence of epidemics caused by multi-drug resistant strains assigned to novel genotypes has been reported over the last few years. In the present study, we compared whole genome sequences of three A. baumannii strains assigned to genotypes ST2, ST25 and ST78, representative of the most frequent genotypes responsible for epidemics in several Mediterranean hospitals, and four complete genome sequences of A. baumannii strains assigned to genotypes ST1, ST2 and ST77. Comparative genome analysis showed extensive synteny and identified 3068 coding regions which are conserved, at the same chromosomal position, in all A. baumannii genomes. Genome alignments also identified 63 DNA regions, ranging in size from 4 to 126 kb, all defined as genomic islands, which were present in some genomes, but were either missing or replaced by non-homologous DNA sequences in others. Some islands are involved in resistance to drugs and metals, others carry genes encoding surface proteins or enzymes involved in specific metabolic pathways, and others correspond to prophage-like elements. Accessory DNA regions encode 12 to 19% of the potential gene products of the analyzed strains. The analysis of a collection of epidemic A. baumannii strains showed that some islands were restricted to specific genotypes. The definition of the genome components of A. baumannii provides a scaffold to rapidly evaluate the genomic organization of novel clinical A. baumannii isolates. Changes in island profiling will be useful in genomic epidemiology of A. baumannii population.

  5. DNA methylation in the APOE genomic region is associated with cognitive function in African Americans.

    PubMed

    Liu, Jiaxuan; Zhao, Wei; Ware, Erin B; Turner, Stephen T; Mosley, Thomas H; Smith, Jennifer A

    2018-05-08

    Genetic variations in apolipoprotein E (APOE) and proximal genes (PVRL2, TOMM40, and APOC1) are associated with cognitive function and dementia, particularly Alzheimer's disease. Epigenetic mechanisms such as DNA methylation play a central role in the regulation of gene expression. Recent studies have found evidence that DNA methylation may contribute to the pathogenesis of dementia, but its association with cognitive function in populations without dementia remains unclear. We assessed DNA methylation levels of 48 CpG sites in the APOE genomic region in peripheral blood leukocytes collected from 289 African Americans (mean age = 67 years) from the Genetic Epidemiology Network of Arteriopathy (GENOA) study. Using linear regression, we examined the relationship between methylation in the APOE genomic region and multiple cognitive measures including learning, memory, processing speed, concentration, language and global cognitive function. We identified eight CpG sites in three genes (PVRL2, TOMM40, and APOE) that showed an inverse association between methylation level and delayed recall, a measure of memory, after adjusting for age and sex (False Discovery Rate q-value < 0.1). All eight CpGs are located in either CpG islands (CGIs) or CGI shelves, and six of them are in promoter regions. Education and APOE ε4 carrier status significantly modified the effect of methylation in cg08583001 (PVRL2) and cg22024783 (TOMM40), respectively. Together, methylation of the eight CpGs explained an additional 8.7% of the variance in delayed recall, after adjustment for age, sex, education, and APOE ε4 carrier status. Methylation was not significantly associated with any other cognitive measures. Our results suggest that methylation levels at multiple CpGs in the APOE genomic region are inversely associated with delayed recall during normal cognitive aging, even after accounting for known genetic predictors for cognition. Our findings highlight the important role of

  6. GRIL: genome rearrangement and inversion locator.

    PubMed

    Darling, Aaron E; Mau, Bob; Blattner, Frederick R; Perna, Nicole T

    2004-01-01

    GRIL is a tool to automatically identify collinear regions in a set of bacterial-size genome sequences. GRIL uses three basic steps. First, regions of high sequence identity are located. Second, some of these regions are filtered based on user-specified criteria. Finally, the remaining regions of sequence identity are used to define significant collinear regions among the sequences. By locating collinear regions of sequence, GRIL provides a basis for multiple genome alignment using current alignment systems. GRIL also provides a basis for using current inversion distance tools to infer phylogeny. GRIL is implemented in C++ and runs on any x86-based Linux or Windows platform. It is available from http://asap.ahabs.wisc.edu/gril

  7. Whole Genome Sequencing Demonstrates Limited Transmission within Identified Mycobacterium tuberculosis Clusters in New South Wales, Australia

    PubMed Central

    Gurjav, Ulziijargal; Outhred, Alexander C.; Jelfs, Peter; McCallum, Nadine; Wang, Qinning; Hill-Cawthorne, Grant A.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Australia has a low tuberculosis incidence rate with most cases occurring among recent immigrants. Given suboptimal cluster resolution achieved with 24-locus mycobacterium interspersed repetitive unit (MIRU-24) genotyping, the added value of whole genome sequencing was explored. MIRU-24 profiles of all Mycobacterium tuberculosis culture-confirmed tuberculosis cases diagnosed between 2009 and 2013 in New South Wales (NSW), Australia, were examined and clusters identified. The relatedness of cases within the largest MIRU-24 clusters was assessed using whole genome sequencing and phylogenetic analyses. Of 1841 culture-confirmed TB cases, 91.9% (1692/1841) had complete demographic and genotyping data. East-African Indian (474; 28.0%) and Beijing (470; 27.8%) lineage strains predominated. The overall rate of MIRU-24 clustering was 20.1% (340/1692) and was highest among Beijing lineage strains (35.7%; 168/470). One Beijing and three East-African Indian (EAI) clonal complexes were responsible for the majority of observed clusters. Whole genome sequencing of the 4 largest clusters (30 isolates) demonstrated diverse single nucleotide polymorphisms (SNPs) within identified clusters. All sequenced EAI strains and 70% of Beijing lineage strains clustered by MIRU-24 typing demonstrated distinct SNP profiles. The superior resolution provided by whole genome sequencing demonstrated limited M. tuberculosis transmission within NSW, even within identified MIRU-24 clusters. Routine whole genome sequencing could provide valuable public health guidance in low burden settings. PMID:27737005

  8. Unbiased Combinatorial Genomic Approaches to Identify Alternative Therapeutic Targets within the TSC Signaling Network

    DTIC Science & Technology

    2015-09-01

    assessed the specificity of mutation in Drosophila S2R+ cells. We generated a quantitative mutation reporter vector in which an sgRNA target sequence ...phosphatases (563 genes) in the Drosophila genome (Figure 4). 65 samples that displayed synthetic lethality (15 genes) or synthetic increases in viability...targeting all kinases and phosphatases (563 genes) in the Drosophila genome . . Identified three hits (mRNA-Cap, Pitslre and CycT) that scored as

  9. Phylogenetic Conflict in Bears Identified by Automated Discovery of Transposable Element Insertions in Low-Coverage Genomes.

    PubMed

    Lammers, Fritjof; Gallus, Susanne; Janke, Axel; Nilsson, Maria A

    2017-10-01

    Phylogenetic reconstruction from transposable elements (TEs) offers an additional perspective to study evolutionary processes. However, detecting phylogenetically informative TE insertions requires tedious experimental work, limiting the power of phylogenetic inference. Here, we analyzed the genomes of seven bear species using high-throughput sequencing data to detect thousands of TE insertions. The newly developed pipeline for TE detection called TeddyPi (TE detection and discovery for Phylogenetic Inference) identified 150,513 high-quality TE insertions in the genomes of ursine and tremarctine bears. By integrating different TE insertion callers and using a stringent filtering approach, the TeddyPi pipeline produced highly reliable TE insertion calls, which were confirmed by extensive in vitro validation experiments. Analysis of single nucleotide substitutions in the flanking regions of the TEs shows that these substitutions correlate with the phylogenetic signal from the TE insertions. Our phylogenomic analyses show that TEs are a major driver of genomic variation in bears and enabled phylogenetic reconstruction of a well-resolved species tree, despite strong signals for incomplete lineage sorting and introgression. The analyses show that the Asiatic black, sun, and sloth bear form a monophyletic clade, in which phylogenetic incongruence originates from incomplete lineage sorting. TeddyPi is open source and can be adapted to various TE and structural variation callers. The pipeline makes it possible to confidently extract thousands of TE insertions even from low-coverage genomes (∼10×) of nonmodel organisms. This opens new possibilities for biologists to study phylogenies and evolutionary processes as well as rates and patterns of (retro-)transposition and structural variation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  10. Using sheep genomes from diverse U.S. breeds to identify missense variants in genes affecting fecundity

    USDA-ARS's Scientific Manuscript database

    Background: Access to sheep genome sequences significantly improves the chances of identifying genes that may influence the health, welfare, and productivity of these animals. Methods: A public, searchable DNA sequence resource for U.S. sheep was created with whole genome sequence (WGS) of 96 rams. ...

  11. Identifying homomorphic sex chromosomes from wild-caught adults with limited genomic resources.

    PubMed

    Brelsford, Alan; Lavanchy, Guillaume; Sermier, Roberto; Rausch, Anna; Perrin, Nicolas

    2017-07-01

    We demonstrate a genotyping-by-sequencing approach to identify homomorphic sex chromosomes and their homolog in a distantly related reference genome, based on noninvasive sampling of wild-caught individuals, in the moor frog Rana arvalis. Double-digest RADseq libraries were generated using buccal swabs from 30 males and 21 females from the same population. Search for sex-limited markers from the unfiltered data set (411 446 RAD tags) was more successful than searches from a filtered data set (33 073 RAD tags) for markers showing sex differences in heterozygosity or in allele frequencies. Altogether, we obtained 292 putatively sex-linked RAD loci, 98% of which point to male heterogamety. We could map 15 of them to the Xenopus tropicalis genome, all but one on chromosome pair 1, which seems regularly co-opted for sex determination among amphibians. The most efficient mapping strategy was a three-step hierarchical approach, where R. arvalis reads were first mapped to a low-coverage genome of Rana temporaria (17 My divergence), then the R. temporaria scaffolds to the Nanorana parkeri genome (90 My divergence), and finally the N. parkeri scaffolds to the X. tropicalis genome (210 My). We validated our conclusions with PCR primers amplifying part of Dmrt1, a candidate sex determination gene mapping to chromosome 1: a sex-diagnostic allele was present in all 30 males but in none of the 21 females. Our approach is likely to be productive in many situations where biological samples and/or genomic resources are limited. © 2016 John Wiley & Sons Ltd.

  12. Genome Sequences of Marine Shrimp Exopalaemon carinicauda Holthuis Provide Insights into Genome Size Evolution of Caridea.

    PubMed

    Yuan, Jianbo; Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2017-07-05

    Crustacea, particularly Decapoda, contains many economically important species, such as shrimps and crabs. Crustaceans exhibit enormous (nearly 500-fold) variability in genome size. However, limited genome resources are available for investigating these species. Exopalaemon carinicauda Holthuis, an economically important caridean shrimp, is a potentially ideal experimental animal for research on crustaceans. In this study, we performed low-coverage sequencing and de novo assembly of the E. carinicauda genome. The assembly covers more than 95% of coding regions. E. carinicauda possesses a large, complex genome (5.73 Gb), about twice the size of those of many decapod shrimps. As such, comparative genomic analyses were applied to investigate factors affecting genome size evolution of decapods. However, clues associated with genome duplication were not identified, and few horizontally transferred sequences were detected. Ultimately, the burst of transposable elements, especially retrotransposons, was determined to be the major factor influencing genome expansion. A total of 2 Gb of repeats were identified, and RTE-BovB, Jockey, Gypsy, and DIRS were the four major retrotransposons that significantly expanded. Retrotransposons of both recent (Jockey and Gypsy) and ancestral (DIRS) origin were responsible for the genome evolution. The E. carinicauda genome also exhibited potential for the genomic and experimental research of shrimps.

  13. Genome-wide association study meta-analysis identifies five new loci for systemic lupus erythematosus.

    PubMed

    Julià, Antonio; López-Longo, Francisco Javier; Pérez Venegas, José J; Bonàs-Guarch, Silvia; Olivé, Àlex; Andreu, José Luís; Aguirre-Zamorano, Mª Ángeles; Vela, Paloma; Nolla, Joan M; de la Fuente, José Luís Marenco; Zea, Antonio; Pego-Reigosa, José María; Freire, Mercedes; Díez, Elvira; Rodríguez-Almaraz, Esther; Carreira, Patricia; Blanco, Ricardo; Taboada, Víctor Martínez; López-Lasanta, María; Corbeto, Mireia López; Mercader, Josep M; Torrents, David; Absher, Devin; Marsal, Sara; Fernández-Nebro, Antonio

    2018-05-30

    Systemic lupus erythematosus (SLE) is a common systemic autoimmune disease with a complex genetic inheritance. Genome-wide association studies (GWAS) have significantly increased the number of significant loci associated with SLE risk. To date, however, established loci account for less than 30% of the disease heritability and additional risk variants have yet to be identified. Here we performed a GWAS followed by a meta-analysis to identify new genome-wide significant loci for SLE. We genotyped a cohort of 907 patients with SLE (cases) and 1524 healthy controls from Spain and performed imputation using the 1000 Genomes reference data. We tested for association using logistic regression with correction for the principal components of variation. Meta-analysis of the association results was subsequently performed on 7,110,321 variants using genetic data from a large cohort of 4036 patients with SLE and 6959 controls of Northern European ancestry. Genetic association was also tested at the pathway level after removing the effect of known risk loci using PASCAL software. We identified five new loci associated with SLE at the genome-wide level of significance (p < 5 × 10^-8): GRB2, SMYD3, ST8SIA4, LAT2 and ARHGAP27. Pathway analysis revealed several biological processes significantly associated with SLE risk: B cell receptor signaling (p = 5.28 × 10^-6), CTLA4 co-stimulation during T cell activation (p = 3.06 × 10^-5), interleukin-4 signaling (p = 3.97 × 10^-5) and cell surface interactions at the vascular wall (p = 4.63 × 10^-5). Our results identify five novel loci for SLE susceptibility, and biologic pathways associated via multiple low-effect-size loci.
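
    To make the meta-analysis step concrete, the sketch below combines per-cohort effect estimates for one variant and compares the result with the genome-wide threshold quoted in the abstract. The abstract does not say which meta-analysis model was used; fixed-effect inverse-variance weighting is assumed here as one common choice, and the function name and toy numbers are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # threshold quoted in the abstract


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis for a single variant.

    betas: per-cohort log-odds-ratio estimates
    ses:   their standard errors
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Toy example: two cohorts with the same effect estimate.
beta, se, z, p = fixed_effect_meta([0.2, 0.2], [0.02, 0.02])
is_genome_wide_significant = p < GENOME_WIDE_ALPHA
```

    With equal standard errors the pooled estimate is the plain mean of the cohort estimates, and the pooled standard error shrinks by a factor of the square root of the number of cohorts.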

  14. Navigating the currents of seascape genomics: how spatial analyses can augment population genomic studies

    PubMed Central

    Crandall, Eric D.; Liggins, Libby; Bongaerts, Pim; Treml, Eric A.

    2016-01-01

    Population genomic approaches are making rapid inroads in the study of non-model organisms, including marine taxa. To date, these marine studies have predominantly focused on rudimentary metrics describing the spatial and environmental context of their study region (e.g., geographical distance, average sea surface temperature, average salinity). We contend that a more nuanced and considered approach to quantifying seascape dynamics and patterns can strengthen population genomic investigations and help identify spatial, temporal, and environmental factors associated with differing selective regimes or demographic histories. Nevertheless, approaches for quantifying marine landscapes are complicated. Characteristic features of the marine environment, including pelagic living in flowing water (experienced by most marine taxa at some point in their life cycle), require a well-designed spatial-temporal sampling strategy and analysis. Many genetic summary statistics used to describe populations may be inappropriate for marine species with large population sizes, large species ranges, stochastic recruitment, and asymmetrical gene flow. Finally, statistical approaches for testing associations between seascapes and population genomic patterns are still maturing with no single approach able to capture all relevant considerations. None of these issues are completely unique to marine systems and therefore similar issues and solutions will be shared for many organisms regardless of habitat. Here, we outline goals and spatial approaches for landscape genomics with an emphasis on marine systems and review the growing empirical literature on seascape genomics. We review established tools and approaches and highlight promising new strategies to overcome select issues including a strategy to spatially optimize sampling. Despite the many challenges, we argue that marine systems may be especially well suited for identifying candidate genomic regions under environmentally mediated

  15. Navigating the currents of seascape genomics: how spatial analyses can augment population genomic studies.

    PubMed

    Riginos, Cynthia; Crandall, Eric D; Liggins, Libby; Bongaerts, Pim; Treml, Eric A

    2016-12-01

    Population genomic approaches are making rapid inroads in the study of non-model organisms, including marine taxa. To date, these marine studies have predominantly focused on rudimentary metrics describing the spatial and environmental context of their study region (e.g., geographical distance, average sea surface temperature, average salinity). We contend that a more nuanced and considered approach to quantifying seascape dynamics and patterns can strengthen population genomic investigations and help identify spatial, temporal, and environmental factors associated with differing selective regimes or demographic histories. Nevertheless, approaches for quantifying marine landscapes are complicated. Characteristic features of the marine environment, including pelagic living in flowing water (experienced by most marine taxa at some point in their life cycle), require a well-designed spatial-temporal sampling strategy and analysis. Many genetic summary statistics used to describe populations may be inappropriate for marine species with large population sizes, large species ranges, stochastic recruitment, and asymmetrical gene flow. Finally, statistical approaches for testing associations between seascapes and population genomic patterns are still maturing with no single approach able to capture all relevant considerations. None of these issues are completely unique to marine systems and therefore similar issues and solutions will be shared for many organisms regardless of habitat. Here, we outline goals and spatial approaches for landscape genomics with an emphasis on marine systems and review the growing empirical literature on seascape genomics. We review established tools and approaches and highlight promising new strategies to overcome select issues including a strategy to spatially optimize sampling. Despite the many challenges, we argue that marine systems may be especially well suited for identifying candidate genomic regions under environmentally mediated

  16. Leveraging Comparative Genomics to Identify and Functionally Characterize Genes Associated with Sperm Phenotypes in Python bivittatus (Burmese Python).

    PubMed

    Irizarry, Kristopher J L; Rutllant, Josep

    2016-01-01

    Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism's genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism's genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value.

  17. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays.

    PubMed

    Mak, Angel C Y; Lai, Yvonne Y Y; Lam, Ernest T; Kwok, Tsz-Piu; Leung, Alden K Y; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W C; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J K; Li, Catherine M L; Li, Jing-Woei; Yim, Aldrin K Y; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y; Xiao, Ming; Kwok, Pui-Yan

    2016-01-01

    Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. Copyright © 2016 by the Genetics Society of America.

  18. A Genome-Wide Association Study Identifies Genetic Variants Associated with Mathematics Ability

    PubMed Central

    Chen, Huan; Gu, Xiao-hong; Zhou, Yuxi; Ge, Zeng; Wang, Bin; Siok, Wai Ting; Wang, Guoqing; Huen, Michael; Jiang, Yuyang; Tan, Li-Hai; Sun, Yimin

    2017-01-01

    Mathematics ability is a complex cognitive trait with polygenic heritability. Genome-wide association study (GWAS) has been an effective approach to investigate genetic components underlying mathematics ability. Although previous studies reported several candidate genetic variants, none of them exceeded the genome-wide significance threshold in general populations. Herein, we performed GWAS in Chinese elementary school students to identify potential genetic variants associated with mathematics ability. The discovery stage included 494 and 504 individuals from two independent cohorts respectively. The replication stage included another cohort of 599 individuals. In total, 28 of 81 candidate SNPs that met validation criteria were further replicated. Combined meta-analysis of three cohorts identified four SNPs (rs1012694, rs11743006, rs17778739 and rs17777541) of the SPOCK1 gene showing association with mathematics ability (minimum p value 5.67 × 10^-10, maximum β -2.43). The SPOCK1 gene is located on chromosome 5q31.2 and encodes a highly conserved glycoprotein, testican-1, which has been associated with tumor progression and prognosis as well as neurogenesis. This is the first study to report genome-wide significant association of individual SNPs with mathematics ability in general populations. Our preliminary results further supported the role of SPOCK1 during neurodevelopment. The genetic complexities underlying mathematics ability might help explain the basis of human cognition and intelligence at the genetic level. PMID:28155865

  19. A Genome-Wide Association Study Identifies Genetic Variants Associated with Mathematics Ability.

    PubMed

    Chen, Huan; Gu, Xiao-Hong; Zhou, Yuxi; Ge, Zeng; Wang, Bin; Siok, Wai Ting; Wang, Guoqing; Huen, Michael; Jiang, Yuyang; Tan, Li-Hai; Sun, Yimin

    2017-02-03

    Mathematics ability is a complex cognitive trait with polygenic heritability. Genome-wide association study (GWAS) has been an effective approach to investigate genetic components underlying mathematics ability. Although previous studies reported several candidate genetic variants, none of them exceeded the genome-wide significance threshold in general populations. Herein, we performed GWAS in Chinese elementary school students to identify potential genetic variants associated with mathematics ability. The discovery stage included 494 and 504 individuals from two independent cohorts respectively. The replication stage included another cohort of 599 individuals. In total, 28 of 81 candidate SNPs that met validation criteria were further replicated. Combined meta-analysis of three cohorts identified four SNPs (rs1012694, rs11743006, rs17778739 and rs17777541) of the SPOCK1 gene showing association with mathematics ability (minimum p value 5.67 × 10^-10, maximum β -2.43). The SPOCK1 gene is located on chromosome 5q31.2 and encodes a highly conserved glycoprotein, testican-1, which has been associated with tumor progression and prognosis as well as neurogenesis. This is the first study to report genome-wide significant association of individual SNPs with mathematics ability in general populations. Our preliminary results further supported the role of SPOCK1 during neurodevelopment. The genetic complexities underlying mathematics ability might help explain the basis of human cognition and intelligence at the genetic level.

  20. Intra-Genomic Internal Transcribed Spacer Region Sequence Heterogeneity and Molecular Diagnosis in Clinical Microbiology.

    PubMed

    Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K P; Woo, Patrick C Y

    2015-10-22

    Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10-49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n=2), Pichia (Candida) norvegensis (n=2), Candida tropicalis (n=1) and Saccharomyces cerevisiae (n=1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study.

  1. Parasitism drives host genome evolution: Insights from the Pasteuria ramosa-Daphnia magna system.

    PubMed

    Bourgeois, Yann; Roulin, Anne C; Müller, Kristina; Ebert, Dieter

    2017-04-01

    Because parasitism is thought to play a major role in shaping host genomes, it has been predicted that genomic regions associated with resistance to parasites should stand out in genome scans, revealing signals of selection above the genomic background. To test whether parasitism is indeed such a major factor in host evolution and to better understand host-parasite interaction at the molecular level, we studied genome-wide polymorphisms in 97 genotypes of the planktonic crustacean Daphnia magna originating from three localities across Europe. Daphnia magna is known to coevolve with the bacterial pathogen Pasteuria ramosa for which host genotypes (clonal lines) are either resistant or susceptible. Using association mapping, we identified two genomic regions involved in resistance to P. ramosa, one of which was already known from a previous QTL analysis. We then performed a naïve genome scan to test for signatures of positive selection and found that the two regions identified with the association mapping further stood out as outliers. Several other regions with evidence for selection were also found, but no link between these regions and phenotypic variation could be established. Our results are consistent with the hypothesis that parasitism is driving host genome evolution. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.

  2. Indexcov: fast coverage quality control for whole-genome sequencing.

    PubMed

    Pedersen, Brent S; Collins, Ryan L; Talkowski, Michael E; Quinlan, Aaron R

    2017-11-01

    The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license. © The Authors 2017. Published by Oxford University Press.
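
    The idea of comparing consecutive index entries can be sketched as follows. In the BAM linear index, each 16,384 bp window stores a virtual file offset whose upper bits give the position of a compressed block in the file. The sketch assumes the offsets for one chromosome have already been read into a list; the function name is hypothetical, and this is not indexcov's actual implementation, which may differ in detail.

```python
from statistics import median

TILE_BP = 16_384  # linear-index window size in the SAM/BAM specification


def tile_depth_proxy(virtual_offsets):
    """Estimate relative depth per 16 kb tile from linear-index offsets.

    virtual_offsets: one BAM virtual file offset per tile, in genomic order.
    Returns one value per gap between consecutive tiles, scaled so that a
    typical tile is about 1.0.
    """
    # The compressed-file offset is the upper 48 bits of a virtual offset.
    file_offsets = [v >> 16 for v in virtual_offsets]
    # Bytes of compressed data between consecutive tiles approximate the
    # number of alignment records that start in each tile.
    deltas = [b - a for a, b in zip(file_offsets, file_offsets[1:])]
    nonzero = [d for d in deltas if d > 0]
    if not nonzero:
        return [0.0] * len(deltas)
    scale = median(nonzero)
    return [d / scale for d in deltas]


# Toy offsets: the third tile holds twice as much data as the others.
toy_offsets = [pos << 16 for pos in (0, 100, 200, 400, 500)]
toy_depth = tile_depth_proxy(toy_offsets)
```

    Because only the index is read, a proxy of this kind avoids decompressing the alignments themselves, which is what makes the approach fast; a sustained value near 0.5 or 1.5 across a whole chromosome would hint at a copy-number or sex-chromosome difference.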

  3. A Functional Genomics Approach to Identify Novel Breast Cancer Gene Targets in Yeast

    DTIC Science & Technology

    2004-05-01

    Award Number: DAMD17-03-1-0232. Title: A Functional Genomics Approach to Identify Novel Breast Cancer Gene Targets in Yeast. Author(s): Craig Bennett, Ph.D. Abstract (maximum 200 words): We are using the yeast Saccharomyces cerevisiae to identify new cancer gene targets that interact with the

  4. Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants.

    PubMed

    Civaň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J

    2014-04-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.

  5. Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

    PubMed Central

    Civáň, Peter; Foster, Peter G.; Embley, Martin T.; Séneca, Ana; Cox, Cymon J.

    2014-01-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  6. Identifying candidate drivers of drug response in heterogeneous cancer by mining high throughput genomics data.

    PubMed

    Nabavi, Sheida

    2016-08-15

    With advances in technologies, huge amounts of multiple types of high-throughput genomics data are available. These data have tremendous potential to identify new and clinically valuable biomarkers to guide the diagnosis, assessment of prognosis, and treatment of complex diseases, such as cancer. Integrating, analyzing, and interpreting big and noisy genomics data to obtain biologically meaningful results, however, remains highly challenging. Mining genomics datasets by utilizing advanced computational methods can help to address these issues. To facilitate the identification of a short list of biologically meaningful genes as candidate drivers of anti-cancer drug resistance from an enormous amount of heterogeneous data, we employed statistical machine-learning techniques and integrated genomics datasets. We developed a computational method that integrates gene expression, somatic mutation, and copy number aberration data of sensitive and resistant tumors. In this method, an integrative approach based on module network analysis is applied to identify potential driver genes. This is followed by cross-validation and a comparison of the results of the sensitive and resistant groups to obtain the final list of candidate biomarkers. We applied this method to the ovarian cancer data from The Cancer Genome Atlas. The final result contains biologically relevant genes, such as COL11A1, which has been reported as a cis-platinum resistance biomarker for epithelial ovarian carcinoma in several recent studies. The described method yields a short list of aberrant genes that also control the expression of their co-regulated genes. The results suggest that the unbiased, data-driven computational method can identify biologically relevant candidate biomarkers. It can be utilized in a wide range of applications that compare two conditions with highly heterogeneous datasets.

  7. Population genomics studies identify signatures of global dispersal and drug resistance in Plasmodium vivax

    PubMed Central

    Hupalo, Daniel N; Luo, Zunping; Melnikov, Alexandre; Sutton, Patrick L; Rogov, Peter; Escalante, Ananias; Vallejo, Andrés F; Herrera, Sócrates; Arévalo-Herrera, Myriam; Fan, Qi; Wang, Ying; Cui, Liwang; Lucas, Carmen M; Durand, Salomon; Sanchez, Juan F; Baldeviano, G Christian; Lescano, Andres G; Laman, Moses; Barnadas, Celine; Barry, Alyssa; Mueller, Ivo; Kazura, James W; Eapen, Alex; Kanagaraj, Deena; Valecha, Neena; Ferreira, Marcelo U; Roobsoong, Wanlapa; Nguitragool, Wang; Sattabonkot, Jetsumon; Gamboa, Dionicia; Kosek, Margaret; Vinetz, Joseph M; González-Cerón, Lilia; Birren, Bruce W; Neafsey, Daniel E; Carlton, Jane M

    2017-01-01

    Plasmodium vivax is a major public health burden, responsible for the majority of malaria infections outside Africa. We explored the impact of demographic history and selective pressures on the P. vivax genome by sequencing 182 clinical isolates sampled from 11 countries across the globe, using hybrid selection to overcome human DNA contamination. We confirmed previous reports of high genomic diversity in P. vivax relative to the more virulent Plasmodium falciparum species; regional populations of P. vivax exhibited greater diversity than the global P. falciparum population, indicating a large and/or stable population. Signals of natural selection suggest that P. vivax is evolving in response to antimalarial drugs and is adapting to regional differences in the human host and the mosquito vector. These findings underline the variable epidemiology of this parasite species and highlight the breadth of approaches that may be required to eliminate P. vivax globally. PMID:27348298

  8. Population genomics studies identify signatures of global dispersal and drug resistance in Plasmodium vivax.

    PubMed

    Hupalo, Daniel N; Luo, Zunping; Melnikov, Alexandre; Sutton, Patrick L; Rogov, Peter; Escalante, Ananias; Vallejo, Andrés F; Herrera, Sócrates; Arévalo-Herrera, Myriam; Fan, Qi; Wang, Ying; Cui, Liwang; Lucas, Carmen M; Durand, Salomon; Sanchez, Juan F; Baldeviano, G Christian; Lescano, Andres G; Laman, Moses; Barnadas, Celine; Barry, Alyssa; Mueller, Ivo; Kazura, James W; Eapen, Alex; Kanagaraj, Deena; Valecha, Neena; Ferreira, Marcelo U; Roobsoong, Wanlapa; Nguitragool, Wang; Sattabonkot, Jetsumon; Gamboa, Dionicia; Kosek, Margaret; Vinetz, Joseph M; González-Cerón, Lilia; Birren, Bruce W; Neafsey, Daniel E; Carlton, Jane M

    2016-08-01

    Plasmodium vivax is a major public health burden, responsible for the majority of malaria infections outside Africa. We explored the impact of demographic history and selective pressures on the P. vivax genome by sequencing 182 clinical isolates sampled from 11 countries across the globe, using hybrid selection to overcome human DNA contamination. We confirmed previous reports of high genomic diversity in P. vivax relative to the more virulent Plasmodium falciparum species; regional populations of P. vivax exhibited greater diversity than the global P. falciparum population, indicating a large and/or stable population. Signals of natural selection suggest that P. vivax is evolving in response to antimalarial drugs and is adapting to regional differences in the human host and the mosquito vector. These findings underline the variable epidemiology of this parasite species and highlight the breadth of approaches that may be required to eliminate P. vivax globally.

  9. Inverse PCR-based method for isolating novel SINEs from genome.

    PubMed

    Han, Yawei; Chen, Liping; Guan, Lihong; He, Shunping

    2014-04-01

    Short interspersed elements (SINEs) are moderately repetitive DNA sequences in eukaryotic genomes. Although eukaryotic genomes contain numerous SINE copies, it is very difficult and laborious to isolate and identify them by the reported methods. In this study, inverse PCR was successfully applied to isolate SINEs from the genome of Opsariichthys bidens, an Eastern Asian cyprinid. A group of SINEs derived from tRNA(Ala) molecules was identified and named Opsar after Opsariichthys. Opsar exhibited typical SINE characteristics: a tRNA(Ala)-derived region at the 5' end, a tRNA-unrelated region, and an AT-rich region at the 3' end. The tRNA-derived region of Opsar shared 76 % sequence similarity with the tRNA(Ala) gene. This result indicated that Opsar could derive from an inactive copy or pseudogene of tRNA(Ala). The reliability of the method was tested by obtaining C-SINE, Ct-SINE, and M-SINEs from the Ctenopharyngodon idellus, Megalobrama amblycephala, and Cyprinus carpio genomes. This method is simpler than those previously reported, omitting many steps, such as preparation of probes, construction of genomic libraries, and hybridization.

  10. Development of a multiplex RT-PCR assay for the identification of recombination types at different genomic regions of vaccine-derived polioviruses.

    PubMed

    Dimitriou, T G; Kyriakopoulou, Z; Tsakogiannis, D; Fikatas, A; Gartzonika, C; Levidiotou-Stefanou, S; Markoulatos, P

    2016-08-01

    Polioviruses (PVs) are the causal agents of acute paralytic poliomyelitis. Since the 1960s, poliomyelitis has been effectively controlled by the use of two vaccines containing all three serotypes of PVs, the inactivated poliovirus vaccine and the live attenuated oral poliovirus vaccine (OPV). Despite the success of OPV in the polio eradication programme, a significant disadvantage was revealed: the emergence of vaccine-associated paralytic poliomyelitis (VAPP). VAPP is the result of accumulated mutations and putative recombination events in the genome of attenuated Sabin vaccine strains. In the present study, ten Sabin isolates derived from OPV vaccinees and environmental samples were studied in order to identify recombination types located in the VP1 to 3D genomic regions of the virus genome. The experimental procedure consisted of virus RNA extraction, reverse transcription to convert the virus genome into cDNA, and PCR and multiplex PCR using specifically designed primers able to localize and identify each recombination, followed by agarose gel electrophoresis. This multiplex RT-PCR assay allows for the immediate detection and identification of multiple recombination types located in the viral genome of OPV derivatives. After the eradication of wild PVs, the remaining sources of poliovirus infection worldwide would be the OPV derivatives. As a consequence, the immediate detection and molecular characterization of recombinant derivatives are important to avoid epidemics due to the circulation of neurovirulent viral strains.

  11. Genome-wide screen identifies a novel prognostic signature for breast cancer survival

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mao, Xuan Y.; Lee, Matthew J.; Zhu, Jeffrey

    Large genomic datasets in combination with clinical data can be used as an unbiased tool to identify genes important in patient survival and discover potential therapeutic targets. We used a genome-wide screen to identify 587 genes significantly and robustly deregulated across four independent breast cancer (BC) datasets compared to normal breast tissue. Gene expression of 381 genes was significantly associated with relapse-free survival (RFS) in BC patients. We used a gene co-expression network approach to visualize the genetic architecture in normal breast and BCs. In normal breast tissue, co-expression cliques were identified enriched for cell cycle, gene transcription, cell adhesion, cytoskeletal organization and metabolism. In contrast, in BC, only two major co-expression cliques were identified enriched for cell cycle-related processes or blood vessel development, cell adhesion and mammary gland development processes. Interestingly, gene expression levels of 7 genes were found to be negatively correlated with many cell cycle related genes, highlighting these genes as potential tumor suppressors and novel therapeutic targets. A forward-conditional Cox regression analysis was used to identify a 12-gene signature associated with RFS. A prognostic scoring system was created based on the 12-gene signature. This scoring system robustly predicted BC patient RFS in 60 sampling test sets and was further validated in TCGA and METABRIC BC data. Our integrated study identified a 12-gene prognostic signature that could guide adjuvant therapy for BC patients and includes novel potential molecular targets for therapy.

  12. Genome-wide screen identifies a novel prognostic signature for breast cancer survival

    DOE PAGES

    Mao, Xuan Y.; Lee, Matthew J.; Zhu, Jeffrey; ...

    2017-01-21

    Large genomic datasets in combination with clinical data can be used as an unbiased tool to identify genes important in patient survival and discover potential therapeutic targets. We used a genome-wide screen to identify 587 genes significantly and robustly deregulated across four independent breast cancer (BC) datasets compared to normal breast tissue. Gene expression of 381 genes was significantly associated with relapse-free survival (RFS) in BC patients. We used a gene co-expression network approach to visualize the genetic architecture in normal breast and BCs. In normal breast tissue, co-expression cliques were identified enriched for cell cycle, gene transcription, cell adhesion, cytoskeletal organization and metabolism. In contrast, in BC, only two major co-expression cliques were identified enriched for cell cycle-related processes or blood vessel development, cell adhesion and mammary gland development processes. Interestingly, gene expression levels of 7 genes were found to be negatively correlated with many cell cycle related genes, highlighting these genes as potential tumor suppressors and novel therapeutic targets. A forward-conditional Cox regression analysis was used to identify a 12-gene signature associated with RFS. A prognostic scoring system was created based on the 12-gene signature. This scoring system robustly predicted BC patient RFS in 60 sampling test sets and was further validated in TCGA and METABRIC BC data. Our integrated study identified a 12-gene prognostic signature that could guide adjuvant therapy for BC patients and includes novel potential molecular targets for therapy.

  13. Multi-region and single-cell sequencing reveal variable genomic heterogeneity in rectal cancer.

    PubMed

    Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian

    2017-11-23

    Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and the single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed at the single-cell level. We found that major SCNAs were early events in cancer development and were stably inherited. Single-cell sequencing revealed mutations and SCNAs that were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in two patients. The mutational scenarios and SCNA profiles of the two treatment-naïve patients, who share the same molecular subtype, are quite different. Our results suggest each tumor possesses its own architecture, which may result in different diagnosis, prognosis, and drug responses. Remarkable ITH exists in the two patients we have studied, providing a preliminary impression of ITH in rectal cancer.

  14. Genome-wide Association Study Identifies African-Specific Susceptibility Loci in African Americans with Inflammatory Bowel Disease

    PubMed Central

    Brant, Steven R.; Okou, David T.; Simpson, Claire L.; Cutler, David J.; Haritunians, Talin; Bradfield, Jonathan P.; Chopra, Pankaj; Prince, Jarod; Begum, Ferdouse; Kumar, Archana; Huang, Chengrui; Venkateswaran, Suresh; Datta, Lisa W.; Wei, Zhi; Thomas, Kelly; Herrinton, Lisa J.; Klapproth, Jan-Micheal A.; Quiros, Antonio J.; Seminerio, Jenifer; Liu, Zhenqiu; Alexander, Jonathan S.; Baldassano, Robert N.; Dudley-Brown, Sharon; Cross, Raymond K.; Dassopoulos, Themistocles; Denson, Lee A.; Dhere, Tanvi A.; Dryden, Gerald W.; Hanson, John S.; Hou, Jason K.; Hussain, Sunny Z.; Hyams, Jeffrey S.; Isaacs, Kim L.; Kader, Howard; Kappelman, Michael D.; Katz, Jeffry; Kellermayer, Richard; Kirschner, Barbara S.; Kuemmerle, John F.; Kwon, John H.; Lazarev, Mark; Li, Ellen; Mack, David; Mannon, Peter; Moulton, Dedrick E.; Newberry, Rodney D.; Osuntokun, Bankole O.; Patel, Ashish S.; Saeed, Shehzad A.; Targan, Stephan R.; Valentine, John F.; Wang, Ming-Hsi; Zonca, Martin; Rioux, John D.; Duerr, Richard H.; Silverberg, Mark S.; Cho, Judy H.; Hakonarson, Hakon; Zwick, Michael E.; McGovern, Dermot P.B.; Kugathasan, Subra

    2016-01-01

    Background & Aims: The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn’s disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities. Methods: We performed 2 high-density, genome-wide scans comprising 2345 cases of African Americans with IBD (1646 with CD, 583 with UC, and 116 inflammatory bowel disease unclassified [IBD-U]) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P < 5.0 × 10^-8 in meta-analysis with nominal evidence (P < .05) in each scan were considered to have genome-wide significance. Results: We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P < 1.6 × 10^-6): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles. Conclusions: We performed a genome-wide association study of African Americans with IBD and identified loci associated with CD and UC in only this population; we also replicated loci identified in European populations. The detection of variants associated with IBD risk in only
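    The significance rule stated in the Methods above (meta-analysis P below 5.0 × 10^-8 together with nominal P below .05 in each of the two scans) can be sketched as a simple filter. This is only an illustration: the `SnpResult` record, the SNP names, and the P values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Tuple

GENOME_WIDE_P = 5.0e-8  # meta-analysis threshold quoted in the abstract
NOMINAL_P = 0.05        # per-scan nominal threshold quoted in the abstract


@dataclass
class SnpResult:
    """Hypothetical container for one SNP's association results."""
    name: str
    p_meta: float                 # P value from the meta-analysis
    p_scans: Tuple[float, ...]    # P value from each individual scan


def is_genome_wide_significant(snp: SnpResult) -> bool:
    """Apply both conditions: strict meta-analysis P and nominal P in every scan."""
    return snp.p_meta < GENOME_WIDE_P and all(p < NOMINAL_P for p in snp.p_scans)


# Made-up values chosen to exercise each branch of the rule.
results = [
    SnpResult("snp_a", 1.2e-9, (0.001, 0.01)),    # passes both conditions
    SnpResult("snp_b", 3.0e-9, (0.0004, 0.20)),   # fails nominal evidence in scan 2
    SnpResult("snp_c", 6.0e-8, (0.002, 0.003)),   # fails the meta-analysis threshold
]
significant = [s.name for s in results if is_genome_wide_significant(s)]
print(significant)
```

    Requiring nominal evidence in every scan guards against a meta-analysis signal that is driven by a single dataset.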

  15. Genome-Wide Association Study Identifies African-Specific Susceptibility Loci in African Americans With Inflammatory Bowel Disease.

    PubMed

    Brant, Steven R; Okou, David T; Simpson, Claire L; Cutler, David J; Haritunians, Talin; Bradfield, Jonathan P; Chopra, Pankaj; Prince, Jarod; Begum, Ferdouse; Kumar, Archana; Huang, Chengrui; Venkateswaran, Suresh; Datta, Lisa W; Wei, Zhi; Thomas, Kelly; Herrinton, Lisa J; Klapproth, Jan-Micheal A; Quiros, Antonio J; Seminerio, Jenifer; Liu, Zhenqiu; Alexander, Jonathan S; Baldassano, Robert N; Dudley-Brown, Sharon; Cross, Raymond K; Dassopoulos, Themistocles; Denson, Lee A; Dhere, Tanvi A; Dryden, Gerald W; Hanson, John S; Hou, Jason K; Hussain, Sunny Z; Hyams, Jeffrey S; Isaacs, Kim L; Kader, Howard; Kappelman, Michael D; Katz, Jeffry; Kellermayer, Richard; Kirschner, Barbara S; Kuemmerle, John F; Kwon, John H; Lazarev, Mark; Li, Ellen; Mack, David; Mannon, Peter; Moulton, Dedrick E; Newberry, Rodney D; Osuntokun, Bankole O; Patel, Ashish S; Saeed, Shehzad A; Targan, Stephan R; Valentine, John F; Wang, Ming-Hsi; Zonca, Martin; Rioux, John D; Duerr, Richard H; Silverberg, Mark S; Cho, Judy H; Hakonarson, Hakon; Zwick, Michael E; McGovern, Dermot P B; Kugathasan, Subra

    2017-01-01

    The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn's disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities. We performed 2 high-density, genome-wide scans comprising 2345 cases of African Americans with IBD (1646 with CD, 583 with UC, and 116 inflammatory bowel disease unclassified) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P < 5.0 × 10^-8 in meta-analysis with a nominal evidence (P < .05) in each scan were considered to have genome-wide significance. We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P < 1.6 × 10^-6): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles. We performed a genome-wide association study of African Americans with IBD and identified loci associated with UC in only this population; we also replicated IBD, CD, and UC loci identified in European populations. The detection of variants associated with IBD risk in only people of African descent demonstrates the

  16. Investigation of 95 variants identified in a genome-wide study for association with mortality after acute coronary syndrome.

    PubMed

    Morgan, Thomas M; House, John A; Cresci, Sharon; Jones, Philip; Allayee, Hooman; Hazen, Stanley L; Patel, Yesha; Patel, Riyaz S; Eapen, Danny J; Waddy, Salina P; Quyyumi, Arshed A; Kleber, Marcus E; März, Winfried; Winkelmann, Bernhard R; Boehm, Bernhard O; Krumholz, Harlan M; Spertus, John A

    2011-09-29

    Genome-wide association studies (GWAS) have identified new candidate genes for the occurrence of acute coronary syndrome (ACS), but possible effects of such genes on survival following ACS have yet to be investigated. We examined 95 polymorphisms in 69 distinct gene regions identified in a GWAS for premature myocardial infarction for their association with post-ACS mortality among 811 whites recruited from university-affiliated hospitals in Kansas City, Missouri. We then sought replication of a positive genetic association in a large, racially diverse cohort of myocardial infarction patients (N = 2284) using Kaplan-Meier survival analyses and Cox regression to adjust for relevant covariates. Finally, we investigated the apparent association further in 6086 additional coronary artery disease patients. After Cox adjustment for other ACS risk factors, of 95 SNPs tested in 811 whites only the association with the rs6922269 in MTHFD1L was statistically significant, with a 2.6-fold mortality hazard (P = 0.007). The recessive A/A genotype was of borderline significance in an age- and race-adjusted analysis of the entire combined cohort (N = 3095; P = 0.052), but this finding was not confirmed in independent cohorts (N = 6086). We found no support for the hypothesis that the GWAS-identified variants in this study substantially alter the probability of post-ACS survival. Large-scale, collaborative, genome-wide studies may be required in order to detect genetic variants that are robustly associated with survival in patients with coronary artery disease.

  17. First complete genome sequence of infectious laryngotracheitis virus

    PubMed Central

    2011-01-01

    Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528

  18. Genome-wide meta-analysis identifies five new susceptibility loci for cutaneous malignant melanoma.

    PubMed

    Law, Matthew H; Bishop, D Timothy; Lee, Jeffrey E; Brossard, Myriam; Martin, Nicholas G; Moses, Eric K; Song, Fengju; Barrett, Jennifer H; Kumar, Rajiv; Easton, Douglas F; Pharoah, Paul D P; Swerdlow, Anthony J; Kypreou, Katerina P; Taylor, John C; Harland, Mark; Randerson-Moor, Juliette; Akslen, Lars A; Andresen, Per A; Avril, Marie-Françoise; Azizi, Esther; Scarrà, Giovanna Bianchi; Brown, Kevin M; Dębniak, Tadeusz; Duffy, David L; Elder, David E; Fang, Shenying; Friedman, Eitan; Galan, Pilar; Ghiorzo, Paola; Gillanders, Elizabeth M; Goldstein, Alisa M; Gruis, Nelleke A; Hansson, Johan; Helsing, Per; Hočevar, Marko; Höiom, Veronica; Ingvar, Christian; Kanetsky, Peter A; Chen, Wei V; Landi, Maria Teresa; Lang, Julie; Lathrop, G Mark; Lubiński, Jan; Mackie, Rona M; Mann, Graham J; Molven, Anders; Montgomery, Grant W; Novaković, Srdjan; Olsson, Håkan; Puig, Susana; Puig-Butille, Joan Anton; Qureshi, Abrar A; Radford-Smith, Graham L; van der Stoep, Nienke; van Doorn, Remco; Whiteman, David C; Craig, Jamie E; Schadendorf, Dirk; Simms, Lisa A; Burdon, Kathryn P; Nyholt, Dale R; Pooley, Karen A; Orr, Nick; Stratigos, Alexander J; Cust, Anne E; Ward, Sarah V; Hayward, Nicholas K; Han, Jiali; Schulze, Hans-Joachim; Dunning, Alison M; Bishop, Julia A Newton; Demenais, Florence; Amos, Christopher I; MacGregor, Stuart; Iles, Mark M

    2015-09-01

    Thirteen common susceptibility loci have been reproducibly associated with cutaneous malignant melanoma (CMM). We report the results of an international 2-stage meta-analysis of CMM genome-wide association studies (GWAS). This meta-analysis combines 11 GWAS (5 previously unpublished) and a further three stage 2 data sets, totaling 15,990 CMM cases and 26,409 controls. Five loci not previously associated with CMM risk reached genome-wide significance (P < 5 × 10(-8)), as did 2 previously reported but unreplicated loci and all 13 established loci. Newly associated SNPs fall within putative melanocyte regulatory elements, and bioinformatic and expression quantitative trait locus (eQTL) data highlight candidate genes in the associated regions, including one involved in telomere biology.

  19. Use of Whole-Genome Phylogeny and Comparisons for Development of a Multiplex PCR Assay To Identify Sequence Type 36 Vibrio parahaemolyticus.

    PubMed

    Whistler, Cheryl A; Hall, Jeffrey A; Xu, Feng; Ilyas, Saba; Siwakoti, Puskar; Cooper, Vaughn S; Jones, Stephen H

    2015-06-01

    Vibrio parahaemolyticus sequence type 36 (ST36) strains that are native to the Pacific Ocean have recently caused multistate outbreaks of gastroenteritis linked to shellfish harvested from the Atlantic Ocean. Whole-genome comparisons of 295 genomes of V. parahaemolyticus, including several traced to northeastern U.S. sources, were used to identify diagnostic loci, one putatively encoding an endonuclease (prp), and two others potentially conferring O-antigenic properties (cps and flp). The combination of all three loci was present in only one clade of closely related strains of ST36, ST59, and one additional unknown sequence type. However, each locus was also identified outside this clade, with prp and flp occurring in only two nonclade isolates and cps in four. Based on the distribution of these loci in sequenced genomes, prp identified clade strains with >99% accuracy, but the addition of one more locus increased accuracy to 100%. Oligonucleotide primers targeting prp and cps were combined in a multiplex PCR method that defines species using the tlh locus and determines the presence of both the tdh and trh hemolysin-encoding genes, which are also present in ST36. Application of the method in vitro to a collection of 94 clinical isolates collected over a 4-year period in three northeastern U.S. states and 87 environmental isolates revealed that the prp and cps amplicons were detected only in clinical isolates identified as belonging to the ST36 clade and in no environmental isolates from the region. The assay should improve detection and surveillance, thereby reducing infections. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
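    The accuracy figures quoted above can be checked with a short calculation. The sketch below assumes, beyond what the abstract states, that accuracy is measured over all 295 sequenced genomes and that no clade strain lacks the marker (zero false negatives); the function name is hypothetical.

```python
def marker_accuracy(total_genomes: int, false_positives: int, false_negatives: int = 0) -> float:
    """Fraction of genomes classified correctly by presence/absence of a marker."""
    return (total_genomes - false_positives - false_negatives) / total_genomes


TOTAL_GENOMES = 295  # genomes compared in the study

# prp was found in two isolates outside the clade (two false positives).
prp_alone = marker_accuracy(TOTAL_GENOMES, false_positives=2)

# The abstract reports that adding one more locus removes the misclassifications,
# which corresponds to zero false positives for the combined marker.
prp_plus_second_locus = marker_accuracy(TOTAL_GENOMES, false_positives=0)

print(f"prp alone: {prp_alone:.2%}")
print(f"prp plus a second locus: {prp_plus_second_locus:.2%}")
```

    Under these assumptions, prp alone classifies 293 of 295 genomes correctly, which is consistent with the reported ">99%".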

  20. Genome Wide Association Identifies Common Variants at the SERPINA6/SERPINA1 Locus Influencing Plasma Cortisol and Corticosteroid Binding Globulin

    PubMed Central

    Direk, Nese; Lewis, John G.; Hammond, Geoffrey L.; Hill, Lesley A.; Anderson, Anna; Huffman, Jennifer; Wilson, James F.; Campbell, Harry; Rudan, Igor; Wright, Alan; Hastie, Nicholas; Wild, Sarah H.; Velders, Fleur P.; Hofman, Albert; Uitterlinden, Andre G.; Lahti, Jari; Räikkönen, Katri; Kajantie, Eero; Widen, Elisabeth; Palotie, Aarno; Eriksson, Johan G.; Kaakinen, Marika; Järvelin, Marjo-Riitta; Timpson, Nicholas J.; Davey Smith, George; Ring, Susan M.; Evans, David M.; St Pourcain, Beate; Tanaka, Toshiko; Milaneschi, Yuri; Bandinelli, Stefania; Ferrucci, Luigi; van der Harst, Pim; Rosmalen, Judith G. M.; Bakker, Stephen J. L.; Verweij, Niek; Dullaart, Robin P. F.; Mahajan, Anubha; Lindgren, Cecilia M.; Morris, Andrew; Lind, Lars; Ingelsson, Erik; Anderson, Laura N.; Pennell, Craig E.; Lye, Stephen J.; Matthews, Stephen G.; Eriksson, Joel; Mellstrom, Dan; Ohlsson, Claes; Price, Jackie F.; Strachan, Mark W. J.; Reynolds, Rebecca M.; Tiemeier, Henning; Walker, Brian R.

    2014-01-01

    Variation in plasma levels of cortisol, an essential hormone in the stress response, is associated in population-based studies with cardio-metabolic, inflammatory and neuro-cognitive traits and diseases. Heritability of plasma cortisol is estimated at 30–60% but no common genetic contribution has been identified. The CORtisol NETwork (CORNET) consortium undertook genome wide association meta-analysis for plasma cortisol in 12,597 Caucasian participants, replicated in 2,795 participants. The results indicate that <1% of variance in plasma cortisol is accounted for by genetic variation in a single region of chromosome 14. This locus spans SERPINA6, encoding corticosteroid binding globulin (CBG, the major cortisol-binding protein in plasma), and SERPINA1, encoding α1-antitrypsin (which inhibits cleavage of the reactive centre loop that releases cortisol from CBG). Three partially independent signals were identified within the region, represented by common SNPs; detailed biochemical investigation in a nested sub-cohort showed all these SNPs were associated with variation in total cortisol binding activity in plasma, but some variants influenced total CBG concentrations while the top hit (rs12589136) influenced the immunoreactivity of the reactive centre loop of CBG. Exome chip and 1000 Genomes imputation analysis of this locus in the CROATIA-Korcula cohort identified missense mutations in SERPINA6 and SERPINA1 that did not account for the effects of common variants. These findings reveal a novel common genetic source of variation in binding of cortisol by CBG, and reinforce the key role of CBG in determining plasma cortisol levels. In turn this genetic variation may contribute to cortisol-associated degenerative diseases. PMID:25010111

  1. Comparative genomic analysis of Helicobacter pylori from Malaysia identifies three distinct lineages suggestive of differential evolution

    PubMed Central

    Kumar, Narender; Mariappan, Vanitha; Baddam, Ramani; Lankapalli, Aditya K.; Shaik, Sabiha; Goh, Khean-Lee; Loke, Mun Fai; Perkins, Tim; Benghezal, Mohammed; Hasnain, Seyed E.; Vadivelu, Jamuna; Marshall, Barry J.; Ahmed, Niyaz

    2015-01-01

    The discordant prevalence of Helicobacter pylori and its related diseases, for a long time, fostered certain enigmatic situations observed in the countries of the southern world. Variation in H. pylori infection rates and disease outcomes among different populations in multi-ethnic Malaysia provides a unique opportunity to understand dynamics of host–pathogen interaction and genome evolution. In this study, we extensively analyzed and compared genomes of 27 Malaysian H. pylori isolates and identified three major phylogeographic lineages: hspEastAsia, hpEurope and hpSouthIndia. The analysis of the virulence genes within the core genome, however, revealed a comparable pathogenic potential of the strains. In addition, we identified four genes limited to strains of East-Asian lineage. Our analyses identified a few strain-specific genes encoding restriction modification systems and outlined 311 core genes possibly under differential evolutionary constraints, among the strains representing different ethnic groups. The cagA and vacA genes also showed variations in accordance with the host genetic background of the strains. Moreover, restriction modification genes were found to be significantly enriched in East-Asian strains. An understanding of these variations in the genome content would provide significant insights into various adaptive and host modulation strategies harnessed by H. pylori to effectively persist in a host-specific manner. PMID:25452339

  2. Mapping of the genomic regions controlling seed storability in soybean (Glycine max L.).

    PubMed

    Dargahi, Hamidreza; Tanya, Patcharin; Srinives, Peerasak

    2014-08-01

    Seed storability is especially important in the tropics, where the high temperature and relative humidity of the storage environment cause rapid deterioration of stored seeds. The objective of this study was to use SSR markers to identify genomic regions associated with quantitative trait loci (QTLs) controlling seed storability, based on relative germination rate, in the F2:3 population derived from a cross between a vegetable soybean line (MJ0004-6) with poor longevity and a landrace cultivar from Myanmar (R18500) with good longevity. The F2:4 seeds harvested in 2011 and 2012 were used to investigate seed storability. The F2 population was genotyped with 148 markers, and the genetic map consisted of 128 SSR loci that converged into 38 linkage groups covering 1664.3 cM of the soybean genome. Single marker analysis revealed that 13 markers from six linkage groups (C1, D2, E, F, J and L) were associated with seed storability. Composite interval mapping identified a total of three QTLs on linkage groups C1, F and L with phenotypic variance explained ranging from 8.79 to 13.43%. The R18500 alleles increased seed storability at all of the detected QTLs. No common QTLs were found for storability of seeds harvested in 2011 and 2012. This study agreed with previous reports in other crops that genotype by environment interaction plays an important role in expression of seed storability.

  3. Genomic Diversity of Lactobacillus salivarius

    PubMed Central

    Raftis, Emma J.; Salvetti, Elisa; Torriani, Sandra; Felis, Giovanna E.; O'Toole, Paul W.

    2011-01-01

    Strains of Lactobacillus salivarius are increasingly employed as probiotic agents for humans or animals. Despite the diversity of environmental sources from which they have been isolated, the genomic diversity of L. salivarius has been poorly characterized, and the implications of this diversity for strain selection have not been examined. To tackle this, we applied comparative genomic hybridization (CGH) and multilocus sequence typing (MLST) to 33 strains derived from humans, animals, or food. The CGH, based on total genome content, including small plasmids, identified 18 major regions of genomic variation, or hot spots for variation. Three major divisions were thus identified, with only a subset of the human isolates constituting an ecologically discernible group. Omission of the small plasmids from the CGH or analysis by MLST provided broadly concordant fine divisions and separated human-derived and animal-derived strains more clearly. The two gene clusters for exopolysaccharide (EPS) biosynthesis corresponded to regions of significant genomic diversity. The CGH-based groupings of these regions did not correlate with levels of production of bound or released EPS. Furthermore, EPS production was significantly modulated by available carbohydrate. In addition to proving difficult to predict from the gene content, EPS production levels correlated inversely with production of biofilms, a trait considered desirable in probiotic commensals. L. salivarius displays a high level of genomic diversity, and while selection of L. salivarius strains for probiotic use can be informed by CGH or MLST, it also requires pragmatic experimental validation of desired phenotypic traits. PMID:21131523

  4. Anonymizing patient genomic data for public sharing association studies.

    PubMed

    Fernandez-Lozano, Carlos; Lopez-Campos, Guillermo; Seoane, Jose A; Lopez-Alonso, Victoria; Dorado, Julian; Martín-Sanchez, Fernando; Pazos, Alejandro

    2013-01-01

    The development of personalized medicine is tightly linked to the correct exploitation of molecular data, especially data associated with the genome sequence. Along with the use of genomic data, there is an increasing demand to share these data for research purposes. The transfer of clinical data to research relies on anonymizing the data so that the patient cannot be identified; genomic data pose a great challenge here because they are inherently identifying. In this work we analyzed current methods for genome anonymization and propose a one-way encryption method that may enable genomic data sharing by giving access only to certain regions of genomes for research purposes.
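    The abstract does not describe the proposed encryption scheme. As a loose illustration of the general idea only, the sketch below replaces the patient identifier with a keyed one-way hash (HMAC-SHA256 from the Python standard library) and releases only variants inside a requested region. The record layout, function names, key, and coordinates are all hypothetical, and this is not the authors' method.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """One-way keyed hash: the pseudonym cannot be reversed without the key."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def share_region(record: dict, chrom: str, start: int, end: int, secret_key: bytes) -> dict:
    """Return a shareable copy holding a pseudonym and only the variants in one region."""
    variants = [
        v for v in record["variants"]
        if v["chrom"] == chrom and start <= v["pos"] <= end
    ]
    return {
        "pseudonym": pseudonymize(record["patient_id"], secret_key),
        "region": (chrom, start, end),
        "variants": variants,
    }


# Hypothetical record with made-up coordinates.
record = {
    "patient_id": "patient-001",
    "variants": [
        {"chrom": "14", "pos": 94770000, "allele": "A"},
        {"chrom": "2", "pos": 25000000, "allele": "T"},
    ],
}
shared = share_region(record, "14", 94000000, 95000000, b"demo-key-not-for-production")
print(shared)
```

    Hashing the identifier does not by itself make the released variants anonymous, since genomic data are themselves identifying; restricting the release to selected regions only limits how much is exposed.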

  5. Whole Genome Analysis of Injectional Anthrax Identifies Two Disease Clusters Spanning More Than 13 Years.

    PubMed

    Keim, Paul; Grunow, Roland; Vipond, Richard; Grass, Gregor; Hoffmaster, Alex; Birdsell, Dawn N; Klee, Silke R; Pullan, Steven; Antwerpen, Markus; Bayer, Brittany N; Latham, Jennie; Wiggins, Kristin; Hepp, Crystal; Pearson, Talima; Brooks, Tim; Sahl, Jason; Wagner, David M

    2015-11-01

    Anthrax is a rare disease in humans but elicits great public fear because of its past use as an agent of bioterrorism. Injectional anthrax has been occurring sporadically for more than ten years in heroin consumers across multiple European countries and this outbreak has been difficult to trace back to a source. We took a molecular epidemiological approach in understanding this disease outbreak, including whole genome sequencing of Bacillus anthracis isolates from the anthrax victims. We also screened two large strain repositories for closely related strains to provide context to the outbreak. Analyzing 60 Bacillus anthracis isolates associated with injectional anthrax cases and closely related reference strains, we identified 1071 Single Nucleotide Polymorphisms (SNPs). The synapomorphic SNPs (350) were used to reconstruct phylogenetic relationships, infer likely epidemiological sources and explore the dynamics of evolving pathogen populations. Injectional anthrax genomes separated into two tight clusters: one group was exclusively associated with the 2009-10 outbreak and located primarily in Scotland, whereas the second comprised more recent (2012-13) cases but also a single Norwegian case from 2000. Genome-based differentiation of injectional anthrax isolates argues for at least two separate disease events spanning > 12 years. The genomic similarity of the two clusters makes it likely that they are caused by separate contamination events originating from the same geographic region and perhaps the same site of drug manufacturing or processing. Pathogen diversity within single patients challenges assumptions concerning population dynamics of infecting B. anthracis and host defensive barriers for injectional anthrax. This work was supported by the United States Department of Homeland Security grant no. HSHQDC-10-C-00,139 and via a binational cooperative agreement between the United States Government and the Government of Germany. This work was supported by funds

  6. Genomic analyses identify molecular subtypes of pancreatic cancer.

    PubMed

    Bailey, Peter; Chang, David K; Nones, Katia; Johns, Amber L; Patch, Ann-Marie; Gingras, Marie-Claude; Miller, David K; Christ, Angelika N; Bruxner, Tim J C; Quinn, Michael C; Nourse, Craig; Murtaugh, L Charles; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourbakhsh, Ehsan; Wani, Shivangi; Fink, Lynn; Holmes, Oliver; Chin, Venessa; Anderson, Matthew J; Kazakoff, Stephen; Leonard, Conrad; Newell, Felicity; Waddell, Nick; Wood, Scott; Xu, Qinying; Wilson, Peter J; Cloonan, Nicole; Kassahn, Karin S; Taylor, Darrin; Quek, Kelly; Robertson, Alan; Pantano, Lorena; Mincarelli, Laura; Sanchez, Luis N; Evers, Lisa; Wu, Jianmin; Pinese, Mark; Cowley, Mark J; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chantrill, Lorraine A; Mawson, Amanda; Humphris, Jeremy; Chou, Angela; Pajic, Marina; Scarlett, Christopher J; Pinho, Andreia V; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Merrett, Neil D; Toon, Christopher W; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Moran-Jones, Kim; Jamieson, Nigel B; Graham, Janet S; Duthie, Fraser; Oien, Karin; Hair, Jane; Grützmann, Robert; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Corbo, Vincenzo; Bassi, Claudio; Rusev, Borislav; Capelli, Paola; Salvia, Roberto; Tortora, Giampaolo; Mukhopadhyay, Debabrata; Petersen, Gloria M; Munzy, Donna M; Fisher, William E; Karim, Saadia A; Eshleman, James R; Hruban, Ralph H; Pilarsky, Christian; Morton, Jennifer P; Sansom, Owen J; Scarpa, Aldo; Musgrove, Elizabeth A; Bailey, Ulla-Maja Hagbo; Hofmann, Oliver; Sutherland, Robert L; Wheeler, David A; Gill, Anthony J; Gibbs, Richard A; Pearson, John V; Waddell, Nicola; Biankin, Andrew V; Grimmond, Sean M

    2016-03-03

    Integrated genomic analysis of 456 pancreatic ductal adenocarcinomas identified 32 recurrently mutated genes that aggregate into 10 pathways: KRAS, TGF-β, WNT, NOTCH, ROBO/SLIT signalling, G1/S transition, SWI-SNF, chromatin modification, DNA repair and RNA processing. Expression analysis defined 4 subtypes: (1) squamous; (2) pancreatic progenitor; (3) immunogenic; and (4) aberrantly differentiated endocrine exocrine (ADEX) that correlate with histopathological characteristics. Squamous tumours are enriched for TP53 and KDM6A mutations, upregulation of the TP63∆N transcriptional network, hypermethylation of pancreatic endodermal cell-fate determining genes and have a poor prognosis. Pancreatic progenitor tumours preferentially express genes involved in early pancreatic development (FOXA2/3, PDX1 and MNX1). ADEX tumours displayed upregulation of genes that regulate networks involved in KRAS activation, exocrine (NR5A2 and RBPJL), and endocrine differentiation (NEUROD1 and NKX2-2). Immunogenic tumours contained upregulated immune networks including pathways involved in acquired immune suppression. These data infer differences in the molecular evolution of pancreatic cancer subtypes and identify opportunities for therapeutic development.

  7. Fine-Mapping of Common Genetic Variants Associated with Colorectal Tumor Risk Identified Potential Functional Variants

    PubMed Central

    Gala, Manish; Abecasis, Goncalo; Bezieau, Stephane; Brenner, Hermann; Butterbach, Katja; Caan, Bette J.; Carlson, Christopher S.; Casey, Graham; Chang-Claude, Jenny; Conti, David V.; Curtis, Keith R.; Duggan, David; Gallinger, Steven; Haile, Robert W.; Harrison, Tabitha A.; Hayes, Richard B.; Hoffmeister, Michael; Hopper, John L.; Hudson, Thomas J.; Jenkins, Mark A.; Küry, Sébastien; Le Marchand, Loic; Leal, Suzanne M.; Newcomb, Polly A.; Nickerson, Deborah A.; Potter, John D.; Schoen, Robert E.; Schumacher, Fredrick R.; Seminara, Daniela; Slattery, Martha L.; Hsu, Li; Chan, Andrew T.; White, Emily; Berndt, Sonja I.; Peters, Ulrike

    2016-01-01

    Genome-wide association studies (GWAS) have identified many common single nucleotide polymorphisms (SNPs) associated with colorectal cancer risk. These SNPs may tag correlated variants with biological importance. Fine-mapping around GWAS loci can facilitate detection of functional candidates and additional independent risk variants. We analyzed 11,900 cases and 14,311 controls in the Genetics and Epidemiology of Colorectal Cancer Consortium and the Colon Cancer Family Registry. To fine-map genomic regions containing all known common risk variants, we imputed high-density genetic data from the 1000 Genomes Project. We tested single-variant associations with colorectal tumor risk for all variants spanning genomic regions 250-kb upstream or downstream of 31 GWAS-identified SNPs (index SNPs). We queried the University of California, Santa Cruz Genome Browser to examine evidence for biological function. Index SNPs did not show the strongest association signals with colorectal tumor risk in their respective genomic regions. Bioinformatics analysis of SNPs showing smaller P-values in each region revealed 21 functional candidates in 12 loci (5q31.1, 8q24, 11q13.4, 11q23, 12p13.32, 12q24.21, 14q22.2, 15q13, 18q21, 19q13.1, 20p12.3, and 20q13.33). We did not observe evidence of additional independent association signals in GWAS-identified regions. Our results support the utility of integrating data from comprehensive fine-mapping with expanding publicly available genomic databases to help clarify GWAS associations and identify functional candidates that warrant more onerous laboratory follow-up. Such efforts may aid the eventual discovery of disease-causing variant(s). PMID:27379672
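    The windowing step described in this abstract (testing variants within 250 kb upstream or downstream of each index SNP, then looking at variants with smaller P-values than the index SNP) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Variant` record, the function names, and all coordinates and P-values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

WINDOW_BP = 250_000  # 250 kb flank on each side, as stated in the abstract


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one tested variant (not the study's data model)."""
    name: str
    chrom: str
    pos: int        # base-pair position
    p_value: float  # single-variant association P-value


def variants_in_window(index, variants, window_bp=WINDOW_BP):
    """Return variants on the same chromosome within window_bp of the index SNP."""
    return [
        v for v in variants
        if v.chrom == index.chrom and abs(v.pos - index.pos) <= window_bp
    ]


def stronger_than_index(index, variants, window_bp=WINDOW_BP):
    """Return in-window variants with a smaller P-value than the index SNP,
    ordered from strongest to weakest signal."""
    hits = [
        v for v in variants_in_window(index, variants, window_bp)
        if v.name != index.name and v.p_value < index.p_value
    ]
    return sorted(hits, key=lambda v: v.p_value)


# Toy data with made-up positions and P-values.
index_snp = Variant("index_snp", "8", 1_000_000, 1e-6)
candidates = [
    Variant("var_a", "8", 1_100_000, 1e-8),   # inside window, stronger signal
    Variant("var_b", "8", 1_300_000, 1e-9),   # 300 kb away, outside window
    Variant("var_c", "8", 900_000, 1e-4),     # inside window, weaker signal
    Variant("var_d", "9", 1_000_000, 1e-10),  # different chromosome
    Variant("var_e", "8", 750_000, 1e-7),     # exactly 250 kb away, stronger signal
]

if __name__ == "__main__":
    for v in stronger_than_index(index_snp, candidates):
        print(v.name, v.pos, v.p_value)
```

    The sketch covers only the positional filter and the P-value comparison; the imputation and functional annotation steps mentioned in the abstract are outside its scope.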

  8. Genome wide association study and genomic prediction for fatty acid composition in Chinese Simmental beef cattle using high density SNP array.

    PubMed

    Zhu, Bo; Niu, Hong; Zhang, Wengang; Wang, Zezhao; Liang, Yonghu; Guan, Long; Guo, Peng; Chen, Yan; Zhang, Lupei; Guo, Yong; Ni, Heming; Gao, Xue; Gao, Huijiang; Xu, Lingyang; Li, Junya

    2017-06-14

    Fatty acid composition of muscle is an important trait contributing to meat quality. Recently, genome-wide association study (GWAS) has been extensively used to explore the molecular mechanism underlying important traits in cattle. In this study, we performed GWAS using high density SNP array to analyze the association between SNPs and fatty acids and evaluated the accuracy of genomic prediction for fatty acids in Chinese Simmental cattle. Using the BayesB method, we identified 35 and 7 regions in Chinese Simmental cattle that displayed significant associations with individual fatty acids and fatty acid groups, respectively. We further obtained several candidate genes which may be involved in fatty acid biosynthesis including elongation of very long chain fatty acids protein 5 (ELOVL5), fatty acid synthase (FASN), caspase 2 (CASP2) and thyroglobulin (TG). Specifically, we obtained strong evidence of association signals for one SNP located at 51.3 Mb for FASN using Genome-wide Rapid Association Mixed Model and Regression-Genomic Control (GRAMMAR-GC) approaches. Also, region-based association test identified multiple SNPs within FASN and ELOVL5 for C14:0. In addition, our result revealed that the effectiveness of genomic prediction for fatty acid composition using BayesB was slightly superior over GBLUP in Chinese Simmental cattle. We identified several significantly associated regions and loci which can be considered as potential candidate markers for genomics-assisted breeding programs. Using multiple methods, our results revealed that FASN and ELOVL5 are associated with fatty acids with strong evidence. Our finding also suggested that it is feasible to perform genomic selection for fatty acids in Chinese Simmental cattle.

  9. Leveraging Comparative Genomics to Identify and Functionally Characterize Genes Associated with Sperm Phenotypes in Python bivittatus (Burmese Python)

    PubMed Central

    Rutllant, Josep

    2016-01-01

    Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism's genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism's genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value. PMID:27200191

  10. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

    PubMed

    Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-09-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

  11. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

    PubMed Central

    Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-01-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341

  12. Computational Approaches to Identify Promoters and cis-Regulatory Elements in Plant Genomes

    PubMed Central

    Rombauts, Stephane; Florquin, Kobe; Lescot, Magali; Marchal, Kathleen; Rouzé, Pierre; Van de Peer, Yves

    2003-01-01

    The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called “search by signal” methods) and the delineation of promoters by considering both sequence content and structural features (“search by content” methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5′-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be

  13. Mitochondrial genome sequences and comparative genomics of Phytophthora ramorum and P. sojae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martin, Frank N.; Douda, Bensasson; Tyler, Brett M.

    The complete sequences of the mitochondrial genomes of the oomycetes Phytophthora ramorum and P. sojae were determined during the course of their complete nuclear genome sequencing (Tyler, et al. 2006). Both are circular, with sizes of 39,314 bp for P. ramorum and 42,975 bp for P. sojae. Each contains a total of 37 identifiable protein-encoding genes, 25 or 26 tRNAs (P. sojae and P. ramorum, respectively) specifying 19 amino acids, and a variable number of ORFs (7 for P. ramorum and 12 for P. sojae) which are potentially additional functional genes. Non-coding regions comprise approximately 11.5 percent and 18.4 percent of the genomes of P. ramorum and P. sojae, respectively. Relative to P. sojae, there is an inverted repeat of 1,150 bp in P. ramorum that includes an unassigned unique ORF, a tRNA gene, and adjacent non-coding sequences, but otherwise the gene order in both species is identical. Comparisons of these genomes with published sequences of the P. infestans mitochondrial genome reveal a number of similarities, but the gene order in P. infestans differs in two adjacent locations due to inversions. Sequence alignments of the three genomes indicated sequence conservation ranging from 75 to 85 percent and that specific regions were more variable than others.

  14. Novel bacteriophages containing a genome of another bacteriophage within their genomes.

    PubMed

    Swanson, Maud M; Reavy, Brian; Makarova, Kira S; Cock, Peter J; Hopkins, David W; Torrance, Lesley; Koonin, Eugene V; Taliansky, Michael

    2012-01-01

    A novel bacteriophage infecting Staphylococcus pasteuri was isolated during a screen for phages in Antarctic soils. The phage named SpaA1 is morphologically similar to phages of the family Siphoviridae. The 42,784 bp genome of SpaA1 is a linear, double-stranded DNA molecule with 3' protruding cohesive ends. The SpaA1 genome encompasses 63 predicted protein-coding genes which cluster within three regions of the genome, each of apparently different origin, in a mosaic pattern. In two of these regions, the gene sets resemble those in prophages of Bacillus thuringiensis kurstaki str. T03a001 (genes involved in DNA replication/transcription, cell entry and exit) and B. cereus AH676 (additional regulatory and recombination genes), respectively. The third region represents an almost complete genome (except for the short terminal segments) of a distinct bacteriophage, MZTP02. Nearly the same gene module was identified in prophages of B. thuringiensis serovar monterrey BGSC 4AJ1 and B. cereus Rock4-2. These findings suggest that MZTP02 can be shuttled between genomes of other bacteriophages and prophages, leading to the formation of chimeric genomes. The presence of a complete phage genome in the genome of other phages apparently has not been described previously and might represent a 'fast track' route of virus evolution and horizontal gene transfer. Another phage (BceA1) nearly identical in sequence to SpaA1, and also including the almost complete MZTP02 genome within its own genome, was isolated from a bacterium of the B. cereus/B. thuringiensis group. Remarkably, both SpaA1 and BceA1 phages can infect B. cereus and B. thuringiensis, but only one of them, SpaA1, can infect S. pasteuri. This finding is best compatible with a scenario in which MZTP02 was originally contained in BceA1 infecting Bacillus spp, the common hosts for these two phages, followed by emergence of SpaA1 infecting S. pasteuri.

  15. Identifying tagging SNPs for African specific genetic variation from the African Diaspora Genome

    PubMed Central

    Johnston, Henry Richard; Hu, Yi-Juan; Gao, Jingjing; O’Connor, Timothy D.; Abecasis, Gonçalo R.; Wojcik, Genevieve L; Gignoux, Christopher R.; Gourraud, Pierre-Antoine; Lizee, Antoine; Hansen, Mark; Genuario, Rob; Bullis, Dave; Lawley, Cindy; Kenny, Eimear E.; Bustamante, Carlos; Beaty, Terri H.; Mathias, Rasika A.; Barnes, Kathleen C.; Qin, Zhaohui S.; Preethi Boorgula, Meher; Campbell, Monica; Chavan, Sameer; Ford, Jean G.; Foster, Cassandra; Gao, Li; Hansel, Nadia N.; Horowitz, Edward; Huang, Lili; Ortiz, Romina; Potee, Joseph; Rafaels, Nicholas; Ruczinski, Ingo; Scott, Alan F.; Taub, Margaret A.; Vergara, Candelaria; Levin, Albert M.; Padhukasahasram, Badri; Williams, L. Keoki; Dunston, Georgia M.; Faruque, Mezbah U.; Gietzen, Kimberly; Deshpande, Aniket; Grus, Wendy E.; Locke, Devin P.; Foreman, Marilyn G.; Avila, Pedro C.; Grammer, Leslie; Kim, Kwang-Youn A.; Kumar, Rajesh; Schleimer, Robert; De La Vega, Francisco M.; Shringarpure, Suyash S.; Musharoff, Shaila; Burchard, Esteban G.; Eng, Celeste; Hernandez, Ryan D.; Pino-Yanes, Maria; Torgerson, Dara G.; Szpiech, Zachary A.; Torres, Raul; Nicolae, Dan L.; Ober, Carole; Olopade, Christopher O; Olopade, Olufunmilayo; Oluwole, Oluwafemi; Arinola, Ganiyu; Song, Wei; Correa, Adolfo; Musani, Solomon; Wilson, James G.; Lange, Leslie A.; Akey, Joshua; Bamshad, Michael; Chong, Jessica; Fu, Wenqing; Nickerson, Deborah; Reiner, Alexander; Hartert, Tina; Ware, Lorraine B.; Bleecker, Eugene; Meyers, Deborah; Ortega, Victor E.; Maul, Pissamai; Maul, Trevor; Watson, Harold; Ilma Araujo, Maria; Riccio Oliveira, Ricardo; Caraballo, Luis; Marrugo, Javier; Martinez, Beatriz; Meza, Catherine; Ayestas, Gerardo; Francisco Herrera-Paz, Edwin; Landaverde-Torres, Pamela; Erazo, Said Omar Leiva; Martinez, Rosella; Mayorga, Alvaro; Mayorga, Luis F.; Mejia-Mejia, Delmy-Aracely; Ramos, Hector; Saenz, Allan; Varela, Gloria; Marina Vasquez, Olga; Ferguson, Trevor; Knight-Madden, Jennifer; Samms-Vaughan, Maureen; Wilks, Rainford J.; Adegnika, Akim; Ateba-Ngoa, Ulysse; Yazdanbakhsh, Maria

    2017-01-01

    A primary goal of The Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) is to develop an ‘African Diaspora Power Chip’ (ADPC), a genotyping array consisting of tagging SNPs, useful in comprehensively identifying African specific genetic variation. This array is designed based on the novel variation identified in 642 CAAPA samples of African ancestry with high coverage whole genome sequence data (~30× depth). This novel variation extends the pattern of variation catalogued in the 1000 Genomes and Exome Sequencing Projects to a spectrum of populations representing the wide range of West African genomic diversity. These individuals from CAAPA also comprise a large swath of the African Diaspora population and incorporate historical genetic diversity covering nearly the entire Atlantic coast of the Americas. Here we show the results of designing and producing such a microchip array. This novel array covers African specific variation far better than other commercially available arrays, and will enable better GWAS analyses for researchers with individuals of African descent in their study populations. A recent study cataloging variation in continental African populations suggests this type of African-specific genotyping array is both necessary and valuable for facilitating large-scale GWAS in populations of African ancestry. PMID:28429804

  16. Exploiting genomic data to identify proteins involved in abalone reproduction.

    PubMed

    Mendoza-Porras, Omar; Botwright, Natasha A; McWilliam, Sean M; Cook, Mathew T; Harris, James O; Wijffels, Gene; Colgrave, Michelle L

    2014-08-28

    Aside from their critical role in reproduction, abalone gonads serve as an indicator of sexual maturity and energy balance, two key considerations for effective abalone culture. Temperate abalone farmers face issues with tank restocking with highly marketable abalone owing to inefficient spawning induction methods. The identification of key proteins in sexually mature abalone will serve as the foundation for a greater understanding of reproductive biology. Addressing this knowledge gap is the first step towards improving abalone aquaculture methods. Proteomic profiling of female and male gonads of greenlip abalone, Haliotis laevigata, was undertaken using liquid chromatography-mass spectrometry. Owing to the incomplete nature of abalone protein databases, in addition to searching against two publicly available databases, a custom database comprising genomic data was used. Overall, 162 and 110 proteins were identified in females and males respectively with 40 proteins common to both sexes. For proteins involved in sexual maturation, sperm and egg structure, motility, acrosomal reaction and fertilization, 23 were identified only in females, 18 only in males and 6 were common. Gene ontology analysis revealed clear differences between the female and male protein profiles reflecting a higher rate of protein synthesis in the ovary and higher metabolic activity in the testis. A comprehensive mass spectrometry-based analysis was performed to profile the abalone gonad proteome providing the foundation for future studies of reproduction in abalone. Key proteins involved in both reproduction and energy balance were identified. Genomic resources were utilised to build a database of molluscan proteins yielding >60% more protein identifications than in a standard workflow employing public protein databases. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Automated Update, Revision, and Quality Control of the Maize Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and Identifies New Genes

    PubMed Central

    Law, MeiYee; Childs, Kevin L.; Campbell, Michael S.; Stein, Joshua C.; Olson, Andrew J.; Holt, Carson; Panchy, Nicholas; Lei, Jikai; Jiao, Dian; Andorf, Carson M.; Lawrence, Carolyn J.; Ware, Doreen; Shiu, Shin-Han; Sun, Yanni; Jiang, Ning; Yandell, Mark

    2015-01-01

    The large size and relative complexity of many plant genomes make creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 additional, well-supported protein-coding genes not present in the 5b+ annotation build, added additional untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created an additional 2,522 noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grasses and other difficult-to-annotate plant genomes. PMID:25384563

  18. Origin of the CMS gene locus in rapeseed cybrid mitochondria: active and inactive recombination produces the complex CMS gene region in the mitochondrial genomes of Brassicaceae.

    PubMed

    Oshima, Masao; Kikuchi, Rie; Imamura, Jun; Handa, Hirokazu

    2010-01-01

    CMS (cytoplasmic male sterile) rapeseed is produced by asymmetrical somatic cell fusion between the Brassica napus cv. Westar and the Raphanus sativus Kosena CMS line (Kosena radish). The CMS rapeseed contains a CMS gene, orf125, which is derived from Kosena radish. Our sequence analyses revealed that the orf125 region in CMS rapeseed originated from recombination between the orf125/orfB region and the nad1C/ccmFN1 region by way of a 63 bp repeat. A precise sequence comparison among the related sequences in CMS rapeseed, Kosena radish and normal rapeseed showed that the orf125 region in CMS rapeseed consisted of the Kosena orf125/orfB region and the rapeseed nad1C/ccmFN1 region, even though Kosena radish had both the orf125/orfB region and the nad1C/ccmFN1 region in its mitochondrial genome. We also identified three tandem repeat sequences in the regions surrounding orf125, including a 63 bp repeat, which were involved in several recombination events. Interestingly, differences in the recombination activity for each repeat sequence were observed, even though these sequences were located adjacent to each other in the mitochondrial genome. We report results indicating that recombination events within the mitochondrial genomes are regulated at the level of specific repeat sequences depending on the cellular environment.

  19. Intra-Genomic Internal Transcribed Spacer Region Sequence Heterogeneity and Molecular Diagnosis in Clinical Microbiology

    PubMed Central

    Zhao, Ying; Tsang, Chi-Ching; Xiao, Meng; Cheng, Jingwei; Xu, Yingchun; Lau, Susanna K. P.; Woo, Patrick C. Y.

    2015-01-01

    Internal transcribed spacer region (ITS) sequencing is the most extensively used technology for accurate molecular identification of fungal pathogens in clinical microbiology laboratories. Intra-genomic ITS sequence heterogeneity, which makes fungal identification based on direct sequencing of PCR products difficult, has rarely been reported in pathogenic fungi. During the process of performing ITS sequencing on 71 yeast strains isolated from various clinical specimens, direct sequencing of the PCR products showed ambiguous sequences in six of them. After cloning the PCR products into plasmids for sequencing, interpretable sequencing electropherograms could be obtained. For each of the six isolates, 10–49 clones were selected for sequencing and two to seven intra-genomic ITS copies were detected. The identities of these six isolates were confirmed to be Candida glabrata (n = 2), Pichia (Candida) norvegensis (n = 2), Candida tropicalis (n = 1) and Saccharomyces cerevisiae (n = 1). Multiple sequence alignment revealed that one to four intra-genomic ITS polymorphic sites were present in the six isolates, and all these polymorphic sites were located in the ITS1 and/or ITS2 regions. We report and describe the first evidence of intra-genomic ITS sequence heterogeneity in four different pathogenic yeasts, which occurred exclusively in the ITS1 and ITS2 spacer regions for the six isolates in this study. PMID:26506340
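    As a rough illustration of how intra-genomic polymorphic sites and distinct ITS copies might be counted once cloned sequences have been aligned, consider the sketch below. It assumes equal-length, pre-aligned sequences and treats a gap character like any other symbol; the function names and toy sequences are hypothetical and do not come from the study.

```python
def polymorphic_sites(aligned):
    """Return zero-based alignment columns where the clones do not all agree.

    `aligned` is a list of equal-length strings taken from a multiple
    sequence alignment; gap characters are compared like any other symbol.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [
        col for col in range(length)
        if len({seq[col] for seq in aligned}) > 1
    ]


def distinct_copies(aligned):
    """Count distinct sequence variants (candidate intra-genomic copies)."""
    return len(set(aligned))


# Toy alignment of four clones from one hypothetical isolate.
clones = [
    "ACGTAC",
    "ACGTAC",
    "ACCTAC",  # differs at column 2
    "ACGTAT",  # differs at column 5
]

if __name__ == "__main__":
    print("polymorphic columns:", polymorphic_sites(clones))
    print("distinct copies:", distinct_copies(clones))
```

    In practice, sequencing or PCR errors in individual clones would need to be ruled out before a variant is counted as a genuine copy; the sketch makes no attempt at that.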

  20. A Genome Wide Association Study Identifies Common Variants Associated with Lipid Levels in the Chinese Population

    PubMed Central

    Wu, Chen; Yang, Handong; Yu, Dianke; Yang, Xiaobo; Zhang, Xiaomin; Wang, Yiqin; Sun, Jielin; Gao, Yong; Tan, Aihua; He, Yunfeng; Zhang, Haiying; Qin, Xue; Zhu, Jingwen; Li, Huaixing; Lin, Xu; Zhu, Jiang; Min, Xinwen; Lang, Mingjian; Li, Dongfeng; Zhai, Kan; Chang, Jiang; Tan, Wen; Yuan, Jing; Chen, Weihong; Wang, Youjie; Wei, Sheng; Miao, Xiaoping; Wang, Feng; Fang, Weimin; Liang, Yuan; Deng, Qifei; Dai, Xiayun; Lin, Dafeng; Huang, Suli; Guo, Huan; Lilly Zheng, S.; Xu, Jianfeng; Lin, Dongxin; Hu, Frank B.; Wu, Tangchun

    2013-01-01

    Plasma lipid levels are important risk factors for cardiovascular disease and are influenced by genetic and environmental factors. Recent genome wide association studies (GWAS) have identified several lipid-associated loci, but these loci have been identified primarily in European populations. In order to identify genetic markers for lipid levels in a Chinese population and analyze the heterogeneity between Europeans and Asians, especially Chinese, we performed a meta-analysis of two genome wide association studies on four common lipid traits including total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL) and high-density lipoprotein cholesterol (HDL) in a Han Chinese population totaling 3,451 healthy subjects. Replication was performed in an additional 8,830 subjects of Han Chinese ethnicity. We replicated eight loci associated with lipid levels previously reported in a European population. The loci genome wide significantly associated with TC were near DOCK7, HMGCR and ABO; those genome wide significantly associated with TG were near APOA1/C3/A4/A5 and LPL; those genome wide significantly associated with LDL were near HMGCR, ABO and TOMM40; and those genome wide significantly associated with HDL were near LPL, LIPC and CETP. In addition, an additive genotype score of eight SNPs representing the eight loci that were found to be associated with lipid levels was associated with higher TC, TG and LDL levels (P = 5.52×10^-16, 1.38×10^-6 and 5.59×10^-9, respectively). These findings suggest the cumulative effects of multiple genetic loci on plasma lipid levels. Comparisons with previous GWAS of lipids highlight heterogeneity in allele frequency and in effect size for some loci between Chinese and European populations. The results from our GWAS provided comprehensive and convincing evidence of the genetic determinants of plasma lipid levels in a Chinese population. PMID:24386095
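    An additive genotype score of the kind mentioned in this abstract is commonly computed by summing, for each person, the number of trait-associated alleles carried at each SNP. The sketch below assumes a simple unweighted count over eight loci; the abstract does not say whether the study weighted alleles by effect size, and the locus labels and genotype values here are hypothetical.

```python
# Hypothetical labels standing in for the eight SNPs; not the study's identifiers.
LOCI = [f"snp{i}" for i in range(1, 9)]


def additive_score(genotypes, loci=LOCI):
    """Sum the trait-associated allele counts (0, 1 or 2) across the given loci.

    `genotypes` maps each locus label to the number of associated alleles
    one individual carries at that locus.
    """
    total = 0
    for locus in loci:
        count = genotypes[locus]
        if count not in (0, 1, 2):
            raise ValueError(f"allele count at {locus} must be 0, 1 or 2")
        total += count
    return total


# One made-up individual: allele counts at the eight hypothetical loci.
person = dict(zip(LOCI, [2, 1, 0, 1, 2, 0, 1, 1]))

if __name__ == "__main__":
    print("additive genotype score:", additive_score(person))
```

    With eight biallelic SNPs the unweighted score ranges from 0 to 16; a weighted variant would multiply each count by a per-SNP effect estimate before summing.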

  1. Identifying genomic and developmental causes of adverse drug reactions in children

    PubMed Central

    Becker, Mara L; Leeder, J Steven

    2011-01-01

    Adverse drug reactions are a concern for all clinicians who utilize medications to treat adults and children; however, the frequency of adult and pediatric adverse drug reactions is likely to be under-reported. In this age of genomics and personalized medicine, identifying genetic variation that results in differences in drug biotransformation and response has contributed to significant advances in the utilization of several commonly used medications in adults. In order to better understand the variability of drug response in children however, we must not only consider differences in genotype, but also variation in gene expression during growth and development, namely ontogeny. In this article, recommendations for systematically approaching pharmacogenomic studies in children are discussed, and several examples of studies that investigate the genomic and developmental contribution to adverse drug reactions in children are reviewed. PMID:21121777

  2. Complete genome sequence of the fire blight pathogen Erwinia pyrifoliae DSM 12163T and comparative genomic insights into plant pathogenicity

    PubMed Central

    2010-01-01

    Background: Erwinia pyrifoliae is a newly described necrotrophic pathogen, which causes fire blight on Asian (Nashi) pear and is geographically restricted to Eastern Asia. Relatively little is known about its genetics compared to the closely related main fire blight pathogen E. amylovora. Results: The genome of the E. pyrifoliae type strain DSM 12163T was sequenced using both 454 and Solexa pyrosequencing and annotated. The genome contains a circular chromosome of 4.026 Mb and four small plasmids. Based on their respective role in virulence in E. amylovora or related organisms, we identified several putative virulence factors, including type III and type VI secretion systems and their effectors, flagellar genes, sorbitol metabolism, iron uptake determinants, and quorum-sensing components. A deletion in the rpoS gene covering the most conserved region of the protein was identified which may contribute to the difference in virulence/host-range compared to E. amylovora. Comparative genomics with the pome fruit epiphyte Erwinia tasmaniensis Et1/99 showed that both species are overall highly similar, although specific differences were identified, for example the presence of some phage gene-containing regions and a high number of putative genomic islands containing transposases in the E. pyrifoliae DSM 12163T genome. Conclusions: The E. pyrifoliae genome is an important addition to the published genome of E. tasmaniensis and the unfinished genome of E. amylovora providing a foundation for re-sequencing additional strains that may shed light on the evolution of the host-range and virulence/pathogenicity of this important group of plant-associated bacteria. PMID:20047678

  3. Genome-Wide Association Study Identifies NBS-LRR-Encoding Genes Related with Anthracnose and Common Bacterial Blight in the Common Bean.

    PubMed

    Wu, Jing; Zhu, Jifeng; Wang, Lanfen; Wang, Shumin

    2017-01-01

    Nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes represent the largest and most important disease resistance genes in plants. The genome sequence of the common bean (Phaseolus vulgaris L.) provides valuable data for determining the genomic organization of NBS-LRR genes. However, data on the NBS-LRR genes in the common bean are limited. In total, 178 NBS-LRR-type genes and 145 partial genes (with or without a NBS) located on 11 common bean chromosomes were identified from genome sequences database. Furthermore, 30 NBS-LRR genes were classified into Toll/interleukin-1 receptor (TIR)-NBS-LRR (TNL) types, and 148 NBS-LRR genes were classified into coiled-coil (CC)-NBS-LRR (CNL) types. Moreover, the phylogenetic tree supported the division of these PvNBS genes into two obvious groups, TNL types and CNL types. We also built expression profiles of NBS genes in response to anthracnose and common bacterial blight using qRT-PCR. Finally, we detected nine disease resistance loci for anthracnose (ANT) and seven for common bacterial blight (CBB) using the developed NBS-SSR markers. Among these loci, NSSR24, NSSR73, and NSSR265 may be located at new regions for ANT resistance, while NSSR65 and NSSR260 may be located at new regions for CBB resistance. Furthermore, we validated NSSR24, NSSR65, NSSR73, NSSR260, and NSSR265 using a new natural population. Our results provide useful information regarding the function of the NBS-LRR proteins and will accelerate the functional genomics and evolutionary studies of NBS-LRR genes in food legumes. NBS-SSR markers represent a wide-reaching resource for molecular breeding in the common bean and other food legumes. Collectively, our results should be of broad interest to bean scientists and breeders.

  4. Genome-Wide Association Study Identifies NBS-LRR-Encoding Genes Related with Anthracnose and Common Bacterial Blight in the Common Bean

    PubMed Central

    Wu, Jing; Zhu, Jifeng; Wang, Lanfen; Wang, Shumin

    2017-01-01

    Nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes represent the largest and most important disease resistance genes in plants. The genome sequence of the common bean (Phaseolus vulgaris L.) provides valuable data for determining the genomic organization of NBS-LRR genes. However, data on the NBS-LRR genes in the common bean are limited. In total, 178 NBS-LRR-type genes and 145 partial genes (with or without a NBS) located on 11 common bean chromosomes were identified from genome sequences database. Furthermore, 30 NBS-LRR genes were classified into Toll/interleukin-1 receptor (TIR)-NBS-LRR (TNL) types, and 148 NBS-LRR genes were classified into coiled-coil (CC)-NBS-LRR (CNL) types. Moreover, the phylogenetic tree supported the division of these PvNBS genes into two obvious groups, TNL types and CNL types. We also built expression profiles of NBS genes in response to anthracnose and common bacterial blight using qRT-PCR. Finally, we detected nine disease resistance loci for anthracnose (ANT) and seven for common bacterial blight (CBB) using the developed NBS-SSR markers. Among these loci, NSSR24, NSSR73, and NSSR265 may be located at new regions for ANT resistance, while NSSR65 and NSSR260 may be located at new regions for CBB resistance. Furthermore, we validated NSSR24, NSSR65, NSSR73, NSSR260, and NSSR265 using a new natural population. Our results provide useful information regarding the function of the NBS-LRR proteins and will accelerate the functional genomics and evolutionary studies of NBS-LRR genes in food legumes. NBS-SSR markers represent a wide-reaching resource for molecular breeding in the common bean and other food legumes. Collectively, our results should be of broad interest to bean scientists and breeders. PMID:28848595

  5. Comparative genome-scale modelling of Staphylococcus aureus strains identifies strain-specific metabolic capabilities linked to pathogenicity

    PubMed Central

    Bosi, Emanuele; Monk, Jonathan M.; Aziz, Ramy K.; Fondi, Marco; Nizet, Victor; Palsson, Bernhard Ø.

    2016-01-01

    Staphylococcus aureus is a preeminent bacterial pathogen capable of colonizing diverse ecological niches within its human host. We describe here the pangenome of S. aureus based on analysis of genome sequences from 64 strains of S. aureus spanning a range of ecological niches, host types, and antibiotic resistance profiles. Based on this set, S. aureus is expected to have an open pangenome composed of 7,411 genes and a core genome composed of 1,441 genes. Metabolism was highly conserved in this core genome; however, differences were identified in amino acid and nucleotide biosynthesis pathways between the strains. Genome-scale models (GEMs) of metabolism were constructed for the 64 strains of S. aureus. These GEMs enabled a systems approach to characterizing the core metabolic and panmetabolic capabilities of the S. aureus species. All models were predicted to be auxotrophic for the vitamins niacin (vitamin B3) and thiamin (vitamin B1), whereas strain-specific auxotrophies were predicted for riboflavin (vitamin B2), guanosine, leucine, methionine, and cysteine, among others. GEMs were used to systematically analyze growth capabilities in more than 300 different growth-supporting environments. The results identified metabolic capabilities linked to pathogenic traits and virulence acquisitions. Such traits can be used to differentiate strains responsible for mild vs. severe infections and preference for hosts (e.g., animals vs. humans). Genome-scale analysis of multiple strains of a species can thus be used to identify metabolic determinants of virulence and increase our understanding of why certain strains of this deadly pathogen have spread rapidly throughout the world. PMID:27286824

  6. Genomic interval engineering of mice identified a novel modulator of triglyceride production

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Y.; Jong, M.C.; Frazer, K.A.

    1999-10-01

    To accelerate the biological annotation of novel genes discovered in sequenced regions of mammalian genomes, we are creating large deletions in the mouse genome targeted to include clusters of such genes. Here we describe the targeted deletion of a 450 kb region on mouse chromosome 11 which, based on computational analysis of the deleted murine sequences and human 5q orthologous sequences, codes for nine putative genes. Mice homozygous for the deletion had a variety of abnormalities including severe hypertriglyceridemia, hepatic and cardiac enlargement, growth retardation and premature mortality. Analysis of triglyceride metabolism in these animals demonstrated a several-fold increase in hepatic very-low density lipoprotein (VLDL) triglyceride secretion, the most prevalent mechanism responsible for hypertriglyceridemia in humans. A series of mouse BAC and human YAC transgenes covering different intervals of the 450 kb deleted region were assessed for their ability to complement the deletion-induced abnormalities. These studies revealed that OCTN2, a gene recently shown to play a role in carnitine transport, was able to correct the triglyceride abnormalities. The discovery of this previously unappreciated relationship between OCTN2, carnitine and hepatic triglyceride production is of particular importance due to the clinical consequence of hypertriglyceridemia and the paucity of genes known to modulate triglyceride secretion.

  7. Markov models of genome segmentation

    NASA Astrophysics Data System (ADS)

    Thakur, Vivek; Azad, Rajeev K.; Ramaswamy, Ram

    2007-01-01

    We introduce Markov models for segmentation of symbolic sequences, extending a segmentation procedure based on the Jensen-Shannon divergence that has been introduced earlier. Higher-order Markov models are more sensitive to the details of local patterns and in application to genome analysis, this makes it possible to segment a sequence at positions that are biologically meaningful. We show the advantage of higher-order Markov-model-based segmentation procedures in detecting compositional inhomogeneity in chimeric DNA sequences constructed from genomes of diverse species, and in application to the E. coli K12 genome, boundaries of genomic islands, cryptic prophages, and horizontally acquired regions are accurately identified.
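
    As a rough illustration of the segmentation idea in this abstract, the sketch below finds the single split point that maximizes the length-weighted Jensen-Shannon divergence between the two sides of a symbolic sequence. It assumes an order-0 (independent-symbol) model rather than the higher-order Markov models the record introduces, and the function names `entropy` and `best_split` are hypothetical, chosen here for illustration only.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy in bits of a symbol-count table with `total` symbols."""
    return -sum((c / total) * log2(c / total) for c in counts.values() if c)


def best_split(seq):
    """Return (position, divergence) for the split seq[:position] | seq[position:]
    that maximizes the length-weighted Jensen-Shannon divergence.

    Order-0 assumption: each side is summarized by its symbol frequencies only.
    """
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter()
    best_pos, best_d = None, -1.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction drops zero counts
        d = (h_all
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(right, n - i))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


if __name__ == "__main__":
    # A toy chimeric sequence: two compositionally distinct halves.
    print(best_split("A" * 50 + "G" * 50))
```

    Applying the same function recursively to each resulting segment, with a significance threshold on the divergence as a stopping rule, would give a full segmentation; that stopping rule is not specified here and is left as an assumption.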

  8. A genome-wide association study of seed protein and oil content in soybean

    PubMed Central

    2014-01-01

    Background: Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. Results: A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil. Conclusions: This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome

  9. A genome-wide association study of seed protein and oil content in soybean.

    PubMed

    Hwang, Eun-Young; Song, Qijian; Jia, Gaofeng; Specht, James E; Hyten, David L; Costa, Jose; Cregan, Perry B

    2014-01-02

    Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil. This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise

  10. Defiant: (DMRs: easy, fast, identification and ANnoTation) identifies differentially Methylated regions from iron-deficient rat hippocampus.

    PubMed

    Condon, David E; Tran, Phu V; Lien, Yu-Chin; Schug, Jonathan; Georgieff, Michael K; Simmons, Rebecca A; Won, Kyoung-Jae

    2018-02-05

    Identification of differentially methylated regions (DMRs) is the initial step towards the study of DNA methylation-mediated gene regulation. Previous approaches to call DMRs suffer from false prediction, use extreme resources, and/or require library installation and input conversion. We developed a new approach called Defiant to identify DMRs. Employing Weighted Welch Expansion (WWE), Defiant showed superior performance to other predictors in the series of benchmarking tests on artificial and real data. Defiant was subsequently used to investigate DNA methylation changes in iron-deficient rat hippocampus. Defiant identified DMRs close to genes associated with neuronal development and plasticity, which were not identified by its competitor. Importantly, Defiant runs between 5 and 479 times faster than currently available software packages. Also, Defiant accepts 10 different input formats widely used for DNA methylation data. Defiant effectively identifies DMRs for whole-genome bisulfite sequencing (WGBS), reduced-representation bisulfite sequencing (RRBS), Tet-assisted bisulfite sequencing (TAB-seq), and HpaII tiny fragment enrichment by ligation-mediated PCR-tag (HELP) assays.

  11. Genome-wide significant localization for working and spatial memory: Identifying genes for psychosis using models of cognition.

    PubMed

    Knowles, Emma E M; Carless, Melanie A; de Almeida, Marcio A A; Curran, Joanne E; McKay, D Reese; Sprooten, Emma; Dyer, Thomas D; Göring, Harald H; Olvera, Rene; Fox, Peter; Almasy, Laura; Duggirala, Ravi; Kent, Jack W; Blangero, John; Glahn, David C

    2014-01-01

    It is well established that risk for developing psychosis is largely mediated by the influence of genes, but identifying precisely which genes underlie that risk has been problematic. Focusing on endophenotypes, rather than illness risk, is one solution to this problem. Impaired cognition is a well-established endophenotype of psychosis. Here we aimed to characterize the genetic architecture of cognition using phenotypically detailed models as opposed to relying on general IQ or individual neuropsychological measures. In so doing we hoped to identify genes that mediate cognitive ability, which might also contribute to psychosis risk. Hierarchical factor models of genetically clustered cognitive traits were subjected to linkage analysis followed by QTL region-specific association analyses in a sample of 1,269 Mexican American individuals from extended pedigrees. We identified four genome wide significant QTLs, two for working and two for spatial memory, and a number of plausible and interesting candidate genes. The creation of detailed models of cognition seemingly enhanced the power to detect genetic effects on cognition and provided a number of possible candidate genes for psychosis. © 2013 Wiley Periodicals, Inc.

  12. Identifying parameter regions for multistationarity

    PubMed Central

    Conradi, Carsten; Mincheva, Maya; Wiuf, Carsten

    2017-01-01

    Mathematical modelling has become an established tool for studying the dynamics of biological systems. Current applications range from building models that reproduce quantitative data to identifying systems with predefined qualitative features, such as switching behaviour, bistability or oscillations. Mathematically, the latter question amounts to identifying parameter values associated with a given qualitative feature. We introduce a procedure to partition the parameter space of a parameterized system of ordinary differential equations into regions for which the system has a unique or multiple equilibria. The procedure is based on the computation of the Brouwer degree, and it creates a multivariate polynomial with parameter depending coefficients. The signs of the coefficients determine parameter regions with and without multistationarity. A particular strength of the procedure is the avoidance of numerical analysis and parameter sampling. The procedure consists of a number of steps. Each of these steps might be addressed algorithmically using various computer programs and available software, or manually. We demonstrate our procedure on several models of gene transcription and cell signalling, and show that in many cases we obtain a complete partitioning of the parameter space with respect to multistationarity. PMID:28972969

  13. Coexpression network analysis identifies transcriptional modules associated with genomic alterations in neuroblastoma.

    PubMed

    Yang, Liulin; Li, Yun; Wei, Zhi; Chang, Xiao

    2018-06-01

    Neuroblastoma is a highly complex and heterogeneous cancer in children. Acquired genomic alterations including MYCN amplification, 1p deletion and 11q deletion are important risk factors and biomarkers in neuroblastoma. Here, we performed a co-expression-based gene network analysis to study the intrinsic association between specific genomic changes and transcriptome organization. We identified multiple gene coexpression modules which are recurrent in two independent datasets and associated with functional pathways including nervous system development, cell cycle, immune system process and extracellular matrix/space. Our results also indicated that modules involved in nervous system development and cell cycle are highly associated with MYCN amplification and 1p deletion, while modules responding to immune system process are associated with MYCN amplification only. In summary, this integrated analysis provides novel insights into molecular heterogeneity and pathogenesis of neuroblastoma. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang. Copyright © 2017. Published by Elsevier B.V.

  14. A novel prokaryotic promoter identified in the genome of some monopartite begomoviruses.

    PubMed

    Wang, Wei-Chen; Hsu, Yau-Heiu; Lin, Na-Sheng; Wu, Chia-Ying; Lai, Yi-Chin; Hu, Chung-Chi

    2013-01-01

    Geminiviruses are known to exhibit both prokaryotic and eukaryotic features in their genomes, with the ability to express their genes and even replicate in bacterial cells. We have demonstrated previously the existence of unit-length single-stranded circular DNAs of Ageratum yellow vein virus (AYVV, a species in the genus Begomovirus, family Geminiviridae) in Escherichia coli cells, which prompted our search for unknown prokaryotic functions in the begomovirus genomes. By using a promoter trapping strategy, we identified a novel prokaryotic promoter, designated AV3 promoter, in nts 762-831 of the AYVV genome. Activity assays revealed that the AV3 promoter is strong, unidirectional, and constitutive, with an endogenous downstream ribosome binding site and a translatable short open reading frame of eight amino acids. Sequence analyses suggested that the AV3 promoter might be a remnant of prokaryotic ancestors that could be related to certain promoters of bacteria from marine or freshwater environments. The discovery of the prokaryotic AV3 promoter provided further evidence for the prokaryotic origin in the evolutionary history of geminiviruses.

  15. A Novel Prokaryotic Promoter Identified in the Genome of Some Monopartite Begomoviruses

    PubMed Central

    Wang, Wei-Chen; Hsu, Yau-Heiu; Lin, Na-Sheng; Wu, Chia-Ying; Lai, Yi-Chin; Hu, Chung-Chi

    2013-01-01

    Geminiviruses are known to exhibit both prokaryotic and eukaryotic features in their genomes, with the ability to express their genes and even replicate in bacterial cells. We have demonstrated previously the existence of unit-length single-stranded circular DNAs of Ageratum yellow vein virus (AYVV, a species in the genus Begomovirus, family Geminiviridae) in Escherichia coli cells, which prompted our search for unknown prokaryotic functions in the begomovirus genomes. By using a promoter trapping strategy, we identified a novel prokaryotic promoter, designated AV3 promoter, in nts 762-831 of the AYVV genome. Activity assays revealed that the AV3 promoter is strong, unidirectional, and constitutive, with an endogenous downstream ribosome binding site and a translatable short open reading frame of eight amino acids. Sequence analyses suggested that the AV3 promoter might be a remnant of prokaryotic ancestors that could be related to certain promoters of bacteria from marine or freshwater environments. The discovery of the prokaryotic AV3 promoter provided further evidence for the prokaryotic origin in the evolutionary history of geminiviruses. PMID:23936138

  16. Utilizing the Dog Genome in the Search for Novel Candidate Genes Involved in Glioma Development—Genome Wide Association Mapping followed by Targeted Massive Parallel Sequencing Identifies a Strongly Associated Locus

    PubMed Central

    Dickinson, Peter; Xiong, Anqi; York, Daniel; Jayashankar, Kartika; Pielberg, Gerli; Koltookian, Michele; Murén, Eva; Fuxelius, Hans-Henrik; Weishaupt, Holger; Andersson, Göran; Hedhammar, Åke; Bongcam-Rudloff, Erik; Forsberg-Nilsson, Karin

    2016-01-01

    Gliomas are the most common form of malignant primary brain tumors in humans and second most common in dogs, occurring with similar frequencies in both species. Dogs are valuable spontaneous models of human complex diseases including cancers and may provide insight into disease susceptibility and oncogenesis. Several brachycephalic breeds such as Boxer, Bulldog and Boston Terrier have an elevated risk of developing glioma, but others, including Pug and Pekingese, are not at higher risk. To identify glioma-associated genetic susceptibility factors, an across-breed genome-wide association study (GWAS) was performed on 39 dog glioma cases and 141 controls from 25 dog breeds, identifying a genome-wide significant locus on canine chromosome (CFA) 26 (p = 2.8×10^-8). Targeted re-sequencing of the 3.4 Mb candidate region was performed, followed by genotyping of the 56 SNVs that best fit the association pattern between the re-sequenced cases and controls. We identified three candidate genes that were highly associated with glioma susceptibility: CAMKK2, P2RX7 and DENR. CAMKK2 showed reduced expression in both canine and human brain tumors, and a non-synonymous variant in P2RX7, previously demonstrated to have a 50% decrease in receptor function, was also associated with disease. Thus, one or more of these genes appear to affect glioma susceptibility. PMID:27171399

  17. Comparative genomic analysis of Helicobacter pylori from Malaysia identifies three distinct lineages suggestive of differential evolution.

    PubMed

    Kumar, Narender; Mariappan, Vanitha; Baddam, Ramani; Lankapalli, Aditya K; Shaik, Sabiha; Goh, Khean-Lee; Loke, Mun Fai; Perkins, Tim; Benghezal, Mohammed; Hasnain, Seyed E; Vadivelu, Jamuna; Marshall, Barry J; Ahmed, Niyaz

    2015-01-01

    The discordant prevalence of Helicobacter pylori and its related diseases, for a long time, fostered certain enigmatic situations observed in the countries of the southern world. Variation in H. pylori infection rates and disease outcomes among different populations in multi-ethnic Malaysia provides a unique opportunity to understand dynamics of host-pathogen interaction and genome evolution. In this study, we extensively analyzed and compared genomes of 27 Malaysian H. pylori isolates and identified three major phylogeographic lineages: hspEastAsia, hpEurope and hpSouthIndia. The analysis of the virulence genes within the core genome, however, revealed a comparable pathogenic potential of the strains. In addition, we identified four genes limited to strains of East-Asian lineage. Our analyses identified a few strain-specific genes encoding restriction modification systems and outlined 311 core genes possibly under differential evolutionary constraints, among the strains representing different ethnic groups. The cagA and vacA genes also showed variations in accordance with the host genetic background of the strains. Moreover, restriction modification genes were found to be significantly enriched in East-Asian strains. An understanding of these variations in the genome content would provide significant insights into various adaptive and host modulation strategies harnessed by H. pylori to effectively persist in a host-specific manner. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. A Gene-Oriented Haplotype Comparison Reveals Recently Selected Genomic Regions in Temperate and Tropical Maize Germplasm

    PubMed Central

    Zhang, Jie; Li, Yongxiang; Zheng, Jun; Zhang, Hongwei; Yang, Xiaohong; Wang, Jianhua; Wang, Guoying

    2017-01-01

    The extensive genetic variation present in maize (Zea mays) germplasm makes it possible to detect signatures of positive artificial selection that occurred during temperate and tropical maize improvement. Here we report an analysis of 532,815 polymorphisms from a maize association panel consisting of 368 diverse temperate and tropical inbred lines. We developed a gene-oriented approach adapting exonic polymorphisms to identify recently selected alleles by comparing haplotypes across the maize genome. This analysis revealed evidence of selection for more than 1100 genomic regions during recent improvement, and included regulatory genes and key genes with visible mutant phenotypes. We find that selected candidate target genes in temperate maize are enriched in biosynthetic processes, and further examination of these candidates highlights two cases, sucrose flux and oil storage, in which multiple genes in a common pathway can be cooperatively selected. Finally, based on available parallel gene expression data, we hypothesize that some genes were selected for regulatory variations, resulting in altered gene expression. PMID:28099470

  19. Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility.

    PubMed

    Yin, Xianyong; Low, Hui Qi; Wang, Ling; Li, Yonghong; Ellinghaus, Eva; Han, Jiali; Estivill, Xavier; Sun, Liangdan; Zuo, Xianbo; Shen, Changbing; Zhu, Caihong; Zhang, Anping; Sanchez, Fabio; Padyukov, Leonid; Catanese, Joseph J; Krueger, Gerald G; Duffin, Kristina Callis; Mucha, Sören; Weichenthal, Michael; Weidinger, Stephan; Lieb, Wolfgang; Foo, Jia Nee; Li, Yi; Sim, Karseng; Liany, Herty; Irwan, Ishak; Teo, Yikying; Theng, Colin T S; Gupta, Rashmi; Bowcock, Anne; De Jager, Philip L; Qureshi, Abrar A; de Bakker, Paul I W; Seielstad, Mark; Liao, Wilson; Ståhle, Mona; Franke, Andre; Zhang, Xuejun; Liu, Jianjun

    2015-04-23

    Psoriasis is a common inflammatory skin disease with complex genetics and different degrees of prevalence across ethnic populations. Here we present the largest trans-ethnic genome-wide meta-analysis (GWMA) of psoriasis in 15,369 cases and 19,517 controls of Caucasian and Chinese ancestries. We identify four novel associations at LOC144817, COG6, RUNX1 and TP63, as well as three novel secondary associations within IFIH1 and IL12B. Fine-mapping analysis of MHC region demonstrates an important role for all three HLA class I genes and a complex and heterogeneous pattern of HLA associations between Caucasian and Chinese populations. Further, trans-ethnic comparison suggests population-specific effect or allelic heterogeneity for 11 loci. These population-specific effects contribute significantly to the ethnic diversity of psoriasis prevalence. This study not only provides novel biological insights into the involvement of immune and keratinocyte development mechanism, but also demonstrates a complex and heterogeneous genetic architecture of psoriasis susceptibility across ethnic populations.

  20. Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility

    PubMed Central

    Yin, Xianyong; Low, Hui Qi; Wang, Ling; Li, Yonghong; Ellinghaus, Eva; Han, Jiali; Estivill, Xavier; Sun, Liangdan; Zuo, Xianbo; Shen, Changbing; Zhu, Caihong; Zhang, Anping; Sanchez, Fabio; Padyukov, Leonid; Catanese, Joseph J.; Krueger, Gerald G.; Duffin, Kristina Callis; Mucha, Sören; Weichenthal, Michael; Weidinger, Stephan; Lieb, Wolfgang; Foo, Jia Nee; Li, Yi; Sim, Karseng; Liany, Herty; Irwan, Ishak; Teo, Yikying; Theng, Colin T. S.; Gupta, Rashmi; Bowcock, Anne; De Jager, Philip L.; Qureshi, Abrar A.; de Bakker, Paul I. W.; Seielstad, Mark; Liao, Wilson; Ståhle, Mona; Franke, Andre; Zhang, Xuejun; Liu, Jianjun

    2015-01-01

    Psoriasis is a common inflammatory skin disease with complex genetics and different degrees of prevalence across ethnic populations. Here we present the largest trans-ethnic genome-wide meta-analysis (GWMA) of psoriasis in 15,369 cases and 19,517 controls of Caucasian and Chinese ancestries. We identify four novel associations at LOC144817, COG6, RUNX1 and TP63, as well as three novel secondary associations within IFIH1 and IL12B. Fine-mapping analysis of MHC region demonstrates an important role for all three HLA class I genes and a complex and heterogeneous pattern of HLA associations between Caucasian and Chinese populations. Further, trans-ethnic comparison suggests population-specific effect or allelic heterogeneity for 11 loci. These population-specific effects contribute significantly to the ethnic diversity of psoriasis prevalence. This study not only provides novel biological insights into the involvement of immune and keratinocyte development mechanism, but also demonstrates a complex and heterogeneous genetic architecture of psoriasis susceptibility across ethnic populations. PMID:25903422

  1. Ebolavirus comparative genomics

    PubMed Central

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.

    2015-01-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. PMID:26175035

  2. Efficient Genome-wide Association in Biobanks Using Topic Modeling Identifies Multiple Novel Disease Loci

    PubMed Central

    McCoy, Thomas H; Castro, Victor M; Snapper, Leslie A; Hart, Kamber L; Perlis, Roy H

    2017-01-01

    Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that can be unreliable and fail to capture relationships between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations of diagnostic codes from electronic health records for 10,845 participants in a biobanking program at two large academic medical centers. Specifically, we applied latent Dirichlet allocation to fit 50 disease topics based on diagnostic codes, then conducted a genome-wide common-variant association for each topic. In sensitivity analysis, these results were contrasted with those obtained from traditional single-diagnosis phenome-wide association analysis, as well as those in which only a subset of diagnostic codes was included per topic. In meta-analysis across three biobank cohorts, we identified 23 disease-associated loci with p < 1e-15, including previously associated autoimmune disease loci. In all cases, observed significant associations were of greater magnitude than single phenome-wide diagnostic codes, and incorporation of less strongly loading diagnostic codes enhanced association. This strategy provides a more efficient means of identifying phenome-wide associations in biobanks with coded clinical data. PMID:28861588

  3. Genome-wide mapping of virulence in brown planthopper identifies loci that break down host plant resistance.

    PubMed

    Jing, Shengli; Zhang, Lei; Ma, Yinhua; Liu, Bingfang; Zhao, Yan; Yu, Hangjin; Zhou, Xi; Qin, Rui; Zhu, Lili; He, Guangcun

    2014-01-01

    Insects and plants have coexisted for over 350 million years and their interactions have affected ecosystems and agricultural practices worldwide. Variation in herbivorous insects' virulence to circumvent host resistance has been extensively documented. However, despite decades of investigation, the genetic foundations of virulence are currently unknown. The brown planthopper (Nilaparvata lugens) is the most destructive rice (Oryza sativa) pest in the world. The identification of the resistance gene Bph1 and its introduction in commercial rice varieties prompted the emergence of a new virulent brown planthopper biotype that was able to break the resistance conferred by Bph1. In this study, we aimed to construct a high density linkage map for the brown planthopper and identify the loci responsible for its virulence in order to determine their genetic architecture. Based on genotyping data for hundreds of molecular markers in three mapping populations, we constructed the most comprehensive linkage map available for this species, covering 96.6% of its genome. Fifteen chromosomes were anchored with 124 gene-specific markers. Using genome-wide scanning and interval mapping, the Qhp7 locus that governs preference for Bph1 plants was mapped to a 0.1 cM region of chromosome 7. In addition, two major QTLs that govern the rate of insect growth on resistant rice plants were identified on chromosomes 5 (Qgr5) and 14 (Qgr14). This is the first study to successfully locate virulence in the genome of this important agricultural insect by marker-based genetic mapping. Our results show that the virulence which overcomes the resistance conferred by Bph1 is controlled by a few major genes and that the components of virulence originate from independent genetic characters. The isolation of these loci will enable the elucidation of the molecular mechanisms underpinning the rice-brown planthopper interaction and facilitate the development of durable approaches for controlling this most

  4. Genome-Wide Mapping of Virulence in Brown Planthopper Identifies Loci That Break Down Host Plant Resistance

    PubMed Central

    Jing, Shengli; Zhang, Lei; Ma, Yinhua; Liu, Bingfang; Zhao, Yan; Yu, Hangjin; Zhou, Xi; Qin, Rui; Zhu, Lili; He, Guangcun

    2014-01-01

    Insects and plants have coexisted for over 350 million years and their interactions have affected ecosystems and agricultural practices worldwide. Variation in herbivorous insects' virulence to circumvent host resistance has been extensively documented. However, despite decades of investigation, the genetic foundations of virulence are currently unknown. The brown planthopper (Nilaparvata lugens) is the most destructive rice (Oryza sativa) pest in the world. The identification of the resistance gene Bph1 and its introduction in commercial rice varieties prompted the emergence of a new virulent brown planthopper biotype that was able to break the resistance conferred by Bph1. In this study, we aimed to construct a high density linkage map for the brown planthopper and identify the loci responsible for its virulence in order to determine their genetic architecture. Based on genotyping data for hundreds of molecular markers in three mapping populations, we constructed the most comprehensive linkage map available for this species, covering 96.6% of its genome. Fifteen chromosomes were anchored with 124 gene-specific markers. Using genome-wide scanning and interval mapping, the Qhp7 locus that governs preference for Bph1 plants was mapped to a 0.1 cM region of chromosome 7. In addition, two major QTLs that govern the rate of insect growth on resistant rice plants were identified on chromosomes 5 (Qgr5) and 14 (Qgr14). This is the first study to successfully locate virulence in the genome of this important agricultural insect by marker-based genetic mapping. Our results show that the virulence which overcomes the resistance conferred by Bph1 is controlled by a few major genes and that the components of virulence originate from independent genetic characters. The isolation of these loci will enable the elucidation of the molecular mechanisms underpinning the rice-brown planthopper interaction and facilitate the development of durable approaches for controlling this most

  5. Genome-wide Association Study Identifies Peanut Allergy-Specific Loci and Evidence of Epigenetic Mediation in U.S. Children

    PubMed Central

    Hong, Xiumei; Hao, Ke; Ladd-Acosta, Christine; Hansen, Kasper D; Tsai, Hui-Ju; Liu, Xin; Xu, Xin; Thornton, Timothy A.; Caruso, Deanna; Keet, Corinne A; Sun, Yifei; Wang, Guoying; Luo, Wei; Kumar, Rajesh; Fuleihan, Ramsay; Singh, Anne Marie; Kim, Jennifer S; Story, Rachel E; Gupta, Ruchi S; Gao, Peisong; Chen, Zhu; Walker, Sheila O.; Bartell, Tami R; Beaty, Terri H; Fallin, M Daniele; Schleimer, Robert; Holt, Patrick G; Nadeau, Kari Christine; Wood, Robert A; Pongracic, Jacqueline A; Weeks, Daniel E; Wang, Xiaobin

    2015-01-01

    Food allergy (FA) affects 2–10% of U.S. children and is a growing clinical and public health problem. Here we conduct the first genome-wide association study of well-defined FA, including specific subtypes (peanut, milk, and egg) in 2,759 U.S. participants (1,315 children; 1,444 parents) from the Chicago Food Allergy Study; and identify peanut allergy (PA)-specific loci in the HLA-DR and -DQ gene region at 6p21.32, tagged by rs7192 (p=5.5×10^-8) and rs9275596 (p=6.8×10^-10), in 2,197 participants of European ancestry. We replicate these associations in an independent sample of European ancestry. These associations are further supported by meta-analyses across the discovery and replication samples. Both single-nucleotide polymorphisms (SNPs) are associated with differential DNA methylation levels at multiple CpG sites (p<5×10^-8); and differential DNA methylation of the HLA-DQB1 and HLA-DRB1 genes partially mediate the identified SNP-PA associations. This study suggests that the HLA-DR and -DQ gene region likely poses significant genetic risk for PA. PMID:25710614

  6. Genome-wide association study of breast cancer in Latinas identifies novel protective variants on 6q25.

    PubMed

    Fejerman, Laura; Ahmadiyeh, Nasim; Hu, Donglei; Huntsman, Scott; Beckman, Kenneth B; Caswell, Jennifer L; Tsung, Karen; John, Esther M; Torres-Mejia, Gabriela; Carvajal-Carmona, Luis; Echeverry, María Magdalena; Tuazon, Anna Marie D; Ramirez, Carolina; Gignoux, Christopher R; Eng, Celeste; Gonzalez-Burchard, Esteban; Henderson, Brian; Le Marchand, Loic; Kooperberg, Charles; Hou, Lifang; Agalliu, Ilir; Kraft, Peter; Lindström, Sara; Perez-Stable, Eliseo J; Haiman, Christopher A; Ziv, Elad

    2014-10-20

    The genetic contributions to breast cancer development among Latinas are not well understood. Here we carry out a genome-wide association study of breast cancer in Latinas and identify a genome-wide significant risk variant, located 5' of the Estrogen Receptor 1 gene (ESR1; 6q25 region). The minor allele for this variant is strongly protective (rs140068132: odds ratio (OR) 0.60, 95% confidence interval (CI) 0.53-0.67, P=9 × 10(-18)), originates from Indigenous Americans and is uncorrelated with previously reported risk variants at 6q25. The association is stronger for oestrogen receptor-negative disease (OR 0.34, 95% CI 0.21-0.54) than oestrogen receptor-positive disease (OR 0.63, 95% CI 0.49-0.80; P heterogeneity=0.01) and is also associated with mammographic breast density, a strong risk factor for breast cancer (P=0.001). rs140068132 is located within several transcription factor-binding sites and electrophoretic mobility shift assays with MCF-7 nuclear protein demonstrate differential binding of the G/A alleles at this locus. These results highlight the importance of conducting research in diverse populations.

  7. Genome-wide association study of breast cancer in Latinas identifies novel protective variants on 6q25

    PubMed Central

    Fejerman, Laura; Ahmadiyeh, Nasim; Hu, Donglei; Huntsman, Scott; Beckman, Kenneth B.; Caswell, Jennifer L.; Tsung, Karen; John, Esther M.; Torres-Mejia, Gabriela; Carvajal-Carmona, Luis; Echeverry, María Magdalena; Tuazon, Anna Marie D.; Ramirez, Carolina; Carvajal-Carmona, Luis; Echeverry, María Magdalena; Bohórquez, Mabel Elena; Prieto, Rodrigo; Criollo, Ángel; Ramírez, Carolina; Estrada, Ana Patricia; Suáres, John Jairo; Mateus, Gilbert; Castro, Jorge Mario; Sánchez, Yesid; Murillo, Raúl; Lucia Serrano, Martha; Sanabria, Carolina; Olaya, Justo Germán; Bolaños, Fernando; Vélez, Alejandro; Carmona, Jenny Andrea; Vélez, Alejandro; Rodríguez, Nancy Guerrero; Serón Sousa, Cristina; Mendez, Cesar Eduardo Alvarez; Galviz, Ana Isabel Orduz; Gignoux, Christopher R.; Eng, Celeste; Gonzalez-Burchard, Esteban; Henderson, Brian; Marchand, Loic Le; Kooperberg, Charles; Hou, Lifang; Agalliu, Ilir; Kraft, Peter; Lindström, Sara; Perez-Stable, Eliseo J.; Haiman, Christopher A.; Ziv, Elad

    2014-01-01

    The genetic contributions to breast cancer development among Latinas are not well understood. Here we carry out a genome-wide association study of breast cancer in Latinas and identify a genome-wide significant risk variant, located 5′ of the Estrogen Receptor 1 gene (ESR1; 6q25 region). The minor allele for this variant is strongly protective (rs140068132: odds ratio (OR) 0.60, 95% confidence interval (CI) 0.53–0.67, P=9 × 10^-18), originates from Indigenous Americans and is uncorrelated with previously reported risk variants at 6q25. The association is stronger for oestrogen receptor-negative disease (OR 0.34, 95% CI 0.21–0.54) than oestrogen receptor-positive disease (OR 0.63, 95% CI 0.49–0.80; P heterogeneity=0.01) and is also associated with mammographic breast density, a strong risk factor for breast cancer (P=0.001). rs140068132 is located within several transcription factor-binding sites and electrophoretic mobility shift assays with MCF-7 nuclear protein demonstrate differential binding of the G/A alleles at this locus. These results highlight the importance of conducting research in diverse populations. PMID:25327703
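
    As a rough consistency check on the figures quoted in this abstract, a reported odds ratio and its 95% confidence interval can be converted back into an approximate Wald z-score and two-sided p-value. The sketch below is illustrative only: it assumes the interval is a symmetric Wald interval on the log-odds scale, which the abstract does not state, and the function name is hypothetical.

```python
import math

def wald_p_from_or_ci(or_est, ci_low, ci_high, z_crit=1.959964):
    """Approximate z and two-sided p from an odds ratio and its 95% CI.

    Assumes a Wald interval that is symmetric on the log-odds scale
    (an assumption for illustration, not stated in the abstract).
    """
    # The width of the CI on the log scale spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(or_est) / se
    # Two-sided tail probability of a standard normal variate.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values quoted in the abstract for rs140068132.
z, p = wald_p_from_or_ci(0.60, 0.53, 0.67)
print(f"z = {z:.2f}, p = {p:.1e}")
```

    Because the published odds ratio and interval are rounded to two decimals, the recovered p-value can only be expected to agree with the reported one to within roughly an order of magnitude.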

  8. Genome-Wide Expression Profiling of Complex Regional Pain Syndrome

    PubMed Central

    Jin, Eun-Heui; Zhang, Enji; Ko, Youngkwon; Sim, Woo Seog; Moon, Dong Eon; Yoon, Keon Jung; Hong, Jang Hee; Lee, Won Hyung

    2013-01-01

    Complex regional pain syndrome (CRPS) is a chronic, progressive, and devastating pain syndrome characterized by spontaneous pain, hyperalgesia, allodynia, altered skin temperature, and motor dysfunction. Although previous gene expression profiling studies have been conducted in animal pain models, genome-wide expression profiling in the whole blood of CRPS patients has not been reported yet. Here, we successfully identified certain pain-related genes through genome-wide expression profiling in the blood from CRPS patients. We found that 80 genes were differentially expressed between 4 CRPS patients (2 CRPS I and 2 CRPS II) and 5 controls (cut-off value: 1.5-fold change and p<0.05). Most of those genes were associated with signal transduction, developmental processes, cell structure and motility, and immunity and defense. The expression levels of major histocompatibility complex class I A subtype (HLA-A29.1), matrix metalloproteinase 9 (MMP9), alanine aminopeptidase N (ANPEP), l-histidine decarboxylase (HDC), granulocyte colony-stimulating factor 3 receptor (G-CSF3R), and signal transducer and activator of transcription 3 (STAT3) genes selected from the microarray were confirmed in 24 CRPS patients and 18 controls by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). We focused on the MMP9 gene that, by qRT-PCR, showed a statistically significant difference in expression in CRPS patients compared to controls with the highest relative fold change (4.0±1.23 times and p = 1.4×10^-4). The up-regulation of the MMP9 gene in the blood may be related to the pain progression in CRPS patients. Our findings, which offer a valuable contribution to the understanding of the differential gene expression in CRPS, may help in the understanding of the pathophysiology of CRPS pain progression. PMID:24244504
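
    The cut-off quoted in this abstract (1.5-fold change and p<0.05) can be read as a simple two-condition filter. The sketch below is a minimal illustration under stated assumptions: the gene labels and expression values are made up, the fold change is taken as a plain ratio of group means, and the filter is assumed to be symmetric (up- or down-regulated by at least 1.5-fold). None of these details is confirmed by the abstract.

```python
def is_differentially_expressed(mean_case, mean_control, p_value,
                                min_fold=1.5, alpha=0.05):
    """Return True if a gene passes a fold-change and p-value cut-off.

    The fold change is the ratio of group means; a gene passes when it is
    up- or down-regulated by at least `min_fold` and p_value < alpha.
    """
    fold = mean_case / mean_control
    passes_fold = fold >= min_fold or fold <= 1.0 / min_fold
    return passes_fold and p_value < alpha

# Hypothetical expression summaries: (mean in cases, mean in controls, p-value).
genes = {
    "GENE_A": (400.0, 100.0, 0.0002),  # 4.0-fold up, significant
    "GENE_B": (120.0, 100.0, 0.0100),  # only 1.2-fold, fails fold cut-off
    "GENE_C": (50.0, 100.0, 0.0300),   # 2-fold down, significant
    "GENE_D": (300.0, 100.0, 0.2000),  # 3-fold up, fails p-value cut-off
}

selected = [name for name, values in genes.items()
            if is_differentially_expressed(*values)]
print(selected)
```

    Requiring both criteria at once guards against calling genes with large but noisy changes, or statistically stable but very small ones.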

  9. Genome-wide association analyses identify 18 new loci associated with serum urate concentrations

    PubMed Central

    Köttgen, Anna; Albrecht, Eva; Teumer, Alexander; Vitart, Veronique; Krumsiek, Jan; Hundertmark, Claudia; Pistis, Giorgio; Ruggiero, Daniela; O’Seaghdha, Conall M; Haller, Toomas; Yang, Qiong; Tanaka, Toshiko; Johnson, Andrew D; Kutalik, Zoltán; Smith, Albert V; Shi, Julia; Struchalin, Maksim; Middelberg, Rita P S; Brown, Morris J; Gaffo, Angelo L; Pirastu, Nicola; Li, Guo; Hayward, Caroline; Zemunik, Tatijana; Huffman, Jennifer; Yengo, Loic; Zhao, Jing Hua; Demirkan, Ayse; Feitosa, Mary F; Liu, Xuan; Malerba, Giovanni; Lopez, Lorna M; van der Harst, Pim; Li, Xinzhong; Kleber, Marcus E; Hicks, Andrew A; Nolte, Ilja M; Johansson, Asa; Murgia, Federico; Wild, Sarah H; Bakker, Stephan J L; Peden, John F; Dehghan, Abbas; Steri, Maristella; Tenesa, Albert; Lagou, Vasiliki; Salo, Perttu; Mangino, Massimo; Rose, Lynda M; Lehtimäki, Terho; Woodward, Owen M; Okada, Yukinori; Tin, Adrienne; Müller, Christian; Oldmeadow, Christopher; Putku, Margus; Czamara, Darina; Kraft, Peter; Frogheri, Laura; Thun, Gian Andri; Grotevendt, Anne; Gislason, Gauti Kjartan; Harris, Tamara B; Launer, Lenore J; McArdle, Patrick; Shuldiner, Alan R; Boerwinkle, Eric; Coresh, Josef; Schmidt, Helena; Schallert, Michael; Martin, Nicholas G; Montgomery, Grant W; Kubo, Michiaki; Nakamura, Yusuke; Tanaka, Toshihiro; Munroe, Patricia B; Samani, Nilesh J; Jacobs, David R; Liu, Kiang; D’Adamo, Pio; Ulivi, Sheila; Rotter, Jerome I; Psaty, Bruce M; Vollenweider, Peter; Waeber, Gerard; Campbell, Susan; Devuyst, Olivier; Navarro, Pau; Kolcic, Ivana; Hastie, Nicholas; Balkau, Beverley; Froguel, Philippe; Esko, Tõnu; Salumets, Andres; Khaw, Kay Tee; Langenberg, Claudia; Wareham, Nicholas J; Isaacs, Aaron; Kraja, Aldi; Zhang, Qunyuan; Wild, Philipp S; Scott, Rodney J; Holliday, Elizabeth G; Org, Elin; Viigimaa, Margus; Bandinelli, Stefania; Metter, Jeffrey E; Lupo, Antonio; Trabetti, Elisabetta; Sorice, Rossella; Döring, Angela; Lattka, Eva; Strauch, Konstantin; Theis, Fabian; Waldenberger, Melanie; Wichmann, H-Erich; Davies, Gail; Gow, Alan J; Bruinenberg, Marcel; Study, LifeLines Cohort; Stolk, Ronald P; Kooner, Jaspal S; Zhang, Weihua; Winkelmann, Bernhard R; Boehm, Bernhard O; Lucae, Susanne; Penninx, Brenda W; Smit, Johannes H; Curhan, Gary; Mudgal, Poorva; Plenge, Robert M; Portas, Laura; Persico, Ivana; Kirin, Mirna; Wilson, James F; Leach, Irene Mateo; van Gilst, Wiek H; Goel, Anuj; Ongen, Halit; Hofman, Albert; Rivadeneira, Fernando; Uitterlinden, Andre G; Imboden, Medea; von Eckardstein, Arnold; Cucca, Francesco; Nagaraja, Ramaiah; Piras, Maria Grazia; Nauck, Matthias; Schurmann, Claudia; Budde, Kathrin; Ernst, Florian; Farrington, Susan M; Theodoratou, Evropi; Prokopenko, Inga; Stumvoll, Michael; Jula, Antti; Perola, Markus; Salomaa, Veikko; Shin, So-Youn; Spector, Tim D; Sala, Cinzia; Ridker, Paul M; Kähönen, Mika; Viikari, Jorma; Hengstenberg, Christian; Nelson, Christopher P; Consortium, CARDIoGRAM; Consortium, DIAGRAM; Consortium, ICBP; Consortium, MAGIC; Meschia, James F; Nalls, Michael A; Sharma, Pankaj; Singleton, Andrew B; Kamatani, Naoyuki; Zeller, Tanja; Burnier, Michel; Attia, John; Laan, Maris; Klopp, Norman; Hillege, Hans L; Kloiber, Stefan; Choi, Hyon; Pirastu, Mario; Tore, Silvia; Probst-Hensch, Nicole M; Völzke, Henry; Gudnason, Vilmundur; Parsa, Afshin; Schmidt, Reinhold; Whitfield, John B; Fornage, Myriam; Gasparini, Paolo; Siscovick, David S; Polašek, Ozren; Campbell, Harry; Rudan, Igor; Bouatia-Naji, Nabila; Metspalu, Andres; Loos, Ruth J F; van Duijn, Cornelia M; Borecki, Ingrid B; Ferrucci, Luigi; Gambaro, Giovanni; Deary, Ian J; Wolffenbuttel, Bruce H R; Chambers, John C; März, Winfried; Pramstaller, Peter P; Snieder, Harold; Gyllensten, Ulf; Wright, Alan F; Navis, Gerjan; Watkins, Hugh; Witteman, Jacqueline C M; Sanna, Serena; Schipf, Sabine; Dunlop, Malcolm G; Tönjes, Anke; Ripatti, Samuli; Soranzo, Nicole; Toniolo, Daniela; Chasman, Daniel I; Raitakari, Olli; Kao, W H Linda; Ciullo, Marina; Fox, Caroline S; Caulfield, Mark; Bochud, Murielle; Gieger, Christian

    2013-01-01

    Elevated serum urate concentrations can cause gout, a prevalent and painful inflammatory arthritis. By combining data from >140,000 individuals of European ancestry within the Global Urate Genetics Consortium (GUGC), we identified and replicated 28 genome-wide significant loci in association with serum urate concentrations (18 new regions in or near TRIM46, INHBB, SFMBT1, TMEM171, VEGFA, BAZ1B, PRKAG2, STC1, HNF4G, A1CF, ATXN2, UBE2Q2, IGF1R, NFAT5, MAF, HLF, ACVR1B-ACVRL1 and B3GNT4). Associations for many of the loci were of similar magnitude in individuals of non-European ancestry. We further characterized these loci for associations with gout, transcript expression and the fractional excretion of urate. Network analyses implicate the inhibins-activins signaling pathways and glucose metabolism in systemic urate control. New candidate genes for serum urate concentration highlight the importance of metabolic control of urate production and excretion, which may have implications for the treatment and prevention of gout. PMID:23263486

  10. Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals.

    PubMed

    Taylor, Jeremy F; Whitacre, Lynsey K; Hoff, Jesse L; Tizioto, Polyana C; Kim, JaeWoo; Decker, Jared E; Schnabel, Robert D

    2016-08-17

    Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. By individual, data vary in extent of reference genome depth of coverage from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. We estimate that a 24X depth of coverage is required to achieve 99.5 % coverage of the reference assembly and identify 95 % of the variants within an individual's genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10 % of the reference genome and identify <75 % of variants. About 10 % of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly and misassembled genes and interesting symbionts, commensal and pathogenic organisms. Assembly errors and a lack of annotation of functional elements significantly limit the utility of

  11. Whole-genome sequence analyses of Western Central African Pygmy hunter-gatherers reveal a complex demographic history and identify candidate genes under positive natural selection

    PubMed Central

    Hsieh, PingHsun; Veeramah, Krishna R.; Lachance, Joseph; Tishkoff, Sarah A.; Wall, Jeffrey D.; Hammer, Michael F.; Gutenkunst, Ryan N.

    2016-01-01

    African Pygmies practicing a mobile hunter-gatherer lifestyle are phenotypically and genetically diverged from other anatomically modern humans, and they likely experienced strong selective pressures due to their unique lifestyle in the Central African rainforest. To identify genomic targets of adaptation, we sequenced the genomes of four Biaka Pygmies from the Central African Republic and jointly analyzed these data with the genome sequences of three Baka Pygmies from Cameroon and nine Yoruba farmers. To account for the complex demographic history of these populations that includes both isolation and gene flow, we fit models using the joint allele frequency spectrum and validated them using independent approaches. Our two best-fit models both suggest ancient divergence between the ancestors of the farmers and Pygmies, 90,000 or 150,000 yr ago. We also find that bidirectional asymmetric gene flow is statistically better supported than a single pulse of unidirectional gene flow from farmers to Pygmies, as previously suggested. We then applied complementary statistics to scan the genome for evidence of selective sweeps and polygenic selection. We found that conventional statistical outlier approaches were biased toward identifying candidates in regions of high mutation or low recombination rate. To avoid this bias, we assigned P-values for candidates using whole-genome simulations incorporating demography and variation in both recombination and mutation rates. We found that genes and gene sets involved in muscle development, bone synthesis, immunity, reproduction, cell signaling and development, and energy metabolism are likely to be targets of positive natural selection in Western African Pygmies or their recent ancestors. PMID:26888263

  12. Integrating transcriptome and genome re-sequencing data to identify key genes and mutations affecting chicken eggshell qualities.

    PubMed

    Zhang, Quan; Zhu, Feng; Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

    2015-01-01

    Eggshell damages lead to economic losses in the egg production industry and are a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways as revealed by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations which decreased the eggshell qualities. These findings further advance our understanding of eggshell calcification in the chicken uterus.

  13. Genome-wide analyses for personality traits identify six genomic loci and show correlations with psychiatric disorders

    PubMed Central

    Lo, Min-Tzu; Hinds, David A.; Tung, Joyce Y.; Franz, Carol; Fan, Chun-Chieh; Wang, Yunpeng; Smeland, Olav B.; Schork, Andrew; Holland, Dominic; Kauppi, Karolina; Sanyal, Nilotpal; Escott-Price, Valentina; Smith, Daniel J.; O'Donovan, Michael; Stefansson, Hreinn; Bjornsdottir, Gyda; Thorgeirsson, Thorgeir E.; Stefansson, Kari; McEvoy, Linda K.; Dale, Anders M.; Andreassen, Ole A.; Chen, Chi-Hua

    2017-01-01

    Personality is influenced by genetic and environmental factors, and associated with mental health. However, the underlying genetic determinants are largely unknown. We identified six genetic loci, including five novel loci, significantly associated with personality traits in a meta-analysis of genome-wide association studies (N=123,132–260,861). Of these genome-wide significant loci, extraversion was associated with variants in WSCD2 and near PCDH15, and neuroticism with variants on chromosome 8p23.1 and in L3MBTL2. We performed a principal component analysis to extract major dimensions underlying genetic variations among five personality traits and six psychiatric disorders (N=5,422–18,759). The first genetic dimension separated personality traits and psychiatric disorders, except that neuroticism and openness to experience were clustered with the disorders. High genetic correlations were found between extraversion and attention-deficit/hyperactivity disorder (ADHD), and between openness and schizophrenia/bipolar disorder. The second genetic dimension was closely aligned with extraversion-introversion and grouped neuroticism with internalizing psychopathology (e.g., depression/anxiety). PMID:27918536

  14. Genome-wide analyses for personality traits identify six genomic loci and show correlations with psychiatric disorders.

    PubMed

    Lo, Min-Tzu; Hinds, David A; Tung, Joyce Y; Franz, Carol; Fan, Chun-Chieh; Wang, Yunpeng; Smeland, Olav B; Schork, Andrew; Holland, Dominic; Kauppi, Karolina; Sanyal, Nilotpal; Escott-Price, Valentina; Smith, Daniel J; O'Donovan, Michael; Stefansson, Hreinn; Bjornsdottir, Gyda; Thorgeirsson, Thorgeir E; Stefansson, Kari; McEvoy, Linda K; Dale, Anders M; Andreassen, Ole A; Chen, Chi-Hua

    2017-01-01

    Personality is influenced by genetic and environmental factors and associated with mental health. However, the underlying genetic determinants are largely unknown. We identified six genetic loci, including five novel loci, significantly associated with personality traits in a meta-analysis of genome-wide association studies (N = 123,132-260,861). Of these genome-wide significant loci, extraversion was associated with variants in WSCD2 and near PCDH15, and neuroticism with variants on chromosome 8p23.1 and in L3MBTL2. We performed a principal component analysis to extract major dimensions underlying genetic variations among five personality traits and six psychiatric disorders (N = 5,422-18,759). The first genetic dimension separated personality traits and psychiatric disorders, except that neuroticism and openness to experience were clustered with the disorders. High genetic correlations were found between extraversion and attention-deficit-hyperactivity disorder (ADHD) and between openness and schizophrenia and bipolar disorder. The second genetic dimension was closely aligned with extraversion-introversion and grouped neuroticism with internalizing psychopathology (e.g., depression or anxiety).

  15. Bacillus subtilis genome diversity.

    PubMed

    Earl, Ashlee M; Losick, Richard; Kolter, Roberto

    2007-02-01

    Microarray-based comparative genomic hybridization (M-CGH) is a powerful method for rapidly identifying regions of genome diversity among closely related organisms. We used M-CGH to examine the genome diversity of 17 strains belonging to the nonpathogenic species Bacillus subtilis. Our M-CGH results indicate that there is considerable genetic heterogeneity among members of this species; nearly one-third of Bsu168-specific genes exhibited variability, as measured by the microarray hybridization intensities. The variable loci include those encoding proteins involved in antibiotic production, cell wall synthesis, sporulation, and germination. The diversity in these genes may reflect this organism's ability to survive in diverse natural settings.

  16. Investigation of 95 variants identified in a genome-wide study for association with mortality after acute coronary syndrome

    PubMed Central

    2011-01-01

    Background Genome-wide association studies (GWAS) have identified new candidate genes for the occurrence of acute coronary syndrome (ACS), but possible effects of such genes on survival following ACS have yet to be investigated. Methods We examined 95 polymorphisms in 69 distinct gene regions identified in a GWAS for premature myocardial infarction for their association with post-ACS mortality among 811 whites recruited from university-affiliated hospitals in Kansas City, Missouri. We then sought replication of a positive genetic association in a large, racially diverse cohort of myocardial infarction patients (N = 2284) using Kaplan-Meier survival analyses and Cox regression to adjust for relevant covariates. Finally, we investigated the apparent association further in 6086 additional coronary artery disease patients. Results After Cox adjustment for other ACS risk factors, of 95 SNPs tested in 811 whites only the association with the rs6922269 in MTHFD1L was statistically significant, with a 2.6-fold mortality hazard (P = 0.007). The recessive A/A genotype was of borderline significance in an age- and race-adjusted analysis of the entire combined cohort (N = 3095; P = 0.052), but this finding was not confirmed in independent cohorts (N = 6086). Conclusions We found no support for the hypothesis that the GWAS-identified variants in this study substantially alter the probability of post-ACS survival. Large-scale, collaborative, genome-wide studies may be required in order to detect genetic variants that are robustly associated with survival in patients with coronary artery disease. PMID:21957892
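
    The Kaplan-Meier survival analysis mentioned in the record above can be illustrated with a minimal sketch. It is not the authors' code: the function name `kaplan_meier` and the toy follow-up data are hypothetical, and the Cox covariate adjustment described in the abstract is not covered.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical toy cohort, for illustration only.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

    Running one such curve per genotype group (for example A/A versus the other genotypes) would give the kind of unadjusted comparison that a Cox model then adjusts for covariates.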

  17. Six Novel Loci Associated with Circulating VEGF Levels Identified by a Meta-analysis of Genome-Wide Association Studies

    PubMed Central

    Song, Ci; Nutile, Teresa; Vernon Smith, Albert; Concas, Maria Pina; Traglia, Michela; Barbieri, Caterina; Ndiaye, Ndeye Coumba; Stathopoulou, Maria G.; Lagou, Vasiliki; Maestrale, Giovanni Battista; Sala, Cinzia; Debette, Stephanie; Kovacs, Peter; Lind, Lars; Lamont, John; Fitzgerald, Peter; Tönjes, Anke; Gudnason, Vilmundur; Toniolo, Daniela; Pirastu, Mario; Bellenguez, Celine; Vasan, Ramachandran S.; Ingelsson, Erik; Leutenegger, Anne-Louise; Johnson, Andrew D.; DeStefano, Anita L.; Visvikis-Siest, Sophie; Seshadri, Sudha; Ciullo, Marina

    2016-01-01

    Vascular endothelial growth factor (VEGF) is an angiogenic and neurotrophic factor, secreted by endothelial cells, known to impact various physiological and disease processes from cancer to cardiovascular disease and to be pharmacologically modifiable. We sought to identify novel loci associated with circulating VEGF levels through a genome-wide association meta-analysis combining data from European-ancestry individuals and using a dense variant map from 1000 genomes imputation panel. Six discovery cohorts including 13,312 samples were analyzed, followed by in-silico and de-novo replication studies including an additional 2,800 individuals. A total of 10 genome-wide significant variants were identified at 7 loci. Four were novel loci (5q14.3, 10q21.3, 16q24.2 and 18q22.3) and the leading variants at these loci were rs114694170 (MEF2C, P = 6.79 x 10^-13), rs74506613 (JMJD1C, P = 1.17 x 10^-19), rs4782371 (ZFPM1, P = 1.59 x 10^-9) and rs2639990 (ZADH2, P = 1.72 x 10^-8), respectively. We also identified two new independent variants (rs34528081, VEGFA, P = 1.52 x 10^-18; rs7043199, VLDLR-AS1, P = 5.12 x 10^-14) at the 3 previously identified loci and strengthened the evidence for the four previously identified SNPs (rs6921438, LOC100132354, P = 7.39 x 10^-1467; rs1740073, C6orf223, P = 2.34 x 10^-17; rs6993770, ZFPM2, P = 2.44 x 10^-60; rs2375981, KCNV2, P = 1.48 x 10^-100). These variants collectively explained up to 52% of the VEGF phenotypic variance. We explored biological links between genes in the associated loci using Ingenuity Pathway Analysis that emphasized their roles in embryonic development and function. Gene set enrichment analysis identified the ERK5 pathway as enriched in genes containing VEGF associated variants. eQTL analysis showed, in three of the identified regions, variants acting as both cis and trans eQTLs for multiple genes. Most of these genes, as well as some of those in the associated loci, were involved in platelet biogenesis and functionality, suggesting the

  18. HLA Diversity in the 1000 Genomes Dataset

    PubMed Central

    Gourraud, Pierre-Antoine; Khankhanian, Pouya; Cereb, Nezih; Yang, Soo Young; Feolo, Michael; Maiers, Martin; D. Rioux, John; Hauser, Stephen; Oksenberg, Jorge

    2014-01-01

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies. PMID:24988075

  19. HLA diversity in the 1000 genomes dataset.

    PubMed

    Gourraud, Pierre-Antoine; Khankhanian, Pouya; Cereb, Nezih; Yang, Soo Young; Feolo, Michael; Maiers, Martin; Rioux, John D; Hauser, Stephen; Oksenberg, Jorge

    2014-01-01

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies.

  20. Defining functional DNA elements in the human genome

    PubMed Central

    Kellis, Manolis; Wold, Barbara; Snyder, Michael P.; Bernstein, Bradley E.; Kundaje, Anshul; Marinov, Georgi K.; Ward, Lucas D.; Birney, Ewan; Crawford, Gregory E.; Dekker, Job; Dunham, Ian; Elnitski, Laura L.; Farnham, Peggy J.; Feingold, Elise A.; Gerstein, Mark; Giddings, Morgan C.; Gilbert, David M.; Gingeras, Thomas R.; Green, Eric D.; Guigo, Roderic; Hubbard, Tim; Kent, Jim; Lieb, Jason D.; Myers, Richard M.; Pazin, Michael J.; Ren, Bing; Stamatoyannopoulos, John A.; Weng, Zhiping; White, Kevin P.; Hardison, Ross C.

    2014-01-01

    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease. PMID:24753594

  1. Conserved microstructure of the Brassica B Genome of Brassica nigra in relation to homologous regions of Arabidopsis thaliana, B. rapa and B. oleracea

    PubMed Central

    2013-01-01

    Background The Brassica B genome is known to carry several important traits, yet there have been limited analyses of its underlying genome structure, especially in comparison to the closely related A and C genomes. A bacterial artificial chromosome (BAC) library of Brassica nigra was developed and screened with 17 genes from a 222 kb region of A. thaliana that had been well characterised in both the Brassica A and C genomes. Results Fingerprinting of 483 apparently non-redundant clones defined physical contigs for the corresponding regions in B. nigra. The target region is duplicated in A. thaliana and six homologous contigs were found in B. nigra resulting from the whole genome triplication event shared by the Brassiceae tribe. BACs representative of each region were sequenced to elucidate the level of microscale rearrangements across the Brassica species divide. Conclusions Although the B genome species separated from the A/C lineage some 6 Mya, comparisons between the three paleopolyploid Brassica genomes revealed extensive conservation of gene content and sequence identity. The level of fractionation or gene loss varied across genomes and genomic regions; however, the greatest loss of genes was observed to be common to all three genomes. One large-scale chromosomal rearrangement differentiated the B genome, suggesting such events could contribute to the lack of recombination observed between B genome species and those of the closely related A/C lineage. PMID:23586706

  2. Open Window: When Easily Identifiable Genomes and Traits Are in the Public Domain

    PubMed Central

    Angrist, Misha

    2014-01-01

    “One can't be of an enquiring and experimental nature, and still be very sensible.” - Charles Fort [1] As the costs of personal genetic testing “self-quantification” fall, publicly accessible databases housing people's genotypic and phenotypic information are gradually increasing in number and scope. The latest entrant is openSNP, which allows participants to upload their personal genetic/genomic and self-reported phenotypic data. I believe the emergence of such open repositories of human biological data is a natural reflection of inquisitive and digitally literate people's desires to make genomic and phenotypic information more easily available to a community beyond the research establishment. Such unfettered databases hold the promise of contributing mightily to science, science education and medicine. That said, in an age of increasingly widespread governmental and corporate surveillance, we would do well to be mindful that genomic DNA is uniquely identifying. Participants in open biological databases are engaged in a real-time experiment whose outcome is unknown. PMID:24647311

  3. Comparative genomics of wild type yeast strains unveils important genome diversity

    PubMed Central

    Carreto, Laura; Eiriz, Maria F; Gomes, Ana C; Pereira, Patrícia M; Schuller, Dorit; Santos, Manuel AS

    2008-01-01

    Background Genome variability generates phenotypic heterogeneity and is of relevance for adaptation to environmental change, but the extent of such variability in natural populations is still poorly understood. For example, selected Saccharomyces cerevisiae strains are variable at the ploidy level, have gene amplifications, changes in chromosome copy number, and gross chromosomal rearrangements. This suggests that genome plasticity provides important genetic diversity upon which natural selection mechanisms can operate. Results In this study, we have used wild-type S. cerevisiae (yeast) strains to investigate genome variation in natural and artificial environments. We have used comparative genome hybridization on array (aCGH) to characterize the genome variability of 16 yeast strains, of laboratory and commercial origin, isolated from vineyards and wine cellars, and from opportunistic human infections. Interestingly, sub-telomeric instability was associated with the clinical phenotype, while Ty element insertion regions determined genomic differences of natural wine fermentation strains. Copy number depletion of ASP3 and YRF1 genes was found in all wild-type strains. Other gene families involved in transmembrane transport, sugar and alcohol metabolism or drug resistance had copy number changes, which also distinguished wine from clinical isolates. Conclusion We have isolated and genotyped more than 1000 yeast strains from natural environments and carried out an aCGH analysis of 16 strains representative of distinct genotype clusters. Important genomic variability was identified between these strains, in particular in sub-telomeric regions and in Ty-element insertion sites, suggesting that this type of genome variability is the main source of genetic diversity in natural populations of yeast. The data highlight the usefulness of yeast as a model system to unravel intraspecific natural genome diversity and to elucidate how natural selection shapes the yeast genome.

  4. DMRT gene cluster analysis in the platypus: new insights into genomic organization and regulatory regions.

    PubMed

    El-Mogharbel, Nisrine; Wakefield, Matthew; Deakin, Janine E; Tsend-Ayush, Enkhjargal; Grützner, Frank; Alsop, Amber; Ezaz, Tariq; Marshall Graves, Jennifer A

    2007-01-01

    We isolated and characterized a cluster of platypus DMRT genes and compared their arrangement, location, and sequence across vertebrates. The DMRT gene cluster on human 9p24.3 harbors, in order, DMRT1, DMRT3, and DMRT2, which share a DM domain. DMRT1 is highly conserved and involved in sexual development in vertebrates, and deletions in this region cause sex reversal in humans. Sequence comparisons of DMRT genes between species have been valuable in identifying exons, control regions, and conserved nongenic regions (CNGs). The addition of platypus sequences is expected to be particularly valuable, since monotremes fill a gap in the vertebrate genome coverage. We therefore isolated and fully sequenced platypus BAC clones containing DMRT3 and DMRT2 as well as DMRT1 and then generated multispecies alignments and ran prediction programs followed by experimental verification to annotate this gene cluster. We found that the three genes have 58-66% identity to their human orthologues, lie in the same order as in other vertebrates, and colocate on 1 of the 10 platypus sex chromosomes, X5. We also predict that optimal annotation of the newly sequenced platypus genome will be challenging. The analysis of platypus sequence revealed differences in structure and sequence of the DMRT gene cluster. Multispecies comparison was particularly effective for detecting CNGs, revealing several novel potential regulatory regions within DMRT3 and DMRT2 as well as DMRT1. RT-PCR indicated that platypus DMRT1 and DMRT3 are expressed specifically in the adult testis (and not ovary), but DMRT2 has a wider expression profile, as it does for other mammals. The platypus DMRT1 expression pattern, and its location on an X chromosome, suggests an involvement in monotreme sexual development.

  5. Genome-wide identification of runs of homozygosity islands and associated genes in local dairy cattle breeds.

    PubMed

    Mastrangelo, S; Sardina, M T; Tolone, M; Di Gerlando, R; Sutera, A M; Fontanesi, L; Portolano, B

    2018-03-26

    Runs of homozygosity (ROH) are widely used as predictors of whole-genome inbreeding levels in cattle. They identify regions that have an unfavorable effect on a phenotype when homozygous, but also identify the genes associated with traits of economic interest present in these regions. Here, the distribution of ROH islands and enriched genes within these regions in four dairy cattle breeds were investigated. Cinisara (71), Modicana (72), Reggiana (168) and Italian Holstein (96) individuals were genotyped using the 50K v2 Illumina BeadChip. The genomic regions most commonly associated with ROHs were identified by selecting the top 1% of the single nucleotide polymorphisms (SNPs) most commonly observed in the ROH of each breed. In total, 11 genomic regions were identified in Cinisara and Italian Holstein, and eight in Modicana and Reggiana, indicating an increased ROH frequency level. Generally, ROH islands differed between breeds. The most homozygous region (>45% of individuals with ROH) was found in Modicana on chromosome 6 within a quantitative trait locus affecting milk fat and protein concentrations. We identified between 126 and 347 genes within ROH islands, which are involved in multiple signaling and signal transduction pathways in a wide variety of biological processes. The gene ontology enrichment provided information on possible molecular functions, biological processes and cellular components under selection related to milk production, reproduction, immune response and resistance/susceptibility to infection and diseases. Thus, scanning the genome for ROH could be an alternative strategy to detect genomic regions and genes related to important economic traits.
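
    The ROH-island selection described in the record above (keep the top 1% of SNPs most often found inside an ROH, then group them into regions) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the input layout (one list of inclusive SNP-index intervals per animal) and the rule of merging only directly adjacent SNPs are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (first SNP index, last SNP index)


def snp_roh_frequency(roh_per_animal: List[List[Interval]], n_snps: int) -> List[float]:
    """Fraction of animals in which each SNP lies inside at least one ROH."""
    counts = [0] * n_snps
    for segments in roh_per_animal:
        covered = set()
        for start, end in segments:
            covered.update(range(start, end + 1))
        for i in covered:
            counts[i] += 1
    n_animals = len(roh_per_animal)
    return [c / n_animals for c in counts]


def top_fraction_snps(freq: List[float], fraction: float = 0.01) -> List[int]:
    """Indices of the top `fraction` of SNPs ranked by ROH frequency."""
    k = max(1, int(len(freq) * fraction))
    ranked = sorted(range(len(freq)), key=lambda i: freq[i], reverse=True)
    return sorted(ranked[:k])


def merge_adjacent(indices: List[int]) -> List[Interval]:
    """Merge consecutive SNP indices into candidate ROH islands."""
    islands: List[Interval] = []
    for i in indices:
        if islands and i == islands[-1][1] + 1:
            islands[-1] = (islands[-1][0], i)
        else:
            islands.append((i, i))
    return islands


# Hypothetical toy data: three animals, ten SNPs on one chromosome.
roh = [[(2, 5)], [(3, 6)], [(3, 4), (8, 9)]]
freq = snp_roh_frequency(roh, n_snps=10)
islands = merge_adjacent(top_fraction_snps(freq, fraction=0.2))
```

    With real 50K genotypes the cut would be computed per breed, and SNP indices would be mapped back to chromosome positions before reporting regions.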

  6. L1-associated genomic regions are deleted in somatic cells of the healthy human brain.

    PubMed

    Erwin, Jennifer A; Paquola, Apuã C M; Singer, Tatjana; Gallina, Iryna; Novotny, Mark; Quayle, Carolina; Bedrosian, Tracy A; Alves, Francisco I A; Butcher, Cheyenne R; Herdy, Joseph R; Sarkar, Anindita; Lasken, Roger S; Muotri, Alysson R; Gage, Fred H

    2016-12-01

    The healthy human brain is a mosaic of varied genomes. Long interspersed element-1 (LINE-1 or L1) retrotransposition is known to create mosaicism by inserting L1 sequences into new locations of somatic cell genomes. Using a machine learning-based, single-cell sequencing approach, we discovered that somatic L1-associated variants (SLAVs) are composed of two classes: L1 retrotransposition insertions and retrotransposition-independent L1-associated variants. We demonstrate that a subset of SLAVs comprises somatic deletions generated by L1 endonuclease cutting activity. Retrotransposition-independent rearrangements in inherited L1s resulted in the deletion of proximal genomic regions. These rearrangements were resolved by microhomology-mediated repair, which suggests that L1-associated genomic regions are hotspots for somatic copy number variants in the brain and therefore a heritable genetic contributor to somatic mosaicism. We demonstrate that SLAVs are present in crucial neural genes, such as DLG2 (also called PSD93), and affect 44-63% of the cells in the healthy brain.

  7. L1-Associated Genomic Regions are Deleted in Somatic Cells of the Healthy Human Brain

    PubMed Central

    Erwin, Jennifer A.; Paquola, Apuã C.M.; Singer, Tatjana; Gallina, Iryna; Novotny, Mark; Quayle, Carolina; Bedrosian, Tracy; Ivanio, Francisco; Butcher, Cheyenne R.; Herdy, Joseph R.; Sarkar, Anindita; Lasken, Roger S.; Muotri, Alysson R.; Gage, Fred H.

    2016-01-01

    The healthy human brain is a mosaic of varied genomes. L1 retrotransposition is known to create mosaicism by inserting L1 sequences into new locations of somatic cell genomes. Using a machine learning-based, single-cell sequencing approach, we discovered that Somatic L1-Associated Variants (SLAVs) are actually composed of two classes: L1 retrotransposition insertions and retrotransposition-independent L1-associated variants. We demonstrate that a subset of SLAVs are, in fact, somatic deletions generated by L1 endonuclease cutting activity. Retrotransposition-independent rearrangements within inherited L1s resulted in the deletion of proximal genomic regions. These rearrangements were resolved by microhomology-mediated repair, which suggests that L1-associated genomic regions are hotspots for somatic copy number variants in the brain and therefore a heritable genetic contributor to somatic mosaicism. We demonstrate that SLAVs are present in crucial neural genes, such as DLG2/PSD93, and affect 44–63% of the cells in the healthy brain. PMID:27618310

  8. Genome-wide patterns of copy number variation in the Chinese yak genome.

    PubMed

    Zhang, Xiao; Wang, Kun; Wang, Lizhong; Yang, Yongzhi; Ni, Zhengqiang; Xie, Xiuyue; Shao, Xuemin; Han, Jin; Wan, Dongshi; Qiu, Qiang

    2016-05-20

    Copy number variation (CNV) represents an important source of genetic divergence that can produce drastic phenotypic differences and may therefore be subject to selection during domestication and environmental adaptation. To investigate the evolutionary dynamics of CNV in the yak genome, we used a read depth approach to detect CNV based on genome resequencing data from 14 wild and 65 domestic yaks and determined CNV regions related to domestication and adaptations to high-altitude. We identified 2,634 CNV regions (CNVRs) comprising a total of 153 megabases (5.7 % of the yak genome) and 3,879 overlapping annotated genes. Comparison between domestic and wild yak populations identified 121 potentially selected CNVRs, harboring genes related to neuronal development, reproduction, nutrition and energy metabolism. In addition, we found 85 CNVRs that are significantly different between domestic yak living in high- and low-altitude areas, including three genes related to hypoxia response and six related to immune defense. This analysis shows that genic CNVs may play an important role in phenotypic changes during yak domestication and adaptation to life at high-altitude. We present the first refined CNV map for yak along with comprehensive genomic analysis of yak CNV. Our results provide new insights into the genetic basis of yak domestication and adaptation to living in a high-altitude environment, as well as a valuable genetic resource that will facilitate future CNV association studies of important traits in yak and other bovid species.
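
    The read depth approach named in the record above can be illustrated with a toy sketch. It shows the general idea only and is not the method or software used in the study: the window layout, the median normalisation, the 1.5 and 0.5 ratio thresholds and both function names are assumptions made for illustration.

```python
from statistics import median
from typing import List, Tuple


def classify_windows(depths: List[float], gain: float = 1.5, loss: float = 0.5) -> List[str]:
    """Label each window by its read depth relative to the median depth."""
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    labels = []
    for depth in depths:
        ratio = depth / baseline
        if ratio >= gain:
            labels.append("gain")
        elif ratio <= loss:
            labels.append("loss")
        else:
            labels.append("normal")
    return labels


def merge_calls(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Merge runs of identical non-normal labels into (start, end, state) regions."""
    regions: List[Tuple[int, int, str]] = []
    for i, state in enumerate(labels):
        if state == "normal":
            continue
        if regions and regions[-1][2] == state and regions[-1][1] == i - 1:
            regions[-1] = (regions[-1][0], i, state)
        else:
            regions.append((i, i, state))
    return regions


# Hypothetical mean depths for eight consecutive genomic windows.
depths = [30, 31, 29, 60, 62, 30, 10, 30]
regions = merge_calls(classify_windows(depths))
```

    Calls made per individual in this way would then be overlapped across animals to define CNV regions (CNVRs) such as those counted in the abstract.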

  9. Epigenetics, chromatin and genome organization: recent advances from the ENCODE project.

    PubMed

    Siggens, L; Ekwall, K

    2014-09-01

    The organization of the genome into functional units, such as enhancers and active or repressed promoters, is associated with distinct patterns of DNA and histone modifications. The Encyclopedia of DNA Elements (ENCODE) project has advanced our understanding of the principles of genome, epigenome and chromatin organization, identifying hundreds of thousands of potential regulatory regions and transcription factor binding sites. Part of the ENCODE consortium, GENCODE, has annotated the human genome with novel transcripts including new noncoding RNAs and pseudogenes, highlighting transcriptional complexity. Many disease variants identified in genome-wide association studies are located within putative enhancer regions defined by the ENCODE project. Understanding the principles of chromatin and epigenome organization will help to identify new disease mechanisms, biomarkers and drug targets, particularly as ongoing epigenome mapping projects generate data for primary human cell types that play important roles in disease. © 2014 The Association for the Publication of the Journal of Internal Medicine.

  10. Genome-Wide Association Analysis to Identify Loci for Milk Yield in Gyr Breed

    USDA-ARS's Scientific Manuscript database

    A genome scan was conducted to identify QTL affecting milk yield in a Brazilian Gyr population of progeny test bulls (N=319). Data used in this study were derived from traditional genetic evaluation records computed by the Embrapa Dairy Cattle and released in May/2009 (http://www.cnpgl.embrapa.br/nova...

  11. Evaluation of artificial selection in Standard Poodles using whole-genome sequencing.

    PubMed

    Friedenberg, Steven G; Meurs, Kathryn M; Mackay, Trudy F C

    2016-12-01

    Identifying regions of artificial selection within dog breeds may provide insights into genetic variation that underlies breed-specific traits or diseases, particularly if these traits or disease predispositions are fixed within a breed. In this study, we searched for runs of homozygosity (ROH) and calculated the d_i statistic (which is based upon F_ST) to identify regions of artificial selection in Standard Poodles using high-coverage, whole-genome sequencing data of 15 Standard Poodles and 49 dogs across seven other breeds. We identified consensus ROH regions ≥1 Mb in length and common to at least ten Standard Poodles covering 0.6 % of the genome, and d_i regions that most distinguish Standard Poodles from other breeds covering 3.7 % of the genome. Within these regions, we identified enriched gene pathways related to olfaction, digestion, and taste, as well as pathways related to adrenal hormone biosynthesis, T cell function, and protein ubiquitination that could contribute to the pathogenesis of some Poodle-prevalent autoimmune diseases. We also validated variants related to hair coat and skull morphology that have previously been identified as being under selective pressure in Poodles, and flagged additional polymorphisms in genes such as ITGA2B, CBX4, and TNXB that may represent strong candidates for other common Poodle disorders.

  12. Genome-Wide Association Mapping Combined with Reverse Genetics Identifies New Effectors of Low Water Potential-Induced Proline Accumulation in Arabidopsis

    PubMed Central

    Verslues, Paul E.; Lasky, Jesse R.; Juenger, Thomas E.; Liu, Tzu-Wen; Kumar, M. Nagaraj

    2014-01-01

    Arabidopsis (Arabidopsis thaliana) exhibits natural genetic variation in drought response, including varying levels of proline (Pro) accumulation under low water potential. As Pro accumulation is potentially important for stress tolerance and cellular redox control, we conducted a genome-wide association (GWAS) study of low water potential-induced Pro accumulation using a panel of natural accessions and publicly available single-nucleotide polymorphism (SNP) data sets. Candidate genomic regions were prioritized for subsequent study using metrics considering both the strength and spatial clustering of the association signal. These analyses found many candidate regions likely containing gene(s) influencing Pro accumulation. Reverse genetic analysis of several candidates identified new Pro effector genes, including thioredoxins and several genes encoding Universal Stress Protein A domain proteins. These new Pro effector genes further link Pro accumulation to cellular redox and energy status. Additional new Pro effector genes found include the mitochondrial protease LON1, ribosomal protein RPL24A, protein phosphatase 2A subunit A3, a MADS box protein, and a nucleoside triphosphate hydrolase. Several of these new Pro effector genes were from regions with multiple SNPs, each having moderate association with Pro accumulation. This pattern supports the use of summary approaches that incorporate clusters of SNP associations in addition to consideration of individual SNP probability values. Further GWAS-guided reverse genetics promises to find additional effectors of Pro accumulation. The combination of GWAS and reverse genetics to efficiently identify new effector genes may be especially applicable for traits difficult to analyze by other genetic screening methods. PMID:24218491

  13. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis.

    PubMed

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.

  14. Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

    PubMed

    Seal, B S; Neill, J D; Ridpath, J F

    1994-07-01

    Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.

  15. Genome-wide association study for birth, weaning and yearling weight in Colombian Brahman cattle

    PubMed Central

    Martínez, Rodrigo; Bejarano, Diego; Gómez, Yolanda; Dasoneville, Romain; Jiménez, Ariel; Even, Gael; Sölkner, Johann; Mészáros, Gabor

    2017-01-01

    Genotypic and phenotypic data of 1,562 animals were analyzed to find genomic regions that potentially influence the birth weight (BW), weaning weight at seven months of age (WW) and yearling weight (YW) of Colombian Brahman cattle, with genotyping conducted using Illumina Bead chip array with 74,669 SNPs. A Single Step Genomic BLUP (ssGBLUP) approach was used to estimate the proportion of variance explained by each marker. Multiple regions scattered across the genome were found to influence weights at different ages, also dependent on the trait component (direct or maternal). The most interesting regions were connected to previously identified QTLs and genes, such as ADAMTSL3, CAPN2, FABP6, and ZEB2, influencing growth and weight traits. The identified regions will contribute to the development and refinement of genomic selection programs for Zebu Brahman cattle in Colombia. PMID:28534927

  16. A Multinational Arab Genome-Wide Association Study Identifies New Genetic Associations for Rheumatoid Arthritis.

    PubMed

    Saxena, Richa; Plenge, Robert M; Bjonnes, Andrew C; Dashti, Hassan S; Okada, Yukinori; Gad El Haq, Wessam; Hammoudeh, Mohammed; Al Emadi, Samar; Masri, Basel K; Halabi, Hussein; Badsha, Humeira; Uthman, Imad W; Margolin, Lauren; Gupta, Namrata; Mahfoud, Ziyad R; Kapiri, Marianthi; Dargham, Soha R; Aranki, Grace; Kazkaz, Layla A; Arayssi, Thurayya

    2017-05-01

    Genetic factors underlying susceptibility to rheumatoid arthritis (RA) in Arab populations are largely unknown. This genome-wide association study (GWAS) was undertaken to explore the generalizability of previously reported RA loci to Arab subjects and to discover new Arab-specific genetic loci. The Genetics of Rheumatoid Arthritis in Some Arab States Study was designed to examine the genetics and clinical features of RA patients from Jordan, the Kingdom of Saudi Arabia, Lebanon, Qatar, and the United Arab Emirates. In total, >7 million single-nucleotide polymorphisms (SNPs) were tested for association with RA overall and with seropositive or seronegative RA in 511 RA cases and 352 healthy controls. In addition, replication of 15 signals was attempted in 283 RA cases and 221 healthy controls. A genetic risk score of 68 known RA SNPs was also examined in this study population. Three loci (HLA region, intergenic 5q13, and 17p13 at SMTNL2/GGT6) reached genome-wide significance in the analyses of association with RA and with seropositive RA, and for all 3 loci, evidence of independent replication was demonstrated. Consistent with the findings in European and East Asian populations, the association of RA with HLA-DRB1 amino acid position 11 conferred the strongest effect (P = 4.8 × 10(-16)), and a weighted genetic risk score of previously associated RA loci was found to be associated with RA (P = 3.41 × 10(-5)) and with seropositive RA (P = 1.48 × 10(-6)) in this population. In addition, 2 novel associations specific to Arab populations were found at the 5q13 and 17p13 loci. This first RA GWAS in Arab populations confirms that established HLA-region and known RA risk alleles contribute strongly to the risk and severity of disease in some Arab groups, suggesting that the genetic architecture of RA is similar across ethnic groups. Moreover, this study identified 2 novel RA risk loci in Arabs, offering further population-specific insights into the

  17. Integrating Transcriptome and Genome Re-Sequencing Data to Identify Key Genes and Mutations Affecting Chicken Eggshell Qualities

    PubMed Central

    Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

    2015-01-01

    Eggshell damage leads to economic losses in the egg production industry and is a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways as revealed by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations which decreased the eggshell qualities. These findings further advance our understanding of eggshell calcification in the chicken uterus. PMID:25974068

  18. Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis

    PubMed Central

    Chakrabarti, Kausik; Pearson, Michael; Grate, Leslie; Sterne-Weiler, Timothy; Deans, Jonathan; Donohue, John Paul; Ares, Manuel

    2007-01-01

    As the genomes of more eukaryotic pathogens are sequenced, understanding how molecular differences between parasite and host might be exploited to provide new therapies has become a major focus. Central to cell function are RNA-containing complexes involved in gene expression, such as the ribosome, the spliceosome, snoRNAs, RNase P, and telomerase, among others. In this article we identify by comparative genomics and validate by RNA analysis numerous previously unknown structural RNAs encoded by the Plasmodium falciparum genome, including the telomerase RNA, U3, 31 snoRNAs, as well as previously predicted spliceosomal snRNAs, SRP RNA, MRP RNA, and RNase P RNA. Furthermore, we identify six new RNA coding genes of unknown function. To investigate the relationships of the RNA coding genes to other genomic features in related parasites, we developed a genome browser for P. falciparum (http://areslab.ucsc.edu/cgi-bin/hgGateway). Additional experiments provide evidence supporting the prediction that snoRNAs guide methylation of a specific position on U4 snRNA, as well as predicting an snRNA promoter element particular to Plasmodium sp. These findings should allow detailed structural comparisons between the RNA components of the gene expression machinery of the parasite and its vertebrate hosts. PMID:17901154

  19. Meta-analysis of genome-wide studies identifies WNT16 and ESR1 SNPs associated with bone mineral density in premenopausal women.

    PubMed

    Koller, Daniel L; Zheng, Hou-Feng; Karasik, David; Yerges-Armstrong, Laura; Liu, Ching-Ti; McGuigan, Fiona; Kemp, John P; Giroux, Sylvie; Lai, Dongbing; Edenberg, Howard J; Peacock, Munro; Czerwinski, Stefan A; Choh, Audrey C; McMahon, George; St Pourcain, Beate; Timpson, Nicholas J; Lawlor, Debbie A; Evans, David M; Towne, Bradford; Blangero, John; Carless, Melanie A; Kammerer, Candace; Goltzman, David; Kovacs, Christopher S; Prior, Jerilynn C; Spector, Tim D; Rousseau, Francois; Tobias, Jon H; Akesson, Kristina; Econs, Michael J; Mitchell, Braxton D; Richards, J Brent; Kiel, Douglas P; Foroud, Tatiana

    2013-03-01

    Previous genome-wide association studies (GWAS) have identified common variants in genes associated with variation in bone mineral density (BMD), although most have been carried out in combined samples of older women and men. Meta-analyses of these results have identified numerous single-nucleotide polymorphisms (SNPs) of modest effect at genome-wide significance levels in genes involved in both bone formation and resorption, as well as other pathways. We performed a meta-analysis restricted to premenopausal white women from four cohorts (n = 4061 women, aged 20 to 45 years) to identify genes influencing peak bone mass at the lumbar spine and femoral neck. After imputation, age- and weight-adjusted bone-mineral density (BMD) values were tested for association with each SNP. Association of an SNP in the WNT16 gene (rs3801387; p = 1.7 × 10(-9)) and multiple SNPs in the ESR1/C6orf97 region (rs4870044; p = 1.3 × 10(-8)) achieved genome-wide significance levels for lumbar spine BMD. These SNPs, along with others demonstrating suggestive evidence of association, were then tested for association in seven replication cohorts that included premenopausal women of European, Hispanic-American, and African-American descent (combined n = 5597 for femoral neck; n = 4744 for lumbar spine). When the data from the discovery and replication cohorts were analyzed jointly, the evidence was more significant (WNT16 joint p = 1.3 × 10(-11); ESR1/C6orf97 joint p = 1.4 × 10(-10)). Multiple independent association signals were observed with spine BMD at the ESR1 region after conditioning on the primary signal. Analyses of femoral neck BMD also supported association with SNPs in WNT16 and ESR1/C6orf97 (p < 1 × 10(-5)). Our results confirm that several of the genes contributing to BMD variation across a broad age range in both sexes have effects of similar magnitude on BMD of the spine in premenopausal women. These data support the

  20. Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome.

    PubMed

    Wu, Jia Qian; Du, Jiang; Rozowsky, Joel; Zhang, Zhengdong; Urban, Alexander E; Euskirchen, Ghia; Weissman, Sherman; Gerstein, Mark; Snyder, Michael

    2008-01-03

    Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

  1. Comparative Analysis of Begonia Plastid Genomes and Their Utility for Species-Level Phylogenetics

    PubMed Central

    Harrison, Nicola; Harrison, Richard J.

    2016-01-01

    Recent, rapid radiations make species-level phylogenetics difficult to resolve. We used a multiplexed, high-throughput sequencing approach to identify informative genomic regions to resolve phylogenetic relationships at low taxonomic levels in Begonia from a survey of sixteen species. A long-range PCR method was used to generate draft plastid genomes to provide a strong phylogenetic backbone, identify fast evolving regions and provide informative molecular markers for species-level phylogenetic studies in Begonia. PMID:27058864

  2. Creation and genomic analysis of irradiation hybrids in Populus

    Treesearch

    Matthew S. Zinkgraf; K. Haiby; M.C. Lieberman; L. Comai; I.M. Henry; Andrew Groover

    2016-01-01

    Establishing efficient functional genomic systems for creating and characterizing genetic variation in forest trees is challenging. Here we describe protocols for creating novel gene-dosage variation in Populus through gamma-irradiation of pollen, followed by genomic analysis to identify chromosomal regions that have been deleted or inserted in...

  3. Novel Harmful Recessive Haplotypes Identified for Fertility Traits in Nordic Holstein Cattle

    PubMed Central

    Sahana, Goutam; Nielsen, Ulrik Sander; Aamand, Gert Pedersen; Lund, Mogens Sandø; Guldbrandtsen, Bernt

    2013-01-01

    Using genomic data, lethal recessives may be discovered from haplotypes that are common in the population but never occur in the homozygous state in live animals. This approach only requires genotype data from phenotypically normal (i.e. live) individuals and not from the affected embryos that die. A total of 7,937 Nordic Holstein animals were genotyped with BovineSNP50 BeadChip and haplotypes including 25 consecutive markers were constructed and tested for absence of homozygous states. We have identified 17 homozygote-deficient haplotypes which could be loosely clustered into eight genomic regions harboring possible recessive lethal alleles. Effects of the identified haplotypes were estimated on two fertility traits: non-return rates and calving interval. Out of the eight identified genomic regions, six regions were confirmed as having an effect on fertility. The information can be used to avoid carrier-by-carrier matings in practical animal breeding. Further, identification of causative genes/polymorphisms responsible for lethal effects will lead to accurate testing of the individuals carrying a lethal allele. PMID:24376603
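
    The homozygote-deficiency idea in this record can be illustrated with a minimal sketch. It assumes random mating (Hardy-Weinberg proportions), which is a simplification and not necessarily the test the authors used; the function name and the 5% haplotype frequency are hypothetical, while the sample size of 7,937 comes from the abstract.

```python
def homozygote_deficiency(n_animals: int, hap_freq: float) -> tuple[float, float]:
    """Return (expected homozygote count, probability of seeing zero homozygotes).

    Assumption (not from the source): genotypes follow Hardy-Weinberg
    proportions, so each animal is homozygous with probability hap_freq ** 2.
    """
    p_hom = hap_freq ** 2
    expected = n_animals * p_hom
    # Probability that none of n independent animals is homozygous.
    p_zero = (1.0 - p_hom) ** n_animals
    return expected, p_zero


# Hypothetical haplotype at 5% frequency in a sample the size of the study.
expected, p_zero = homozygote_deficiency(7937, 0.05)
print(f"expected homozygotes: {expected:.1f}, P(none observed): {p_zero:.2e}")
```

    Under these assumptions, a haplotype that is reasonably common yet never observed in the homozygous state is very unlikely to be a chance finding, which is what makes it a candidate carrier of a recessive lethal allele.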

  4. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.

    PubMed

    Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel

    2018-06-01

    The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using a XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  5. Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score.

    PubMed

    Lee, Hayan; Schatz, Michael C

    2012-08-15

    Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself. We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including known clinically relevant variations in these regions. The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net
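
    As a rough intuition for mappability, the sketch below marks each position by whether the k-mer starting there is unique in the sequence. This is a much cruder proxy than the GMS described in the record: it uses exact matches on the forward strand only and ignores sequencing errors and any probability weighting. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def unique_kmer_mask(genome: str, k: int) -> list[bool]:
    """For each k-mer start position, True if that k-mer occurs exactly once."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


def fraction_ambiguous(genome: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer is repeated elsewhere."""
    mask = unique_kmer_mask(genome, k)
    if not mask:
        return 0.0
    return 1.0 - sum(mask) / len(mask)


# Toy sequence in which the 4-mer ACGT appears twice.
toy = "ACGTACGTTT"
print(unique_kmer_mask(toy, 4))
print(fraction_ambiguous(toy, 4))
```

    Raising the read depth does not change this mask, which mirrors the record's point that errors in repetitive regions cannot be overcome by additional coverage.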

  6. Comparative genome analysis of a large Dutch Legionella pneumophila strain collection identifies five markers highly correlated with clinical strains

    PubMed Central

    2010-01-01

    Background: Discrimination between clinical and environmental strains within many bacterial species is currently underexplored. Genomic analyses have clearly shown the enormous variability in genome composition between different strains of a bacterial species. In this study we have used Legionella pneumophila, the causative agent of Legionnaires' disease, to search for genomic markers related to pathogenicity. During a large surveillance study in The Netherlands well-characterized patient-derived strains and environmental strains were collected. We have used a mixed-genome microarray to perform comparative-genome analysis of 257 strains from this collection. Results: Microarray analysis indicated that 480 DNA markers (out of in total 3360 markers) showed clear variation in presence between individual strains and these were therefore selected for further analysis. Unsupervised statistical analysis of these markers showed the enormous genomic variation within the species but did not show any correlation with a pathogenic phenotype. We therefore used supervised statistical analysis to identify discriminating markers. Genetic programming was used both to identify predictive markers and to define their interrelationships. A model consisting of five markers was developed that together correctly predicted 100% of the clinical strains and 69% of the environmental strains. Conclusions: A novel approach for identifying predictive markers enabling discrimination between clinical and environmental isolates of L. pneumophila is presented. Out of over 3000 possible markers, five were selected that together enabled correct prediction of all the clinical strains included in this study. This novel approach for identifying predictive markers can be applied to all bacterial species, allowing for better discrimination between strains well equipped to cause human disease and relatively harmless strains. PMID:20630115

  7. Identifying Driver Genomic Alterations in Cancers by Searching Minimum-Weight, Mutually Exclusive Sets

    PubMed Central

    Lu, Songjian; Lu, Kevin N.; Cheng, Shi-Yuan; Hu, Bo; Ma, Xiaojun; Nystrom, Nicholas; Lu, Xinghua

    2015-01-01

    An important goal of cancer genomic research is to identify the driving pathways underlying disease mechanisms and the heterogeneity of cancers. It is well known that somatic genome alterations (SGAs) affecting the genes that encode the proteins within a common signaling pathway exhibit mutual exclusivity, in which these SGAs usually do not co-occur in a tumor. With some success, this characteristic has been utilized as an objective function to guide the search for driver mutations within a pathway. However, mutual exclusivity alone is not sufficient to indicate that genes affected by such SGAs are in common pathways. Here, we propose a novel, signal-oriented framework for identifying driver SGAs. First, we identify the perturbed cellular signals by mining the gene expression data. Next, we search for a set of SGA events that carries strong information with respect to such perturbed signals while exhibiting mutual exclusivity. Finally, we design and implement an efficient exact algorithm to solve an NP-hard problem encountered in our approach. We apply this framework to the ovarian and glioblastoma tumor data available at the TCGA database, and perform systematic evaluations. Our results indicate that the signal-oriented approach enhances the ability to find informative sets of driver SGAs that likely constitute signaling pathways. PMID:26317392
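
    The mutual-exclusivity property mentioned in this record can be quantified with two simple counts for a candidate gene set: how many tumors it covers, and in how many tumors more than one of its genes is altered. The sketch below is only an illustration of that property, not the authors' minimum-weight objective or their exact algorithm; the gene names, tumor IDs, and function name are hypothetical.

```python
def coverage_and_overlap(alterations: dict[str, set[str]],
                         gene_set: list[str]) -> tuple[int, int]:
    """Return (tumors with >= 1 altered gene, tumors with > 1 altered gene).

    alterations maps a gene name to the set of tumor IDs carrying a
    somatic genome alteration (SGA) in that gene.
    """
    hits_per_tumor: dict[str, int] = {}
    for gene in gene_set:
        for tumor in alterations.get(gene, set()):
            hits_per_tumor[tumor] = hits_per_tumor.get(tumor, 0) + 1
    covered = len(hits_per_tumor)
    overlapping = sum(1 for n in hits_per_tumor.values() if n > 1)
    return covered, overlapping


# Hypothetical data: genes A and B never co-occur; A and C share tumor t2.
sga = {"A": {"t1", "t2"}, "B": {"t3"}, "C": {"t2", "t4"}}
print(coverage_and_overlap(sga, ["A", "B"]))
print(coverage_and_overlap(sga, ["A", "C"]))
```

    A set with high coverage and zero overlap is perfectly mutually exclusive; as the record notes, that alone does not establish that the genes act in a common pathway.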

  8. Inferring causal genomic alterations in breast cancer using gene expression data

    PubMed Central

    2011-01-01

    Background: One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results: We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in many important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions: To our knowledge, this is the first effort to systematically identify and validate drivers for expression based CNV regions in breast cancer. The framework, in which wavelet analysis of copy number alteration based on expression is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data. PMID:21806811

  9. Genomic analysis of the chromosome 15q11-q13 Prader-Willi syndrome region and characterization of transcripts for GOLGA8E and WHCD1L1 from the proximal breakpoint region.

    PubMed

    Jiang, Yong-Hui; Wauki, Kekio; Liu, Qian; Bressler, Jan; Pan, Yanzhen; Kashork, Catherine D; Shaffer, Lisa G; Beaudet, Arthur L

    2008-01-28

    Prader-Willi syndrome (PWS) is a neurobehavioral disorder characterized by neonatal hypotonia, childhood obesity, dysmorphic features, hypogonadism, mental retardation, and behavioral problems. Although PWS is most often caused by a paternal interstitial deletion of a 6-Mb region of chromosome 15q11-q13, the identity of the exact protein coding or noncoding RNAs whose deficiency produces the PWS phenotype is uncertain. There are also reports describing a PWS-like phenotype in a subset of patients with full mutations in the FMR1 (fragile X mental retardation 1) gene. Taking advantage of the human genome sequence, we have performed extensive sequence analysis and molecular studies for the PWS candidate region. We have characterized transcripts for the first time for two UCSC Genome Browser predicted protein-coding genes, GOLGA8E (golgin subfamily a, 8E) and WHDC1L1 (WAS protein homology region containing 1-like 1) and have further characterized two previously reported genes, CYFIP1 and NIPA2; all four genes are in the region close to the proximal/centromeric deletion breakpoint (BP1). GOLGA8E belongs to the golgin subfamily of coiled-coil proteins associated with the Golgi apparatus. Six out of 16 golgin subfamily proteins in the human genome have been mapped in the chromosome 15q11-q13 and 15q24-q26 regions. We have also identified more than 38 copies of GOLGA8E-like sequence in the 15q11-q14 and 15q23-q26 regions which supports the presence of a GOLGA8E-associated low copy repeat (LCR). Analysis of the 15q11-q13 region by PFGE also revealed a polymorphic region between BP1 and BP2. WHDC1L1 is a novel gene with similarity to mouse Whdc1 (WAS protein homology region 2 domain containing 1) and human JMY protein (junction-mediating and regulatory protein). Expression analysis of cultured human cells and brain tissues from PWS patients indicates that CYFIP1 and NIPA2 are biallelically expressed. However, we were not able to determine the allele-specific expression

  10. RRE: a tool for the extraction of non-coding regions surrounding annotated genes from genomic datasets.

    PubMed

    Lazzarato, F; Franceschinis, G; Botta, M; Cordero, F; Calogero, R A

    2004-11-01

    RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, downstream region] from annotated genomic datasets available at NCBI. RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html

  11. Novel genes identified in a high-density genome wide association study for nicotine dependence.

    PubMed

    Bierut, Laura Jean; Madden, Pamela A F; Breslau, Naomi; Johnson, Eric O; Hatsukami, Dorothy; Pomerleau, Ovide F; Swan, Gary E; Rutter, Joni; Bertelsen, Sarah; Fox, Louis; Fugman, Douglas; Goate, Alison M; Hinrichs, Anthony L; Konvicka, Karel; Martin, Nicholas G; Montgomery, Grant W; Saccone, Nancy L; Saccone, Scott F; Wang, Jen C; Chase, Gary A; Rice, John P; Ballinger, Dennis G

    2007-01-01

    Tobacco use is a leading contributor to disability and death worldwide, and genetic factors contribute in part to the development of nicotine dependence. To identify novel genes for which natural variation contributes to the development of nicotine dependence, we performed a comprehensive genome wide association study using nicotine dependent smokers as cases and non-dependent smokers as controls. To allow the efficient, rapid, and cost effective screen of the genome, the study was carried out using a two-stage design. In the first stage, genotyping of over 2.4 million single nucleotide polymorphisms (SNPs) was completed in case and control pools. In the second stage, we selected SNPs for individual genotyping based on the most significant allele frequency differences between cases and controls from the pooled results. Individual genotyping was performed in 1050 cases and 879 controls using 31 960 selected SNPs. The primary analysis, a logistic regression model with covariates of age, gender, genotype and gender by genotype interaction, identified 35 SNPs with P-values less than 10(-4) (minimum P-value 1.53 x 10(-6)). Although none of the individual findings is statistically significant after correcting for multiple tests, additional statistical analyses support the existence of true findings in this group. Our study nominates several novel genes, such as Neurexin 1 (NRXN1), in the development of nicotine dependence while also identifying a known candidate gene, the beta3 nicotinic cholinergic receptor. This work anticipates the future directions of large-scale genome wide association studies with state-of-the-art methodological approaches and sharing of data with the scientific community.

  12. Genome-wide association analysis of chronic lymphocytic leukaemia, Hodgkin lymphoma and multiple myeloma identifies pleiotropic risk loci

    NASA Astrophysics Data System (ADS)

    Law, Philip J.; Sud, Amit; Mitchell, Jonathan S.; Henrion, Marc; Orlando, Giulia; Lenive, Oleg; Broderick, Peter; Speedy, Helen E.; Johnson, David C.; Kaiser, Martin; Weinhold, Niels; Cooke, Rosie; Sunter, Nicola J.; Jackson, Graham H.; Summerfield, Geoffrey; Harris, Robert J.; Pettitt, Andrew R.; Allsup, David J.; Carmichael, Jonathan; Bailey, James R.; Pratt, Guy; Rahman, Thahira; Pepper, Chris; Fegan, Chris; von Strandmann, Elke Pogge; Engert, Andreas; Försti, Asta; Chen, Bowang; Filho, Miguel Inacio Da Silva; Thomsen, Hauke; Hoffmann, Per; Noethen, Markus M.; Eisele, Lewin; Jöckel, Karl-Heinz; Allan, James M.; Swerdlow, Anthony J.; Goldschmidt, Hartmut; Catovsky, Daniel; Morgan, Gareth J.; Hemminki, Kari; Houlston, Richard S.

    2017-01-01

    B-cell malignancies (BCM) originate from the same cell of origin, but at different maturation stages and have distinct clinical phenotypes. Although genetic risk variants for individual BCMs have been identified, an agnostic, genome-wide search for shared genetic susceptibility has not been performed. We explored genome-wide association studies of chronic lymphocytic leukaemia (CLL, N = 1,842), Hodgkin lymphoma (HL, N = 1,465) and multiple myeloma (MM, N = 3,790). We identified a novel pleiotropic risk locus at 3q22.2 (NCK1, rs11715604, P = 1.60 × 10(-9)) with opposing effects between CLL (P = 1.97 × 10(-8)) and HL (P = 3.31 × 10(-3)). Eight established non-HLA risk loci showed pleiotropic associations. Within the HLA region, Ser37 + Phe37 in HLA-DRB1 (P = 1.84 × 10(-12)) was associated with increased CLL and HL risk (P = 4.68 × 10(-12)), and reduced MM risk (P = 1.12 × 10(-2)), and Gly70 in HLA-DQB1 (P = 3.15 × 10(-10)) showed opposing effects between CLL (P = 3.52 × 10(-3)) and HL (P = 3.41 × 10(-9)). By integrating eQTL, Hi-C and ChIP-seq data, we show that the pleiotropic risk loci are enriched for B-cell regulatory elements, as well as an over-representation of binding of key B-cell transcription factors. These data identify shared biological pathways influencing the development of CLL, HL and MM. The identification of these risk loci furthers our understanding of the aetiological basis of BCMs.

  13. Scanning the human genome at kilobase resolution.

    PubMed

    Chen, Jun; Kim, Yeong C; Jung, Yong-Chul; Xuan, Zhenyu; Dworkin, Geoff; Zhang, Yanming; Zhang, Michael Q; Wang, San Ming

    2008-05-01

    Normal genome variation and pathogenic genome alteration frequently affect small regions in the genome. Identifying those genomic changes remains a technical challenge. We report here the development of the DGS (Ditag Genome Scanning) technique for high-resolution analysis of genome structure. The basic features of DGS include (1) use of high-frequent restriction enzymes to fractionate the genome into small fragments; (2) collection of two tags from two ends of a given DNA fragment to form a ditag to represent the fragment; (3) application of the 454 sequencing system to reach a comprehensive ditag sequence collection; (4) determination of the genome origin of ditags by mapping to reference ditags from known genome sequences; (5) use of ditag sequences directly as the sense and antisense PCR primers to amplify the original DNA fragment. To study the relationship between ditags and genome structure, we performed a computational study by using the human genome reference sequences as a model, and analyzed the ditags experimentally collected from the well-characterized normal human DNA GM15510 and the leukemic human DNA of Kasumi-1 cells. Our studies show that DGS provides a kilobase resolution for studying genome structure with high specificity and high genome coverage. DGS can be applied to validate genome assembly, to compare genome similarity and variation in normal populations, and to identify genomic abnormality including insertion, inversion, deletion, translocation, and amplification in pathological genomes such as cancer genomes.
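
    The five DGS steps listed above can be illustrated with a small in silico sketch. This is not the authors' pipeline: the recognition site, the rule of cutting just before each site, the tag length, and the function names (digest, ditag, reference_ditags) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional


def digest(sequence: str, site: str) -> List[str]:
    """Split a sequence at each occurrence of a recognition site.

    Assumption: the cut falls immediately before the site, so every
    fragment after the first begins with the site itself.
    """
    fragments = []
    start = 0
    pos = sequence.find(site, 1)
    while pos != -1:
        fragments.append(sequence[start:pos])
        start = pos
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


def ditag(fragment: str, tag_len: int) -> Optional[str]:
    """Join the first and last tag_len bases of a fragment into one ditag."""
    if len(fragment) < 2 * tag_len:
        return None  # too short to yield two non-overlapping tags
    return fragment[:tag_len] + fragment[-tag_len:]


def reference_ditags(sequence: str, site: str, tag_len: int) -> Dict[str, List[int]]:
    """Index every reference ditag by the start coordinate of its fragment."""
    index: Dict[str, List[int]] = {}
    offset = 0
    for fragment in digest(sequence, site):
        tag = ditag(fragment, tag_len)
        if tag is not None:
            index.setdefault(tag, []).append(offset)
        offset += len(fragment)
    return index


# Toy reference; hypothetical site GATC and 3-base tags.
reference = "AAGATCCCCCGATCTTTTTT"
index = reference_ditags(reference, "GATC", 3)
print(index)
```

    An experimentally collected ditag would then be looked up in this index to recover its genomic origin; a ditag absent from the index is a candidate for a structural difference from the reference.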

  14. Copy Number Variations in Tilapia Genomes.

    PubMed

    Li, Bi Jun; Li, Hong Lian; Meng, Zining; Zhang, Yong; Lin, Haoran; Yue, Gen Hua; Xia, Jun Hong

    2017-02-01

    Discovering the nature and pattern of genome variation is fundamental in understanding phenotypic diversity among populations. Although several million single nucleotide polymorphisms (SNPs) have been discovered in tilapia, the genome-wide characterization of larger structural variants, such as copy number variation (CNV) regions, has not yet been carried out. We conducted a genome-wide scan for CNVs in 47 individuals from three tilapia populations. Based on 254 Gb of high-quality paired-end sequencing reads, we identified 4642 distinct high-confidence CNVs. These CNVs account for 1.9% (12.411 Mb) of the used Nile tilapia reference genome. A total of 1100 predicted CNVs were found overlapping with exon regions of protein genes. Further association analysis based on linear model regression found 85 CNVs ranging between 300 and 27,000 base pairs significantly associated with population types (R² > 0.9 and P > 0.001). Our study provides the first insights into genome-wide CNVs in tilapia. These CNVs among and within tilapia populations may have functional effects on phenotypes and specific adaptation to particular environments.

  15. Deep whole-genome sequencing of 90 Han Chinese genomes.

    PubMed

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the

  16. A tailing genome walking method suitable for genomes with high local GC content.

    PubMed

    Liu, Taian; Fang, Yongxiang; Yao, Wenjuan; Guan, Qisai; Bai, Gang; Jing, Zhizhong

    2013-10-15

    The tailing genome walking strategies are simple and efficient. However, they sometimes can be restricted due to the low stringency of homo-oligomeric primers. Here we modified their conventional tailing step by adding polythymidine and polyguanine to the target single-stranded DNA (ssDNA). The tailed ssDNA was then amplified exponentially with a specific primer in the known region and a primer comprising 5' polycytosine and 3' polyadenosine. The successful application of this novel method for identifying integration sites mediated by φC31 integrase in goat genome indicates that the method is more suitable for genomes with high complexity and local GC content. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Genome-wide association and large scale follow-up identifies 16 new loci influencing lung function

    PubMed Central

    Artigas, María Soler; Loth, Daan W; Wain, Louise V; Gharib, Sina A; Obeidat, Ma’en; Tang, Wenbo; Zhai, Guangju; Zhao, Jing Hua; Smith, Albert Vernon; Huffman, Jennifer E; Albrecht, Eva; Jackson, Catherine M; Evans, David M; Cadby, Gemma; Fornage, Myriam; Manichaikul, Ani; Lopez, Lorna M; Johnson, Toby; Aldrich, Melinda C; Aspelund, Thor; Barroso, Inês; Campbell, Harry; Cassano, Patricia A; Couper, David J; Eiriksdottir, Gudny; Franceschini, Nora; Garcia, Melissa; Gieger, Christian; Gislason, Gauti Kjartan; Grkovic, Ivica; Hammond, Christopher J; Hancock, Dana B; Harris, Tamara B; Ramasamy, Adaikalavan; Heckbert, Susan R; Heliövaara, Markku; Homuth, Georg; Hysi, Pirro G; James, Alan L; Jankovic, Stipan; Joubert, Bonnie R; Karrasch, Stefan; Klopp, Norman; Koch, Beate; Kritchevsky, Stephen B; Launer, Lenore J; Liu, Yongmei; Loehr, Laura R; Lohman, Kurt; Loos, Ruth JF; Lumley, Thomas; Al Balushi, Khalid A; Ang, Wei Q; Barr, R Graham; Beilby, John; Blakey, John D; Boban, Mladen; Boraska, Vesna; Brisman, Jonas; Britton, John R; Brusselle, Guy G; Cooper, Cyrus; Curjuric, Ivan; Dahgam, Santosh; Deary, Ian J; Ebrahim, Shah; Eijgelsheim, Mark; Francks, Clyde; Gaysina, Darya; Granell, Raquel; Gu, Xiangjun; Hankinson, John L; Hardy, Rebecca; Harris, Sarah E; Henderson, John; Henry, Amanda; Hingorani, Aroon D; Hofman, Albert; Holt, Patrick G; Hui, Jennie; Hunter, Michael L; Imboden, Medea; Jameson, Karen A; Kerr, Shona M; Kolcic, Ivana; Kronenberg, Florian; Liu, Jason Z; Marchini, Jonathan; McKeever, Tricia; Morris, Andrew D; Olin, Anna-Carin; Porteous, David J; Postma, Dirkje S; Rich, Stephen S; Ring, Susan M; Rivadeneira, Fernando; Rochat, Thierry; Sayer, Avan Aihie; Sayers, Ian; Sly, Peter D; Smith, George Davey; Sood, Akshay; Starr, John M; Uitterlinden, André G; Vonk, Judith M; Wannamethee, S Goya; Whincup, Peter H; Wijmenga, Cisca; Williams, O Dale; Wong, Andrew; Mangino, Massimo; Marciante, Kristin D; McArdle, Wendy L; Meibohm, Bernd; Morrison, Alanna C; North, Kari E; Omenaas, Ernst; Palmer, Lyle J; Pietiläinen, Kirsi H; Pin, Isabelle; Polašek, Ozren; Pouta, Anneli; Psaty, Bruce M; Hartikainen, Anna-Liisa; Rantanen, Taina; Ripatti, Samuli; Rotter, Jerome I; Rudan, Igor; Rudnicka, Alicja R; Schulz, Holger; Shin, So-Youn; Spector, Tim D; Surakka, Ida; Vitart, Veronique; Völzke, Henry; Wareham, Nicholas J; Warrington, Nicole M; Wichmann, H-Erich; Wild, Sarah H; Wilk, Jemma B; Wjst, Matthias; Wright, Alan F; Zgaga, Lina; Zemunik, Tatijana; Pennell, Craig E; Nyberg, Fredrik; Kuh, Diana; Holloway, John W; Boezen, H Marike; Lawlor, Debbie A; Morris, Richard W; Probst-Hensch, Nicole; Kaprio, Jaakko; Wilson, James F; Hayward, Caroline; Kähönen, Mika; Heinrich, Joachim; Musk, Arthur W; Jarvis, Deborah L; Gläser, Sven; Järvelin, Marjo-Riitta; Stricker, Bruno H Ch; Elliott, Paul; O’Connor, George T; Strachan, David P; London, Stephanie J; Hall, Ian P; Gudnason, Vilmundur; Tobin, Martin D

    2011-01-01

    Pulmonary function measures reflect respiratory health and predict mortality, and are used in the diagnosis of chronic obstructive pulmonary disease (COPD). We tested genome-wide association with the forced expiratory volume in 1 second (FEV1) and the ratio of FEV1 to forced vital capacity (FVC) in 48,201 individuals of European ancestry, with follow-up of top associations in up to an additional 46,411 individuals. We identified new regions showing association (combined P < 5 × 10⁻⁸) with pulmonary function, in or near MFAP2, TGFB2, HDAC4, RARB, MECOM (EVI1), SPATA9, ARMC2, NCR3, ZKSCAN3, CDC123, C10orf11, LRP1, CCDC38, MMP15, CFDP1, and KCNE2. Identification of these 16 new loci may provide insight into the molecular mechanisms regulating pulmonary function and into molecular targets for future therapy to alleviate reduced lung function. PMID:21946350

  18. The Complete Chloroplast Genome Sequences of Five Epimedium Species: Lights into Phylogenetic and Taxonomic Analyses

    PubMed Central

    Zhang, Yanjun; Du, Liuwen; Liu, Ao; Chen, Jianjun; Wu, Li; Hu, Weiming; Zhang, Wei; Kim, Kyunghee; Lee, Sang-Choon; Yang, Tae-Jin; Wang, Ying

    2016-01-01

    Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. Here we sequenced the complete chloroplast (cp) genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly; combined with the previously reported cp genome sequence of E. koreanum, this is the first comprehensive cp genome analysis of Epimedium. The five Epimedium cp genomes exhibited a typical quadripartite, circular structure that was rather conserved in genomic structure and gene-order synteny. However, these cp genomes presented obvious variations at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR) region and the single-copy (SC) boundary regions. The trnQ-UUG duplication occurred in the five Epimedium cp genomes but was not found in the other basal eudicotyledons. Rapidly evolving cp genome regions were detected among the five cp genomes, and differences in simple sequence repeats (SSRs) and repeat sequences were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes were broadly in accordance with the updated system of the genus, but indicated that the evolutionary relationships and divisions of the genus need further investigation with more evidence. The availability of these cp genomes provides valuable genetic information for accurate species identification, taxonomy, phylogenetic resolution and evolution of Epimedium, and will assist in the exploration and utilization of Epimedium plants. PMID:27014326

  19. Genomic analysis identified a potential novel molecular mechanism for high-altitude adaptation in sheep at the Himalayas.

    PubMed

    Gorkhali, Neena Amatya; Dong, Kunzhe; Yang, Min; Song, Shen; Kader, Adiljian; Shrestha, Bhola Shankar; He, Xiaohong; Zhao, Qianjun; Pu, Yabin; Li, Xiangchen; Kijas, James; Guan, Weijun; Han, Jianlin; Jiang, Lin; Ma, Yuehui

    2016-07-22

    Sheep have successfully adapted to the extreme high-altitude Himalayan region. To identify genes underlying such adaptation, we genotyped genome-wide single nucleotide polymorphisms (SNPs) of four major sheep breeds living at different altitudes in Nepal and downloaded SNP array data from additional Asian and Middle East breeds. Using a di value-based genomic comparison between four high-altitude and eight lowland Asian breeds, we discovered the most differentiated variants at the locus of FGF-7 (Keratinocyte growth factor-7), which was previously reported as a good protective candidate for pulmonary injuries. We further found a SNP upstream of FGF-7 that appears to contribute to the divergence signature. First, the SNP occurred at an extremely conserved site. Second, the SNP showed an increasing allele frequency with elevated altitude in Nepalese sheep. Third, electrophoretic mobility shift assay (EMSA) analysis using human lung cancer cells revealed allele-specific DNA-protein interactions. We thus hypothesized that the FGF-7 gene potentially enhances lung function by regulating its expression level in high-altitude sheep through altering its binding of specific transcription factors. Notably, the FGF-7 gene was not implicated in previous studies of other high-altitude species, suggesting a potential novel adaptive mechanism to high altitude in sheep at the Himalayas.

  20. Genome-wide meta-analysis identifies new susceptibility loci for migraine.

    PubMed

    Anttila, Verneri; Winsvold, Bendik S; Gormley, Padhraig; Kurth, Tobias; Bettella, Francesco; McMahon, George; Kallela, Mikko; Malik, Rainer; de Vries, Boukje; Terwindt, Gisela; Medland, Sarah E; Todt, Unda; McArdle, Wendy L; Quaye, Lydia; Koiranen, Markku; Ikram, M Arfan; Lehtimäki, Terho; Stam, Anine H; Ligthart, Lannie; Wedenoja, Juho; Dunham, Ian; Neale, Benjamin M; Palta, Priit; Hamalainen, Eija; Schürks, Markus; Rose, Lynda M; Buring, Julie E; Ridker, Paul M; Steinberg, Stacy; Stefansson, Hreinn; Jakobsson, Finnbogi; Lawlor, Debbie A; Evans, David M; Ring, Susan M; Färkkilä, Markus; Artto, Ville; Kaunisto, Mari A; Freilinger, Tobias; Schoenen, Jean; Frants, Rune R; Pelzer, Nadine; Weller, Claudia M; Zielman, Ronald; Heath, Andrew C; Madden, Pamela A F; Montgomery, Grant W; Martin, Nicholas G; Borck, Guntram; Göbel, Hartmut; Heinze, Axel; Heinze-Kuhn, Katja; Williams, Frances M K; Hartikainen, Anna-Liisa; Pouta, Anneli; van den Ende, Joyce; Uitterlinden, Andre G; Hofman, Albert; Amin, Najaf; Hottenga, Jouke-Jan; Vink, Jacqueline M; Heikkilä, Kauko; Alexander, Michael; Muller-Myhsok, Bertram; Schreiber, Stefan; Meitinger, Thomas; Wichmann, Heinz Erich; Aromaa, Arpo; Eriksson, Johan G; Traynor, Bryan; Trabzuni, Daniah; Rossin, Elizabeth; Lage, Kasper; Jacobs, Suzanne B R; Gibbs, J Raphael; Birney, Ewan; Kaprio, Jaakko; Penninx, Brenda W; Boomsma, Dorret I; van Duijn, Cornelia; Raitakari, Olli; Jarvelin, Marjo-Riitta; Zwart, John-Anker; Cherkas, Lynn; Strachan, David P; Kubisch, Christian; Ferrari, Michel D; van den Maagdenberg, Arn M J M; Dichgans, Martin; Wessman, Maija; Smith, George Davey; Stefansson, Kari; Daly, Mark J; Nyholt, Dale R; Chasman, Daniel; Palotie, Aarno

    2013-08-01

    Migraine is the most common brain disorder, affecting approximately 14% of the adult population, but its molecular mechanisms are poorly understood. We report the results of a meta-analysis across 29 genome-wide association studies, including a total of 23,285 individuals with migraine (cases) and 95,425 population-matched controls. We identified 12 loci associated with migraine susceptibility (P < 5 × 10⁻⁸). Five loci are new: near AJAP1 at 1p36, near TSPAN2 at 1p13, within FHL5 at 6q16, within C7orf10 at 7p14 and near MMP16 at 8q21. Three of these loci were identified in disease subgroup analyses. Brain tissue expression quantitative trait locus analysis suggests potential functional candidate genes at four loci: APOA1BP, TBC1D7, FUT9, STAT6 and ATP5B.

  1. Complete Chloroplast Genome Sequences of Important Oilseed Crop Sesamum indicum L

    PubMed Central

    Yi, Dong-Keun; Kim, Ki-Joong

    2012-01-01

    Sesamum indicum is an important crop plant species for yielding oil. The complete chloroplast (cp) genome of S. indicum (GenBank acc no. JN637766) is 153,324 bp in length, and has a pair of inverted repeat (IR) regions consisting of 25,141 bp each. The lengths of the large single copy (LSC) and the small single copy (SSC) regions are 85,170 bp and 17,872 bp, respectively. Comparative cp DNA sequence analyses of S. indicum with other cp genomes reveal that the genome structure, gene order, gene and intron contents, AT contents, codon usage, and transcription units are similar to the typical angiosperm cp genomes. Nucleotide diversity of the IR region between Sesamum and three other cp genomes is much lower than that of the LSC and SSC regions in both the coding region and noncoding region. In summary, the regional constraints strongly affect the sequence evolution of the cp genomes, while the functional constraints weakly affect the sequence evolution of cp genomes. Five short inversions associated with short palindromic sequences that form stem-loop structures were observed in the chloroplast genome of S. indicum. Twenty-eight different simple sequence repeat loci have been detected in the chloroplast genome of S. indicum. Almost all of the SSR loci were composed of A or T, so this may also contribute to the A-T richness of the cp genome of S. indicum. Seven large repeated loci in the chloroplast genome of S. indicum were also identified and these loci are useful for developing S. indicum-specific cp genome vectors. The complete cp DNA sequences of S. indicum reported in this paper are prerequisite to modifying this important oilseed crop by cp genetic engineering techniques. PMID:22606240
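
    As a quick consistency check on the region lengths quoted above, a quadripartite chloroplast genome should total LSC + SSC + 2 × IR. The short sketch below uses only the figures given in the abstract; the function and variable names are illustrative.

```python
# Region lengths in bp, as reported in the abstract.
LSC = 85_170             # large single copy
SSC = 17_872             # small single copy
IR = 25_141              # one inverted repeat (the genome carries two copies)
REPORTED_TOTAL = 153_324


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular genome made of LSC, SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)  # prints: 153324 True
```

    The four regions therefore account for the full reported genome length with no bases left over.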

  2. Comparative Genomics and Transcriptional Analysis of Prophages Identified in the Genomes of Lactobacillus gasseri, Lactobacillus salivarius, and Lactobacillus casei†

    PubMed Central

    Ventura, Marco; Canchaya, Carlos; Bernini, Valentina; Altermann, Eric; Barrangou, Rodolphe; McGrath, Stephen; Claesson, Marcus J.; Li, Yin; Leahy, Sinead; Walker, Carey D.; Zink, Ralf; Neviani, Erasmo; Steele, Jim; Broadbent, Jeff; Klaenhammer, Todd R.; Fitzgerald, Gerald F.; O'Toole, Paul W.; van Sinderen, Douwe

    2006-01-01

    Lactobacillus gasseri ATCC 33323, Lactobacillus salivarius subsp. salivarius UCC 118, and Lactobacillus casei ATCC 334 contain one (LgaI), four (Sal1, Sal2, Sal3, Sal4), and one (Lca1) distinguishable prophage sequences, respectively. Sequence analysis revealed that LgaI, Lca1, Sal1, and Sal2 prophages belong to the group of Sfi11-like pac site and cos site Siphoviridae, respectively. Phylogenetic investigation of these newly described prophage sequences revealed that they have not followed an evolutionary development similar to that of their bacterial hosts and that they show a high degree of diversity, even within a species. The attachment sites were determined for all these prophage elements; LgaI as well as Sal1 integrates in tRNA genes, while prophage Sal2 integrates in a predicted arginino-succinate lyase-encoding gene. In contrast, Lca1 and the Sal3 and Sal4 prophage remnants are integrated in noncoding regions in the L. casei ATCC 334 and L. salivarius UCC 118 genomes. Northern analysis showed that large parts of the prophage genomes are transcriptionally silent and that transcription is limited to genome segments located near the attachment site. Finally, pulsed-field gel electrophoresis followed by Southern blot hybridization with specific prophage probes indicates that these prophage sequences are narrowly distributed within lactobacilli. PMID:16672450

  3. A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica).

    PubMed

    Jia, Guanqing; Huang, Xuehui; Zhi, Hui; Zhao, Yan; Zhao, Qiang; Li, Wenjun; Chai, Yang; Yang, Lifang; Liu, Kunyan; Lu, Hengyun; Zhu, Chuanrang; Lu, Yiqi; Zhou, Congcong; Fan, Danlin; Weng, Qijun; Guo, Yunli; Huang, Tao; Zhang, Lei; Lu, Tingting; Feng, Qi; Hao, Hangfei; Liu, Hongkuan; Lu, Ping; Zhang, Ning; Li, Yuhui; Guo, Erhu; Wang, Shujun; Wang, Suying; Liu, Jinrong; Zhang, Wenfei; Chen, Guoqiu; Zhang, Baojin; Li, Wei; Wang, Yongfang; Li, Haiquan; Zhao, Baohua; Li, Jiayang; Diao, Xianmin; Han, Bin

    2013-08-01

    Foxtail millet (Setaria italica) is an important grain crop that is grown in arid regions. Here we sequenced 916 diverse foxtail millet varieties, identified 2.58 million SNPs and used 0.8 million common SNPs to construct a haplotype map of the foxtail millet genome. We classified the foxtail millet varieties into two divergent groups that are strongly correlated with early and late flowering times. We phenotyped the 916 varieties under five different environments and identified 512 loci associated with 47 agronomic traits by genome-wide association studies. We performed a de novo assembly of deeply sequenced genomes of a Setaria viridis accession (the wild progenitor of S. italica) and an S. italica variety and identified complex interspecies and intraspecies variants. We also identified 36 selective sweeps that seem to have occurred during modern breeding. This study provides fundamental resources for genetics research and genetic improvement in foxtail millet.

  4. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting | Office of Cancer Genomics

    Cancer.gov

    The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.

  5. Comparison of GWAS models to identify non-additive genetic control of flowering time in sunflower hybrids.

    PubMed

    Bonnafous, Fanny; Fievet, Ghislain; Blanchet, Nicolas; Boniface, Marie-Claude; Carrère, Sébastien; Gouzy, Jérôme; Legrand, Ludovic; Marage, Gwenola; Bret-Mestries, Emmanuelle; Munos, Stéphane; Pouilly, Nicolas; Vincourt, Patrick; Langlade, Nicolas; Mangin, Brigitte

    2018-02-01

    This study compares five models of GWAS, to show the added value of non-additive modeling of allelic effects to identify genomic regions controlling flowering time of sunflower hybrids. Genome-wide association studies are a powerful and widely used tool to decipher the genetic control of complex traits. One of the main challenges for hybrid crops, such as maize or sunflower, is to model the hybrid vigor in the linear mixed models, considering the relatedness between individuals. Here, we compared two additive and three non-additive association models for their ability to identify genomic regions associated with flowering time in sunflower hybrids. A panel of 452 sunflower hybrids, corresponding to incomplete crossing between 36 male lines and 36 female lines, was phenotyped in five environments and genotyped for 2,204,423 SNPs. Intra-locus effects were estimated in multi-locus models to detect genomic regions associated with flowering time using the different models. Thirteen quantitative trait loci were identified in total, two with both model categories and one with only non-additive models. A quantitative trait locus on LG09, detected by both the additive and non-additive models, is located near a GAI homolog and is presented in detail. Overall, this study shows the added value of non-additive modeling of allelic effects for identifying genomic regions that control traits of interest and that could participate in the heterosis observed in hybrids.
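
    The distinction between additive and non-additive allelic effects can be made concrete with the classical single-locus decomposition. The sketch below is not one of the five models compared in the study (it ignores relatedness and uses no mixed model); the genotype coding, the toy flowering times, and all function names are assumptions for illustration.

```python
from statistics import mean
from typing import List, Tuple


def additive_code(genotype: str) -> int:
    """Count copies of the (hypothetical) alternate allele 'B': 0, 1 or 2."""
    return genotype.count("B")


def dominance_code(genotype: str) -> int:
    """Flag heterozygotes, the covariate a non-additive model adds."""
    return 1 if genotype[0] != genotype[1] else 0


def additive_dominance_effects(genotypes: List[str],
                               phenotypes: List[float]) -> Tuple[float, float]:
    """Estimate additive (a) and dominance (d) effects from class means.

    a is half the difference between the two homozygote means; d is the
    deviation of the heterozygote mean from the homozygote midpoint.
    """
    by_class = {0: [], 1: [], 2: []}
    for genotype, value in zip(genotypes, phenotypes):
        by_class[additive_code(genotype)].append(value)
    m0, m1, m2 = (mean(by_class[k]) for k in (0, 1, 2))
    a = (m2 - m0) / 2
    d = m1 - (m0 + m2) / 2
    return a, d


# Toy data: days to flowering for six hybrids at one SNP.
genotypes = ["AA", "AA", "AB", "BA", "BB", "BB"]
days = [60, 62, 70, 72, 64, 66]
a, d = additive_dominance_effects(genotypes, days)
print(a, d)  # prints: 2.0 8.0
```

    In this toy example a purely additive model would see only the small difference between homozygotes, while the heterozygotes flower far later than the midpoint; that deviation is the kind of signal a non-additive term is meant to capture.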

  6. Dcode.org anthology of comparative genomic tools.

    PubMed

    Loots, Gabriela G; Ovcharenko, Ivan

    2005-07-01

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.

  7. QTLs Regulating the Contents of Antioxidants, Phenolics, and Flavonoids in Soybean Seeds Share a Common Genomic Region.

    PubMed

    Li, Man-Wah; Muñoz, Nacira B; Wong, Chi-Fai; Wong, Fuk-Ling; Wong, Kwong-Sen; Wong, Johanna Wing-Hang; Qi, Xinpeng; Li, Kwan-Pok; Ng, Ming-Sin; Lam, Hon-Ming

    2016-01-01

    Soybean seeds are a rich source of phenolic compounds, especially isoflavonoids, which are important nutraceuticals. Our study using 14 wild- and 16 cultivated-soybean accessions shows that seeds from cultivated soybeans generally contain lower total antioxidants compared to their wild counterparts, likely an unintended consequence of domestication or human selection. Using a recombinant inbred population resulting from a wild and a cultivated soybean parent and a bin map approach, we have identified an overlapping genomic region containing major quantitative trait loci (QTLs) that regulate the seed contents of total antioxidants, phenolics, and flavonoids. The QTL for seed antioxidant content contains 14 annotated genes based on the Williams 82 reference genome (Gmax1.01). None of these genes encodes functions that are related to the phenylpropanoid pathway of soybean. However, we found three putative Multidrug And Toxic Compound Extrusion (MATE) transporter genes within this QTL and one adjacent to it (GmMATE1-4). Moreover, we have identified non-synonymous changes between GmMATE1 and GmMATE2, and that GmMATE3 encodes an antisense transcript that expresses in pods. Whether the polymorphisms in GmMATE proteins are major determinants of the antioxidant contents, or whether the antisense transcripts of GmMATE3 play important regulatory roles, awaits further functional investigations.

  8. Evolutionary genomics of animal personality.

    PubMed

    van Oers, Kees; Mueller, Jakob C

    2010-12-27

    Research on animal personality can be approached from both a phenotypic and a genetic perspective. Using a phenotypic approach, one can measure present selection on personality traits and their combinations. However, this approach cannot reconstruct the historical trajectory that was taken by evolution. Therefore, it is essential for our understanding of the causes and consequences of personality diversity to link phenotypic variation in personality traits with polymorphisms in genomic regions that code for this trait variation. Identifying genes or genome regions that underlie personality traits will open exciting possibilities to study natural selection at the molecular level, gene-gene and gene-environment interactions, pleiotropic effects and how gene expression shapes personality phenotypes. In this paper, we will discuss how genome information revealed by already established approaches and some more recent techniques such as high-throughput sequencing of genomic regions in a large number of individuals can be used to infer micro-evolutionary processes, historical selection and finally the maintenance of personality trait variation. We will do this by reviewing recent advances in molecular genetics of animal personality, but will also use advanced human personality studies as case studies of how molecular information may be used in animal personality research in the near future.

  9. Identifying regional opportunities for accelerated timber management

    Treesearch

    David A. Gansner; Joseph E. Barnard; Samuel F. Gingrich

    1973-01-01

    Describes a procedure for identifying regional opportunities for accelerated timber management and demonstrates its application. Results provide a basis for rational choices among alternative management strategies and permit meaningful micro- and macro-evaluations of treatment response.

  10. An enhanced genome-scale metabolic reconstruction of Streptomyces clavuligerus identifies novel strain improvement strategies.

    PubMed

    Toro, León; Pinilla, Laura; Avignone-Rossa, Claudio; Ríos-Estepa, Rigoberto

    2018-05-01

    In this work, we expanded and updated a genome-scale metabolic model of Streptomyces clavuligerus. The model includes 1021 genes and 1494 biochemical reactions; genome-reaction information was curated and new features related to clavam metabolism and to the biomass synthesis equation were incorporated. The model was validated using experimental data from the literature and simulations were performed to predict cellular growth and clavulanic acid biosynthesis. Flux balance analysis (FBA) showed that limiting concentrations of phosphate and an excess of ammonia accumulation are unfavorable for growth and clavulanic acid biosynthesis. The evaluation of different objective functions for FBA showed that maximization of ATP yields the best predictions for cellular behavior in continuous cultures, while the maximization of growth rate provides better predictions for batch cultures. Through gene essentiality analysis, 130 essential genes were found using a limited in silico medium, while 100 essential genes were identified in amino acid-supplemented media. Finally, a strain design was carried out to identify candidate genes to be overexpressed or knocked out so as to maximize antibiotic biosynthesis. Interestingly, the potential metabolic engineering targets identified in this study have not been tested experimentally.

  11. snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.

    PubMed

    Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M

    2012-01-01

    Advances in whole genome sequencing (WGS) and its decreasing cost will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic trees from WGS data. Here we introduce snpTree, a server for online-automatic SNPs analysis. This tool is composed of different SNPs analysis suites, perl and python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed from concatenated SNPs using FastTree and a perl script. The online server was implemented by HTML, Java and python script. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases were consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy to use option for rapid standardised and automatic SNP analysis in epidemiological studies also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.
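
    The step in which SNPs are concatenated by reference position can be sketched in a few lines. This is an illustration only, not snpTree's code: it assumes SNP calls are already available as per-isolate dictionaries of 0-based positions, fills uncalled positions with the reference base, and stands in a simple Hamming distance for the tree-building stage that snpTree delegates to FastTree. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def concatenate_snps(reference: str,
                     calls: Dict[str, Dict[int, str]]) -> Tuple[List[int], Dict[str, str]]:
    """Build one SNP string per isolate over the union of variant positions.

    calls maps isolate name -> {0-based reference position: called base}.
    Where an isolate has no call at a position, the reference base is used.
    """
    positions = sorted({pos for snps in calls.values() for pos in snps})
    alignment = {
        isolate: "".join(snps.get(pos, reference[pos]) for pos in positions)
        for isolate, snps in calls.items()
    }
    return positions, alignment


def hamming(a: str, b: str) -> int:
    """Number of differing columns between two equal-length SNP strings."""
    return sum(x != y for x, y in zip(a, b))


# Toy reference and SNP calls for three hypothetical isolates.
reference = "ACGTACGTAC"
calls = {
    "iso1": {1: "T", 7: "A"},
    "iso2": {7: "A"},
    "iso3": {4: "G"},
}
positions, alignment = concatenate_snps(reference, calls)
print(positions)   # prints: [1, 4, 7]
print(alignment)   # prints: {'iso1': 'TAA', 'iso2': 'CAA', 'iso3': 'CGT'}
print(hamming(alignment["iso1"], alignment["iso2"]))  # prints: 1
```

    The resulting equal-length strings are the kind of input a tree-building program expects; here iso1 and iso2 differ at a single site and would group together ahead of iso3.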

  12. Short and long-term genome stability analysis of prokaryotic genomes.

    PubMed

    Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

    2013-05-08

    Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often makes it possible to characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and its possible correlations with life-style. Two kinds of events affect genome organization: on the one hand, translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes with diverging gene order with respect to their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were
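
    The idea of backbone gene order can be made concrete with a small Python sketch. This is not one of the stability estimators from the paper, whose definitions the abstract does not give; it is a simple adjacency-conservation fraction chosen for illustration. The function name, the linear (non-circular) chromosome representation and the gene labels are assumptions, and each gene is assumed to occur once per genome.

```python
def backbone_adjacency_conservation(order_a, order_b):
    """Fraction of backbone adjacencies in genome A that also occur in genome B.

    order_a, order_b: lists of gene identifiers in chromosomal order
    (hypothetical inputs; chromosomes are treated as linear here).
    """
    shared = set(order_a) & set(order_b)
    # The backbone keeps only shared genes, so insertions and deletions
    # of unshared genes do not change it.
    backbone_a = [g for g in order_a if g in shared]
    backbone_b = [g for g in order_b if g in shared]
    if len(backbone_a) < 2:
        return 1.0  # no adjacency to compare
    # Unordered pairs, so an inverted neighbor pair still counts as adjacent.
    adjacent_in_b = {frozenset(p) for p in zip(backbone_b, backbone_b[1:])}
    pairs_a = [frozenset(p) for p in zip(backbone_a, backbone_a[1:])]
    conserved = sum(1 for p in pairs_a if p in adjacent_in_b)
    return conserved / len(pairs_a)


# Toy gene orders, invented for illustration only.
genome_a = ["g1", "g2", "x", "g3", "g4"]   # "x" is an insertion absent from B
genome_b = ["g1", "g2", "g4", "g3"]        # g3 and g4 are swapped
score = backbone_adjacency_conservation(genome_a, genome_b)
```

    In this toy case the inserted gene "x" is ignored by the backbone comparison, while the swap of g3 and g4 breaks one of the three backbone adjacencies.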

  13. Genome-Wide Association Scan in HIV-1-Infected Individuals Identifying Variants Influencing Disease Course

    PubMed Central

    van Manen, Daniëlle; Delaneau, Olivier; Kootstra, Neeltje A.; Boeser-Nunnink, Brigitte D.; Limou, Sophie; Bol, Sebastiaan M.; Burger, Judith A.; Zwinderman, Aeilko H.; Moerland, Perry D.; van 't Slot, Ruben; Zagury, Jean-François; van 't Wout, Angélique B.; Schuitemaker, Hanneke

    2011-01-01

    Background: AIDS typically develops after 7–11 years of untreated HIV-1 infection, with extremes of very rapid disease progression (<2 years) and long-term non-progression (>15 years). To reveal additional host genetic factors that may impact on the clinical course of HIV-1 infection, we designed a genome-wide association study (GWAS) in 404 participants of the Amsterdam Cohort Studies on HIV-1 infection and AIDS. Methods: The association of SNP genotypes with the clinical course of HIV-1 infection was tested in Cox regression survival analyses using AIDS-diagnosis and AIDS-related death as endpoints. Results: Multiple previously unidentified SNPs were found to be strongly associated with disease progression after HIV-1 infection, albeit not at genome-wide significance. However, three independent SNPs in the top ten associations between SNP genotypes and time between seroconversion and AIDS-diagnosis, and one from the top ten associations between SNP genotypes and time between seroconversion and AIDS-related death, had P-values smaller than 0.05 in the French Genomics of Resistance to Immunodeficiency Virus cohort on disease progression. Conclusions: Our study emphasizes that the use of different phenotypes in GWAS may be useful to unravel the full spectrum of host genetic factors that may be associated with the clinical course of HIV-1 infection. PMID:21811574

  14. Micro-Scale Genomic DNA Copy Number Aberrations as Another Means of Mutagenesis in Breast Cancer

    PubMed Central

    Chao, Hann-Hsiang; He, Xiaping; Parker, Joel S.; Zhao, Wei; Perou, Charles M.

    2012-01-01

    Introduction: In breast cancer, the basal-like subtype has high levels of genomic instability relative to other breast cancer subtypes, with many basal-like-specific regions of aberration. There is evidence that this genomic instability extends to smaller scale genomic aberrations, as shown by a previously described micro-deletion event in the PTEN gene in the Basal-like SUM149 breast cancer cell line. Methods: We sought to determine whether small regions of genomic DNA copy number change exist by using a high density, gene-centric Comparative Genomic Hybridization (CGH) array on cell lines and primary tumors. A custom tiling array for CGH (244,000 probes, 200 bp tiling resolution) was created to identify small regions of genomic change; it was focused on previously identified basal-like-specific and general cancer genes. Tumor genomic DNA from 94 patients and 2 breast cancer cell lines was labeled and hybridized to these arrays. Aberrations were called using SWITCHdna and the smallest 25% of SWITCHdna-defined genomic segments were called micro-aberrations (<64 contiguous probes, ∼ 15 kb). Results: Our data showed that primary tumor breast cancer genomes frequently contained many small-scale copy number gains and losses, termed micro-aberrations, most of which are undetectable using typical-density genome-wide aCGH arrays. The basal-like subtype exhibited the highest incidence of these events. These micro-aberrations sometimes altered expression of the involved gene. We confirmed the presence of the PTEN micro-amplification in SUM149 and by mRNA-seq showed that this resulted in loss of expression of all exons downstream of this event. Micro-aberrations disproportionately affected the 5′ regions of the affected genes, including the promoter region, and high frequency of micro-aberrations was associated with poor survival. Conclusion: Using a high-probe-density, gene-centric aCGH microarray, we present evidence of small-scale genomic aberrations that can contribute to

  15. A Genome-wide CRISPR Screen in Toxoplasma Identifies Essential Apicomplexan Genes.

    PubMed

    Sidik, Saima M; Huet, Diego; Ganesan, Suresh M; Huynh, My-Hang; Wang, Tim; Nasamu, Armiyaw S; Thiru, Prathapan; Saeij, Jeroen P J; Carruthers, Vern B; Niles, Jacquin C; Lourido, Sebastian

    2016-09-08

    Apicomplexan parasites are leading causes of human and livestock diseases such as malaria and toxoplasmosis, yet most of their genes remain uncharacterized. Here, we present the first genome-wide genetic screen of an apicomplexan. We adapted CRISPR/Cas9 to assess the contribution of each gene from the parasite Toxoplasma gondii during infection of human fibroblasts. Our analysis defines ∼200 previously uncharacterized, fitness-conferring genes unique to the phylum, from which 16 were investigated, revealing essential functions during infection of human cells. Secondary screens identify as an invasion factor the claudin-like apicomplexan microneme protein (CLAMP), which resembles mammalian tight-junction proteins and localizes to secretory organelles, making it critical to the initiation of infection. CLAMP is present throughout sequenced apicomplexan genomes and is essential during the asexual stages of the malaria parasite Plasmodium falciparum. These results provide broad-based functional information on T. gondii genes and will facilitate future approaches to expand the horizon of antiparasitic interventions. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.

    PubMed

    Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger

    2018-04-19

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model
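
    For orientation, the standard form of a multinomial logistic regression over mutation types at a single site can be written as below. The notation is an assumption and is not taken from the paper: $Y_i$ is the outcome at site $i$, with $0$ denoting no mutation and $m = 1, \dots, M$ indexing mutation types; $x_i$ is the vector of explanatory variables at that site (for example conservation, replication timing and expression level); and $\beta_m$ is the coefficient vector for type $m$.

```latex
P(Y_i = m \mid x_i) = \frac{\exp(\beta_m^{\top} x_i)}{1 + \sum_{k=1}^{M} \exp(\beta_k^{\top} x_i)},
\qquad m = 1, \dots, M,
\qquad
P(Y_i = 0 \mid x_i) = \frac{1}{1 + \sum_{k=1}^{M} \exp(\beta_k^{\top} x_i)}.
```

    Because $x_i$ is evaluated per site, nothing is averaged over a region; a region-level expectation would be obtained by summing these site probabilities over the positions in the region.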

  17. QTL mapping of genome regions controlling temephos resistance in larvae of the mosquito Aedes aegypti.

    PubMed

    Reyes-Solis, Guadalupe Del Carmen; Saavedra-Rodriguez, Karla; Suarez, Adriana Flores; Black, William C

    2014-10-01

    The mosquito Aedes aegypti is the principal vector of dengue and yellow fever flaviviruses. Temephos is an organophosphate insecticide used globally to suppress Ae. aegypti larval populations, but resistance has evolved in many locations. Quantitative Trait Loci (QTL) controlling temephos survival in Ae. aegypti larvae were mapped in a pair of F3 advanced intercross lines arising from temephos resistant parents from Solidaridad, México and temephos susceptible parents from Iquitos, Peru. Two sets of 200 F3 larvae were exposed to a discriminating dose of temephos and then dead larvae were collected and preserved for DNA isolation every two hours up to 16 hours. Larvae surviving longer than 16 hours were considered resistant. For QTL mapping, single nucleotide polymorphisms (SNPs) were identified at 23 single copy genes and 26 microsatellite loci of known physical positions in the Ae. aegypti genome. In both reciprocal crosses, Multiple Interval Mapping identified eleven QTL associated with time until death. In the Solidaridad×Iquitos (SLD×Iq) cross twelve were associated with survival, but in the reciprocal Iq×SLD cross only six QTL were survival associated. Polymorphisms at acetylcholine esterase (AchE) loci 1 and 2 were not associated with either resistance phenotype, suggesting that target site insensitivity is not an organophosphate resistance mechanism in this region of México. Temephos resistance is under the control of many metabolic genes of small effect and dispersed throughout the Ae. aegypti genome.

  18. QTL Mapping of Genome Regions Controlling Temephos Resistance in Larvae of the Mosquito Aedes aegypti

    PubMed Central

    Reyes-Solis, Guadalupe del Carmen; Saavedra-Rodriguez, Karla; Suarez, Adriana Flores; Black, William C.

    2014-01-01

    Introduction: The mosquito Aedes aegypti is the principal vector of dengue and yellow fever flaviviruses. Temephos is an organophosphate insecticide used globally to suppress Ae. aegypti larval populations, but resistance has evolved in many locations. Methodology/Principal Findings: Quantitative Trait Loci (QTL) controlling temephos survival in Ae. aegypti larvae were mapped in a pair of F3 advanced intercross lines arising from temephos resistant parents from Solidaridad, México and temephos susceptible parents from Iquitos, Peru. Two sets of 200 F3 larvae were exposed to a discriminating dose of temephos and then dead larvae were collected and preserved for DNA isolation every two hours up to 16 hours. Larvae surviving longer than 16 hours were considered resistant. For QTL mapping, single nucleotide polymorphisms (SNPs) were identified at 23 single copy genes and 26 microsatellite loci of known physical positions in the Ae. aegypti genome. In both reciprocal crosses, Multiple Interval Mapping identified eleven QTL associated with time until death. In the Solidaridad×Iquitos (SLD×Iq) cross twelve were associated with survival, but in the reciprocal Iq×SLD cross only six QTL were survival associated. Polymorphisms at acetylcholine esterase (AchE) loci 1 and 2 were not associated with either resistance phenotype, suggesting that target site insensitivity is not an organophosphate resistance mechanism in this region of México. Conclusions/Significance: Temephos resistance is under the control of many metabolic genes of small effect and dispersed throughout the Ae. aegypti genome. PMID:25330200

  19. Somatic Mutation Patterns in Hemizygous Genomic Regions Unveil Purifying Selection during Tumor Evolution

    PubMed Central

    Basu, Swaraj; Larsson, Erik

    2016-01-01

    Identification of cancer driver genes using somatic mutation patterns indicative of positive selection has become a major goal in cancer genomics. However, cancer cells additionally depend on a large number of genes involved in basic cellular processes. While such genes should in theory be subject to strong purifying (negative) selection against damaging somatic mutations, these patterns have been elusive and purifying selection remains inadequately explored in cancer. Here, we hypothesized that purifying selection should be evident in hemizygous genomic regions, where damaging mutations cannot be compensated for by healthy alleles. Using a 7,781-sample pan-cancer dataset, we first confirmed this in POLR2A, an essential gene where hemizygous deletions are known to confer elevated sensitivity to pharmacological suppression. We next used this principle to identify several genes and pathways that show patterns indicative of purifying selection to avoid deleterious mutations. These include the POLR2A interacting protein INTS10 as well as genes involved in mRNA splicing, nonsense-mediated mRNA decay and other RNA processing pathways. Many of these genes belong to large protein complexes, and strong overlaps were observed with recent functional screens for gene essentiality in human cells. Our analysis supports that purifying selection acts to preserve the remaining function of many hemizygously deleted essential genes in tumors, indicating vulnerabilities that might be exploited by future therapeutic strategies. PMID:28027311

  20. Exploring lateral genetic transfer among microbial genomes using TF-IDF.

    PubMed

    Cong, Yingnan; Chan, Yao-Ban; Ragan, Mark A

    2016-07-25

    Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size and delineation of groups on the inference of genomic regions of lateral origin, finding an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes including the multiplicity of biological sources, ancestry of transfer and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT.
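
    To make the document-analysis analogy concrete, the following Python sketch computes plain TF-IDF weights with each genome treated as a document and each k-mer as a term. It is a generic textbook TF-IDF, not the authors' LGT-detection method, which operates on groups of genomes and delineates regions; the function names, the weighting formula (relative frequency times the natural log of N over document frequency) and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_tfidf(genomes, k):
    """TF-IDF weight of every k-mer in every genome.

    genomes: dict mapping genome name -> sequence string (hypothetical input).
    Returns dict genome name -> {k-mer: weight}.
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n_genomes = len(genomes)
    # Document frequency: in how many genomes each k-mer occurs at least once.
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    weights = {}
    for name, c in counts.items():
        total = sum(c.values())
        weights[name] = {
            km: (cnt / total) * math.log(n_genomes / doc_freq[km])
            for km, cnt in c.items()
        }
    return weights


# Toy sequences, invented for illustration only.
toy = {"a": "AAAA", "b": "AAAT"}
weights = kmer_tfidf(toy, k=2)
```

    A k-mer present in every genome receives weight zero, while a k-mer confined to few genomes is up-weighted; this is the property that makes such scores informative about sequence that is unusual for its host genome.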