Jung, Seung-Hyun; Shin, Seung-Hun; Yim, Seon-Hee; Choi, Hye-Sun; Lee, Sug-Hyung; Chung, Yeun-Jun
2009-07-31
Recently, microarray-based comparative genomic hybridization (array-CGH) has emerged as a very efficient technology with higher resolution for the genome-wide identification of copy number alterations (CNA). Although CNAs are thought to affect gene expression, there is no platform currently available for the integrated CNA-expression analysis. To achieve high-resolution copy number analysis integrated with expression profiles, we established a human 30k oligoarray-based genome-wide copy number analysis system and explored the applicability of this system for integrated genome and transcriptome analysis using the MDA-MB-231 cell line. We compared the CNAs detected by the oligoarray with those detected by the 3k BAC array for validation. The oligoarray identified the single copy difference more accurately and sensitively than the BAC array. Seventeen CNAs detected by both platforms in MDA-MB-231 such as gains of 5p15.33-13.1, 8q11.22-8q21.13, 17p11.2, and losses of 1p32.3, 8p23.3-8p11.21, and 9p21 were consistently identified in previous studies on breast cancer. There were 122 other small CNAs (mean size 1.79 Mb) that were detected by the oligoarray only, not by the BAC array. We performed genomic qPCR targeting 7 CNA regions, detected by the oligoarray only, and one non-CNA region to validate the oligoarray CNA detection. All qPCR results were consistent with the oligoarray-CGH results. When we explored the possibility of combined interpretation of both DNA copy number and RNA expression profiles, mean DNA copy number and RNA expression levels showed a significant correlation. In conclusion, this 30k oligoarray-CGH system can be a reasonable choice for analyzing whole genome CNAs and RNA expression profiles at a lower cost.
Yasuike, Motoshige; Fujiwara, Atushi; Nakamura, Yoji; Iwasaki, Yuki; Nishiki, Issei; Sugaya, Takuma; Shimizu, Akio; Sano, Motohiko; Kobayashi, Takanori; Ototake, Mitsuru
2016-02-01
Bluefin tunas are one of the most important fishery resources worldwide. Because of high market values, bluefin tuna farming has been rapidly growing during recent years. At present, the most common form of the tuna farming is based on the stocking of wild-caught fish. Therefore, concerns have been raised about the negative impact of the tuna farming on wild stocks. Recently, the Pacific bluefin tuna (PBT), Thunnus orientalis, has succeeded in completing the reproduction cycle under aquaculture conditions, but production bottlenecks remain to be solved because of very little biological information on bluefin tunas. Functional genomics approaches promise to rapidly increase our knowledge on biological processes in the bluefin tuna. Here, we describe the development of the first 44K PBT oligonucleotide microarray (oligo-array), based on whole-genome shotgun (WGS) sequencing and large-scale expressed sequence tags (ESTs) data. In addition, we also introduce an initial 44K PBT oligo-array experiment using in vitro grown peripheral blood leukocytes (PBLs) stimulated with immunostimulants such as lipopolysaccharide (LPS: a cell wall component of Gram-negative bacteria) or polyinosinic:polycytidylic acid (poly I:C: a synthetic mimic of viral infection). This pilot 44K PBT oligo-array analysis successfully addressed distinct immune processes between LPS- and poly I:C- stimulated PBLs. Thus, we expect that this oligo-array will provide an excellent opportunity to analyze global gene expression profiles for a better understanding of diseases and stress, as well as for reproduction, development and influence of nutrition on tuna aquaculture production. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Røe, Oluf Dimitri; Anderssen, Endre; Helge, Eli; Pettersen, Caroline Hild; Olsen, Karina Standahl; Sandeck, Helmut; Haaverstad, Rune; Lundgren, Steinar; Larsson, Erik
2009-01-01
Background. Malignant pleural mesothelioma is considered an almost incurable tumour with increasing incidence worldwide. It usually develops in the parietal pleura, from mesothelial lining or submesothelial cells, subsequently invading the visceral pleura. Chromosomal and genomic aberrations of mesothelioma are diverse and heterogeneous. Genome-wide profiling of mesothelioma versus parietal and visceral normal pleural tissue could thus reveal novel genes and pathways explaining its aggressive phenotype. Methodology and Principal Findings. Well-characterised tissue from five mesothelioma patients and normal parietal and visceral pleural samples from six non-cancer patients were profiled by Affymetrix oligoarray of 38 500 genes. The lists of differentially expressed genes tested for overrepresentation in KEGG PATHWAYS (Kyoto Encyclopedia of Genes and Genomes) and GO (gene ontology) terms revealed large differences of expression between visceral and parietal pleura, and both tissues differed from mesothelioma. Cell growth and intrinsic resistance in tumour versus parietal pleura was reflected in highly overexpressed cell cycle, mitosis, replication, DNA repair and anti-apoptosis genes. Several genes of the “salvage pathway” that recycle nucleobases were overexpressed, among them TYMS, encoding thymidylate synthase, the main target of the antifolate drug pemetrexed that is active in mesothelioma. Circadian rhythm genes were expressed in favour of tumour growth. The locally invasive, non-metastatic phenotype of mesothelioma could partly be due to overexpression of the known metastasis suppressors NME1 and NME2. Down-regulation of several tumour suppressor genes could contribute to mesothelioma progression. Genes involved in cell communication were down-regulated, indicating that mesothelioma may shield itself from the immune system. Similarly, in non-cancer parietal versus visceral pleura, signal transduction, soluble transporter and adhesion genes were down-regulated. This could represent a genetic basis for the propensity of the parietal pleura to develop mesothelioma. Conclusions. A genome-wide microarray approach using complex human tissue samples revealed novel expression patterns, reflecting some important features of mesothelioma biology that should be further explored. PMID:19662092
A Danish family with dominant deafness-onychodystrophy syndrome.
Vind-Kezunovic, Dina; Torring, Pernille M
2013-01-01
The rare hereditary disorder "dominant deafness and onychodystrophy (DDOD) syndrome" (OMIM 124480) has been described in a few case reports. No putative DDOD gene or locus has been mapped and the cause of the disorder remains unknown. We present here three male family members in three generations with sensori-neural deafness, onychodystrophy and brachydactyly inherited via autosomal dominant transmission. The family members presented with absent fingernails on the first and fifth digits. As to the feet, there were absent nails on second to fifth toes in two family members, whereas the third family member only had absent nails on the fifth toe. The proband had late dentition and his father a history of late dentition, but otherwise the teeth appeared normal. Comparative genomic hybridization array analysis (Agilent 400k oligoarray) of the proband did not detect any copy number variation. This Danish family fits within the spectrum of dominant deafness and onychodystrophy syndrome and further characterises this rare disorder.
Copy Number Variation in the Horse Genome
Ghosh, Sharmila; Qu, Zhipeng; Das, Pranab J.; Fang, Erica; Juras, Rytis; Cothran, E. Gus; McDonell, Sue; Kenney, Daniel G.; Lear, Teri L.; Adelson, David L.; Chowdhary, Bhanu P.; Raudsepp, Terje
2014-01-01
We constructed a 400K WG tiling oligoarray for the horse and applied it for the discovery of copy number variations (CNVs) in 38 normal horses of 16 diverse breeds, and the Przewalski horse. Probes on the array represented 18,763 autosomal and X-linked genes, and intergenic, sub-telomeric and chrY sequences. We identified 258 CNV regions (CNVRs) across all autosomes, chrX and chrUn, but not in chrY. CNVs comprised 1.3% of the horse genome with chr12 being most enriched. American Miniature horses had the highest and American Quarter Horses the lowest number of CNVs in relation to Thoroughbred reference. The Przewalski horse was similar to native ponies and draft breeds. The majority of CNVRs involved genes, while 20% were located in intergenic regions. Similar to previous studies in horses and other mammals, molecular functions of CNV-associated genes were predominantly in sensory perception, immunity and reproduction. The findings were integrated with previous studies to generate a composite genome-wide dataset of 1476 CNVRs. Of these, 301 CNVRs were shared between studies, while 1174 were novel and require further validation. Integrated data revealed that to date, 41 out of over 400 breeds of the domestic horse have been analyzed for CNVs, of which 11 new breeds were added in this study. Finally, the composite CNV dataset was applied in a pilot study for the discovery of CNVs in 6 horses with XY disorders of sexual development. A homozygous deletion involving AKR1C gene cluster in chr29 in two affected horses was considered possibly causative because of the known role of AKR1C genes in testicular androgen synthesis and sexual development. While the findings improve and integrate the knowledge of CNVs in horses, they also show that for effective discovery of variants of biomedical importance, more breeds and individuals need to be analyzed using comparable methodological approaches. PMID:25340504
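The abstract above reports CNV regions (CNVRs) rather than individual CNV calls. A common convention, assumed here and not confirmed by the abstract, is to build CNVRs by merging overlapping CNV calls from different individuals on the same chromosome. The sketch below illustrates that interval-merging step; the function name and the `(chromosome, start, end)` tuple layout are illustrative choices, not the authors' pipeline.

```python
def merge_cnvs_into_regions(cnvs):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    `cnvs` is an iterable of (chromosome, start, end) tuples with
    start <= end. Calls on the same chromosome that overlap or touch
    are collapsed into one region (hypothetical convention).
    """
    regions = []
    # Sorting groups calls by chromosome and orders them by start position.
    for chrom, start, end in sorted(cnvs):
        last = regions[-1] if regions else None
        if last is not None and last[0] == chrom and start <= last[2]:
            # Overlaps the current region: extend its end if needed.
            last[2] = max(last[2], end)
        else:
            # No overlap: open a new region.
            regions.append([chrom, start, end])
    return [tuple(region) for region in regions]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    calls = [("chr12", 100, 200), ("chr12", 150, 300),
             ("chr12", 400, 500), ("chr1", 10, 20)]
    print(merge_cnvs_into_regions(calls))
```

Real pipelines often add a minimum-overlap threshold or require support from more than one individual; this sketch merges on any overlap.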
Molecular characterization of chronic-type adult T-cell leukemia/lymphoma.
Yoshida, Noriaki; Karube, Kennosuke; Utsunomiya, Atae; Tsukasaki, Kunihiro; Imaizumi, Yoshitaka; Taira, Naoya; Uike, Naokuni; Umino, Akira; Arita, Kotaro; Suguro, Miyuki; Tsuzuki, Shinobu; Kinoshita, Tomohiro; Ohshima, Koichi; Seto, Masao
2014-11-01
Adult T-cell leukemia/lymphoma (ATL) is a human T-cell leukemia virus type-1-induced neoplasm with four clinical subtypes: acute, lymphoma, chronic, and smoldering. Although the chronic type is regarded as indolent ATL, about half of the cases progress to acute-type ATL. The molecular pathogenesis of acute transformation in chronic-type ATL is only partially understood. In an effort to determine the molecular pathogeneses of ATL, and especially the molecular mechanism of acute transformation, oligo-array comparative genomic hybridization and comprehensive gene expression profiling were applied to 27 and 35 cases of chronic and acute type ATL, respectively. The genomic profile of the chronic type was nearly identical to that of acute-type ATL, although more genomic alterations characteristic of acute-type ATL were observed. Among the genomic alterations frequently observed in acute-type ATL, the loss of CDKN2A, which is involved in cell-cycle deregulation, was especially characteristic of acute-type ATL compared with chronic-type ATL. Furthermore, we found that genomic alteration of CD58, which is implicated in escape from the immunosurveillance mechanism, is more frequently observed in acute-type ATL than in the chronic-type. Interestingly, the chronic-type cases with cell-cycle deregulation and disruption of immunosurveillance mechanism were associated with earlier progression to acute-type ATL. These findings suggested that cell-cycle deregulation and the immune escape mechanism play important roles in acute transformation of the chronic type and indicated that these alterations are good predictive markers for chronic-type ATL. ©2014 American Association for Cancer Research.
USDA-ARS's Scientific Manuscript database
Expressed sequence tag (EST) simple sequence repeats (SSRs) in Prunus were mined, and flanking primers designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability. A total of 12,618 contigs were assembled from 84,727 ESTs, along with 34...
Molecular Pathology of Adult T-Cell Leukemia/Lymphoma.
Ohshima, Koichi
2015-01-01
Adult T-cell leukemia/lymphoma (ATLL) is a peripheral T-cell neoplasm of highly pleomorphic lymphoid cells. ATLL is usually widely disseminated, and it is caused by human T-cell leukemia virus type 1 (HTLV-1). It is a disease with a long latency, and affected individuals are usually exposed to the virus very early in life. The cumulative incidence of ATLL is estimated to be 2.5% among HTLV-1 carriers. ATLL cells express CD2, CD3, CD5, CD4, and CD25, as well as CCR4 and FoxP3 of the regulatory T-cell marker. HTLV-1 is causally linked to ATLL, but infection alone is not sufficient to result in neoplastic transformation. A significant finding in this connection is that the Tax viral protein leads to transcriptional activation of many genes, while the HTLV-1 basic leucine zipper factor is thought to be important for T-cell proliferation and oncogenesis. Half of ATLL cases retain the ability to express HTLV-1 Tax, which is a target of HTLV-1-specific cytotoxic T lymphocytes (CTL). An increase in HTLV-1-specific CTL responses is observed in some asymptomatic HTLV-1 carriers. Although HTLV-1-specific CTL are also present in the peripheral blood of ATLL patients, they do not expand sufficiently. We investigated the clinicopathological features and analyzed the staining of Tax-specific CTL and FoxP3. Tax-specific CTL correlated inversely with FoxP3, an increase in the ratio of CD163+ tumor-associated macrophages was associated with worse clinical prognosis, and ATLL cell lines proliferated significantly following direct co-culture with M2 macrophages. Several clinical variants of ATLL have been identified: acute, lymphomatous, chronic, and smoldering. Oligo-array comparative genomic hybridization revealed that genomic loss of 9p21.3 was a significant characteristic of acute-type, but not of chronic-type ATLL. Furthermore, we found that genomic alteration of CD58, which is implicated in immune escape, is more frequently observed in acute than in chronic ATLL. 
Interestingly, the chronic cases with cell cycle deregulation and disruption of immunosurveillance mechanism were associated with faster progression to acute ATLL. Immune evasion, microenvironment, and genetic alteration are therefore important in the multi-step progression of ATLL lymphomagenesis. © 2015 S. Karger AG, Basel.
Ohashi, J; Clark, A G
2005-05-01
The recent cataloguing of a large number of SNPs enables us to perform genome-wide association studies for detecting common genetic variants associated with disease. Such studies, however, generally have limited research budgets for genotyping and phenotyping. It is therefore necessary to optimize the study design by determining the most cost-effective numbers of SNPs and individuals to analyze. In this report we applied the stepwise focusing method, with two-stage design, developed by Satagopan et al. (2002) and Saito & Kamatani (2002), to optimize the cost-effectiveness of a genome-wide direct association study using a transmission/disequilibrium test (TDT). The stepwise focusing method consists of two steps: a large number of SNPs are examined in the first focusing step, and then all the SNPs showing a significant P-value are tested again using a larger set of individuals in the second focusing step. In the framework of optimization, the numbers of SNPs and families and the significance levels in the first and second steps were regarded as variables to be considered. Our results showed that the stepwise focusing method achieves a distinct gain of power compared to a conventional method with the same research budget.
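To make the budget trade-off in the two-step design concrete, the sketch below compares the expected number of genotypes for a two-stage design against a one-stage design. It is a rough illustration under stated assumptions, not the optimization used in the paper: it assumes that nearly all SNPs are null (so about `n_snps * alpha1` pass the first step), that the second step genotypes an additional set of families only for the SNPs that passed, and that every genotype costs the same. All function and parameter names are hypothetical.

```python
def two_stage_genotyping_cost(n_snps, n_families_stage1, n_families_stage2,
                              alpha1, cost_per_genotype=1.0):
    """Expected genotyping cost of a two-stage design (illustrative).

    Stage 1 types every SNP in n_families_stage1 families. Under the
    assumption that almost all SNPs are null, about n_snps * alpha1 SNPs
    pass stage 1 and are typed in n_families_stage2 additional families.
    """
    stage1_genotypes = n_snps * n_families_stage1
    expected_passing_snps = n_snps * alpha1
    stage2_genotypes = expected_passing_snps * n_families_stage2
    return cost_per_genotype * (stage1_genotypes + stage2_genotypes)


def one_stage_genotyping_cost(n_snps, n_families, cost_per_genotype=1.0):
    """Genotyping cost when every SNP is typed in every family."""
    return cost_per_genotype * n_snps * n_families


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    two_stage = two_stage_genotyping_cost(100_000, 200, 800, alpha1=0.01)
    one_stage = one_stage_genotyping_cost(100_000, 1_000)
    print(f"two-stage: {two_stage:,.0f} genotypes")
    print(f"one-stage: {one_stage:,.0f} genotypes")
```

The cost side alone does not settle the design question; the abstract's claim concerns power at a fixed budget, which also depends on the significance levels chosen for each step.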
Optimization and quality control of genome-wide Hi-C library preparation.
Zhang, Xiang-Yuan; He, Chao; Ye, Bing-Yu; Xie, De-Jian; Shi, Ming-Lei; Zhang, Yan; Shen, Wen-Long; Li, Ping; Zhao, Zhi-Hu
2017-09-20
High-throughput chromosome conformation capture (Hi-C) is one of the key assays for genome-wide chromatin interaction studies. It is a time-consuming process that involves many steps and many different kinds of reagents, consumables, and equipment. At present, the reproducibility is unsatisfactory. By optimizing the key steps of the Hi-C experiment, such as crosslinking, pretreatment of digestion, inactivation of restriction enzyme, and in situ ligation, we established a robust Hi-C procedure and prepared two biological replicates of Hi-C libraries from GM12878 cells. After preliminary quality control by Sanger sequencing, the two replicates were sequenced at high throughput. Bioinformatics analysis of the raw sequencing data revealed that the mappability and pair-mate rate of the raw data were around 90% and 72%, respectively. Additionally, after removal of self-circular ligations and dangling-end products, the proportion of valid pairs reached more than 96%. Genome-wide interactome profiling showed clear topologically associating domains (TADs), consistent with previous reports. Further correlation analysis showed that the two biological replicates strongly correlate with each other in terms of both bin coverage and all bin pairs. All these results indicate that the optimized Hi-C procedure is robust and stable, which will be very helpful for the wide application of the Hi-C assay.
NASA Astrophysics Data System (ADS)
Kikuchi, Shoshi
2009-02-01
Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
MAKER-P: a tool-kit for the creation, management, and quality control of plant genome annotations
USDA-ARS's Scientific Manuscript database
We have optimized and extended the widely used annotation-engine MAKER for use on plant genomes. We have benchmarked the resulting software, MAKER-P, using the A. thaliana genome and the TAIR10 gene models. Here we demonstrate the ability of the MAKER-P toolkit to generate de novo repeat databases, ...
Soyer, Jessica L; El Ghalid, Mennat; Glaser, Nicolas; Ollivier, Bénédicte; Linglin, Juliette; Grandaubert, Jonathan; Balesdent, Marie-Hélène; Connolly, Lanelle R; Freitag, Michael; Rouxel, Thierry; Fudal, Isabelle
2014-03-01
Plant pathogens secrete an arsenal of small secreted proteins (SSPs) acting as effectors that modulate host immunity to facilitate infection. SSP-encoding genes are often located in particular genomic environments and show waves of concerted expression at diverse stages of plant infection. To date, little is known about the regulation of their expression. The genome of the Ascomycete Leptosphaeria maculans comprises alternating gene-rich GC-isochores and gene-poor AT-isochores. The AT-isochores harbor mosaics of transposable elements, encompassing one-third of the genome, and are enriched in putative effector genes that present similar expression patterns, namely no expression or low-level expression during axenic cultures compared to strong induction of expression during primary infection of oilseed rape (Brassica napus). Here, we investigated the involvement of one specific histone modification, histone H3 lysine 9 methylation (H3K9me3), in epigenetic regulation of concerted effector gene expression in L. maculans. For this purpose, we silenced the expression of two key players in heterochromatin assembly and maintenance, HP1 and DIM-5 by RNAi. By using HP1-GFP as a heterochromatin marker, we observed that almost no chromatin condensation is visible in strains in which LmDIM5 was silenced by RNAi. By whole genome oligoarrays we observed overexpression of 369 or 390 genes, respectively, in the silenced-LmHP1 and -LmDIM5 transformants during growth in axenic culture, clearly favouring expression of SSP-encoding genes within AT-isochores. The ectopic integration of four effector genes in GC-isochores led to their overexpression during growth in axenic culture. These data strongly suggest that epigenetic control, mediated by HP1 and DIM-5, represses the expression of at least part of the effector genes located in AT-isochores during growth in axenic culture. 
Our hypothesis is that changes of lifestyle and a switch toward pathogenesis lift chromatin-mediated repression, allowing a rapid response to new environmental conditions.
Genome-Wide Association Study and Linkage Analysis of the Healthy Aging Index
Minster, Ryan L.; Sanders, Jason L.; Singh, Jatinder; Kammerer, Candace M.; Barmada, M. Michael; Matteini, Amy M.; Zhang, Qunyuan; Wojczynski, Mary K.; Daw, E. Warwick; Brody, Jennifer A.; Arnold, Alice M.; Lunetta, Kathryn L.; Murabito, Joanne M.; Christensen, Kaare; Perls, Thomas T.; Province, Michael A.
2015-01-01
Background. The Healthy Aging Index (HAI) is a tool for measuring the extent of health and disease across multiple systems. Methods. We conducted a genome-wide association study and a genome-wide linkage analysis to map quantitative trait loci associated with the HAI and a modified HAI weighted for mortality risk in 3,140 individuals selected for familial longevity from the Long Life Family Study. The genome-wide association study used the Long Life Family Study as the discovery cohort and individuals from the Cardiovascular Health Study and the Framingham Heart Study as replication cohorts. Results. There were no genome-wide significant findings from the genome-wide association study; however, several single-nucleotide polymorphisms near ZNF704 on chromosome 8q21.13 were suggestively associated with the HAI in the Long Life Family Study (p < 10^-6) and nominally replicated in the Cardiovascular Health Study and Framingham Heart Study. Linkage results revealed significant evidence (log-odds score = 3.36) for a quantitative trait locus for mortality-optimized HAI in women on chromosome 9p24–p23. However, results of fine-mapping studies did not implicate any specific candidate genes within this region of interest. Conclusions. ZNF704 may be a potential candidate gene for studies of the genetic underpinnings of longevity. PMID:25758594
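For readers less familiar with linkage statistics, the sketch below shows one common way to put a log-odds (LOD) score such as the 3.36 reported above on a p-value scale. It relies on the standard relation chi-square = 2 ln(10) x LOD and assumes a one-sided test with a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom, which is the usual asymptotic approximation for a single variance component. The abstract does not state which reference distribution the authors used, so this is an assumption, and the function names are hypothetical.

```python
import math


def lod_to_chi2(lod):
    """Convert a LOD score to the equivalent likelihood-ratio chi-square."""
    return 2.0 * math.log(10.0) * lod


def lod_to_pvalue(lod):
    """Approximate pointwise p-value for a LOD score.

    Assumes the null distribution is a 50:50 mixture of a point mass at 0
    and chi-square with 1 df (an assumption, not taken from the abstract).
    """
    chi2 = lod_to_chi2(lod)
    if chi2 <= 0.0:
        return 1.0
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)); halve it for the mixture.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    lod = 3.36  # value reported in the abstract
    print(f"chi-square = {lod_to_chi2(lod):.2f}")
    print(f"approximate pointwise p-value = {lod_to_pvalue(lod):.2e}")
```

This is a pointwise value; genome-wide significance thresholds additionally account for testing along the whole genome.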
High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software
Fabregat-Traver, Diego; Sharapov, Sodbo Zh.; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Aulchenko, Yurii; Bientinesi, Paolo
2014-01-01
To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed model based tests. When large samples are used, and when multiple traits are to be studied in the ’omics’ context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for arbitrary number of traits, and demonstrate that for the analysis of single-trait and multiple-trait scenarios different computational algorithms are optimal. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL. PMID:25717363
High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software.
Fabregat-Traver, Diego; Sharapov, Sodbo Zh; Hayward, Caroline; Rudan, Igor; Campbell, Harry; Aulchenko, Yurii; Bientinesi, Paolo
2014-01-01
To raise the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed model based tests. When large samples are used, and when multiple traits are to be studied in the 'omics' context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for arbitrary number of traits, and demonstrate that for the analysis of single-trait and multiple-trait scenarios different computational algorithms are optimal. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL.
SuperDCA for genome-wide epistasis analysis.
Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Xu, Ying Ying; Lees, John A; Bentley, Stephen D; Croucher, Nicholas J; Corander, Jukka
2018-05-29
The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4-10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
USDA-ARS's Scientific Manuscript database
Introduction: Personalized diets based on an individual's genome to optimize the success of dietary intervention and reduce genetic cardiovascular disease (CVD) risk, is one of the challenges most frequently discussed in the scientific community. Moreover, it has been widely welcomed and demanded by...
Park, Jeongbin; Bae, Sangsu
2018-03-15
Following the type II CRISPR-Cas9 system, type V CRISPR-Cpf1 endonucleases have been found to be applicable for genome editing in various organisms in vivo. However, there are as yet no web-based tools capable of optimally selecting guide RNAs (gRNAs) among all possible genome-wide target sites. Here, we present Cpf1-Database, a genome-wide gRNA library design tool for LbCpf1 and AsCpf1, which have DNA recognition sequences of 5'-TTTN-3' at the 5' ends of target sites. Cpf1-Database provides a sophisticated but simple way to design gRNAs for AsCpf1 nucleases on the genome scale. One can easily access the data using a straightforward web interface, and using the powerful collections feature one can easily design gRNAs for thousands of genes in a short time. Free access at http://www.rgenome.net/cpf1-database/. Contact: sangsubae@hanyang.ac.kr.
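The recognition rule stated above (a 5'-TTTN-3' sequence at the 5' end of the target site) can be illustrated with a small scanner that lists candidate sites on both strands of a DNA sequence. This is a minimal sketch, not the Cpf1-Database implementation: the function names are hypothetical, the 23-nt target length is an assumed default rather than a value from the abstract, and no off-target or efficiency scoring is attempted.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cpf1_sites(seq, spacer_len=23):
    """List candidate Cpf1 target sites with a 5'-TTTN-3' PAM.

    Returns (strand, position, pam, target) tuples. `position` is the
    0-based index of the PAM within the scanned strand, so for "-" hits
    it refers to the reverse-complemented sequence. `spacer_len` is an
    assumed target length, not a value taken from the abstract.
    """
    seq = seq.upper()
    # A lookahead lets overlapping sites be reported as separate hits.
    pattern = re.compile(r"(?=(TTT[ACGT])([ACGT]{%d}))" % spacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    example = "GG" + "TTTA" + "ACGTACGTACGTACGTACGTACG" + "CC"
    for site in find_cpf1_sites(example):
        print(site)
```

A genome-scale tool would additionally map hits back to genomic coordinates and filter candidates; those steps are outside this sketch.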
Genome-Wide Association Study and Linkage Analysis of the Healthy Aging Index.
Minster, Ryan L; Sanders, Jason L; Singh, Jatinder; Kammerer, Candace M; Barmada, M Michael; Matteini, Amy M; Zhang, Qunyuan; Wojczynski, Mary K; Daw, E Warwick; Brody, Jennifer A; Arnold, Alice M; Lunetta, Kathryn L; Murabito, Joanne M; Christensen, Kaare; Perls, Thomas T; Province, Michael A; Newman, Anne B
2015-08-01
The Healthy Aging Index (HAI) is a tool for measuring the extent of health and disease across multiple systems. We conducted a genome-wide association study and a genome-wide linkage analysis to map quantitative trait loci associated with the HAI and a modified HAI weighted for mortality risk in 3,140 individuals selected for familial longevity from the Long Life Family Study. The genome-wide association study used the Long Life Family Study as the discovery cohort and individuals from the Cardiovascular Health Study and the Framingham Heart Study as replication cohorts. There were no genome-wide significant findings from the genome-wide association study; however, several single-nucleotide polymorphisms near ZNF704 on chromosome 8q21.13 were suggestively associated with the HAI in the Long Life Family Study (p < 10^-6) and nominally replicated in the Cardiovascular Health Study and Framingham Heart Study. Linkage results revealed significant evidence (log-odds score = 3.36) for a quantitative trait locus for mortality-optimized HAI in women on chromosome 9p24-p23. However, results of fine-mapping studies did not implicate any specific candidate genes within this region of interest. ZNF704 may be a potential candidate gene for studies of the genetic underpinnings of longevity. © The Author 2015. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved.
GeNets: a unified web platform for network-based genomic analyses.
Li, Taibo; Kim, April; Rosenbluh, Joseph; Horn, Heiko; Greenfeld, Liraz; An, David; Zimmer, Andrew; Liberzon, Arthur; Bistline, Jon; Natoli, Ted; Li, Yang; Tsherniak, Aviad; Narayan, Rajiv; Subramanian, Aravind; Liefeld, Ted; Wong, Bang; Thompson, Dawn; Calvo, Sarah; Carr, Steve; Boehm, Jesse; Jaffe, Jake; Mesirov, Jill; Hacohen, Nir; Regev, Aviv; Lage, Kasper
2018-06-18
Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.
2016-02-11
the Whitehead Genome Technology Core for sequencing. This work was supported by the UCSF Program for Breakthrough Biomedical Research (funded in...landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349. Nedialkova, D.D., and Leidel, S.A. (2015). Optimization of Codon Translation... the CC BY license (http://creativecommons.org/licenses/by/4.0/). SUMMARY Ribosome-footprint profiling provides genome-wide snapshots of translation
Multi-Instance Metric Transfer Learning for Genome-Wide Protein Function Prediction.
Xu, Yonghui; Min, Huaqing; Wu, Qingyao; Song, Hengjie; Ye, Bicui
2017-02-06
Multi-Instance (MI) learning has been proven to be effective for the genome-wide protein function prediction problems where each training example is associated with multiple instances. Many studies in this literature attempted to find an appropriate Multi-Instance Learning (MIL) method for genome-wide protein function prediction under a usual assumption, the underlying distribution from testing data (target domain, i.e., TD) is the same as that from training data (source domain, i.e., SD). However, this assumption may be violated in real practice. To tackle this problem, in this paper, we propose a Multi-Instance Metric Transfer Learning (MIMTL) approach for genome-wide protein function prediction. In MIMTL, we first transfer the source domain distribution to the target domain distribution by utilizing the bag weights. Then, we construct a distance metric learning method with the reweighted bags. At last, we develop an alternative optimization scheme for MIMTL. Comprehensive experimental evidence on seven real-world organisms verifies the effectiveness and efficiency of the proposed MIMTL approach over several state-of-the-art methods.
Das, Pranab J; McCarthy, Fiona; Vishnoi, Monika; Paria, Nandina; Gresham, Cathy; Li, Gang; Kachroo, Priyanka; Sudderth, A Kendrick; Teague, Sheila; Love, Charles C; Varner, Dickson D; Chowdhary, Bhanu P; Raudsepp, Terje
2013-01-01
Mature mammalian sperm contain a complex population of RNAs some of which might regulate spermatogenesis while others probably play a role in fertilization and early development. Due to this limited knowledge, the biological functions of sperm RNAs remain enigmatic. Here we report the first characterization of the global transcriptome of the sperm of fertile stallions. The findings improved understanding of the biological significance of sperm RNAs which in turn will allow the discovery of sperm-based biomarkers for stallion fertility. The stallion sperm transcriptome was interrogated by analyzing sperm and testes RNA on a 21,000-element equine whole-genome oligoarray and by RNA-seq. Microarray analysis revealed 6,761 transcripts in the sperm, of which 165 were sperm-enriched, and 155 were differentially expressed between the sperm and testes. Next, 70 million raw reads were generated by RNA-seq of which 50% could be aligned with the horse reference genome. A total of 19,257 sequence tags were mapped to all horse chromosomes and the mitochondrial genome. The highest density of mapped transcripts was in gene-rich ECA11, 12 and 13, and the lowest in gene-poor ECA9 and X; 7 gene transcripts originated from ECAY. Structural annotation aligned sperm transcripts with 4,504 known horse and/or human genes, rRNAs and 82 miRNAs, whereas 13,354 sequence tags remained anonymous. The data were aligned with selected equine gene models to identify additional exons and splice variants. Gene Ontology annotations showed that sperm transcripts were associated with molecular processes (chemoattractant-activated signal transduction, ion transport) and cellular components (membranes and vesicles) related to known sperm functions at fertilization, while some messenger and micro RNAs might be critical for early development. 
The findings suggest that the rich repertoire of coding and non-coding RNAs in stallion sperm is not a random remnant from spermatogenesis in testes but a selectively retained and functionally coherent collection of RNAs.
PMID:23409192
Evolution-guided optimization of biosynthetic pathways.
Raman, Srivatsan; Rogers, Jameson K; Taylor, Noah D; Church, George M
2014-12-16
Engineering biosynthetic pathways for chemical production requires extensive optimization of the host cellular metabolic machinery. Because it is challenging to specify a priori an optimal design, metabolic engineers often need to construct and evaluate a large number of variants of the pathway. We report a general strategy that combines targeted genome-wide mutagenesis to generate pathway variants with evolution to enrich for rare high producers. We convert the intracellular presence of the target chemical into a fitness advantage for the cell by using a sensor domain responsive to the chemical to control a reporter gene necessary for survival under selective conditions. Because artificial selection tends to amplify unproductive cheaters, we devised a negative selection scheme to eliminate cheaters while preserving library diversity. This scheme allows us to perform multiple rounds of evolution (addressing ∼10⁹ cells per round) with minimal carryover of cheaters after each round. Based on candidate genes identified by flux balance analysis, we used targeted genome-wide mutagenesis to vary the expression of pathway genes involved in the production of naringenin and glucaric acid. Through up to four rounds of evolution, we increased production of naringenin and glucaric acid by 36- and 22-fold, respectively. Naringenin production (61 mg/L) from glucose was more than double the previous highest titer reported. Whole-genome sequencing of evolved strains revealed additional untargeted mutations that likely benefit production, suggesting new routes for optimization.
Galanter, Joshua Mark; Fernandez-Lopez, Juan Carlos; Gignoux, Christopher R; Barnholtz-Sloan, Jill; Fernandez-Rozadilla, Ceres; Via, Marc; Hidalgo-Miranda, Alfredo; Contreras, Alejandra V; Figueroa, Laura Uribe; Raska, Paola; Jimenez-Sanchez, Gerardo; Zolezzi, Irma Silva; Torres, Maria; Ponte, Clara Ruiz; Ruiz, Yarimar; Salas, Antonio; Nguyen, Elizabeth; Eng, Celeste; Borjas, Lisbeth; Zabala, William; Barreto, Guillermo; González, Fernando Rondón; Ibarra, Adriana; Taboada, Patricia; Porras, Liliana; Moreno, Fabián; Bigham, Abigail; Gutierrez, Gerardo; Brutsaert, Tom; León-Velarde, Fabiola; Moore, Lorna G; Vargas, Enrique; Cruz, Miguel; Escobedo, Jorge; Rodriguez-Santana, José; Rodriguez-Cintrón, William; Chapela, Rocio; Ford, Jean G; Bustamante, Carlos; Seminara, Daniela; Shriver, Mark; Ziv, Elad; Burchard, Esteban Gonzalez; Haile, Robert; Parra, Esteban; Carracedo, Angel
2012-01-01
Most individuals throughout the Americas are admixed descendants of Native American, European, and African ancestors. Complex historical factors have resulted in varying proportions of ancestral contributions between individuals within and among ethnic groups. We developed a panel of 446 ancestry informative markers (AIMs) optimized to estimate ancestral proportions in individuals and populations throughout Latin America. We used genome-wide data from 953 individuals from diverse African, European, and Native American populations to select AIMs optimized for each of the three main continental populations that form the basis of modern Latin American populations. We selected markers on the basis of locus-specific branch length to be informative, well distributed throughout the genome, capable of being genotyped on widely available commercial platforms, and applicable throughout the Americas by minimizing within-continent heterogeneity. We then validated the panel in samples from four admixed populations by comparing ancestry estimates based on the AIMs panel to estimates based on genome-wide association study (GWAS) data. The panel provided balanced discriminatory power among the three ancestral populations and accurate estimates of individual ancestry proportions (R² > 0.9 for ancestral components with significant between-subject variance). Finally, we genotyped samples from 18 populations from Latin America using the AIMs panel and estimated variability in ancestry within and between these populations. This panel and its reference genotype information will be useful resources to explore population history of admixture in Latin America and to correct for the potential effects of population stratification in admixed samples in the region.
PMID:22412386
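The abstract names locus-specific branch length (LSBL) as the marker selection criterion but does not spell out the computation. The sketch below uses a common three-population definition, in which each population's branch length at a locus is derived from the three pairwise genetic distances (for example, pairwise FST). The input values are made up for illustration, and the authors' actual pipeline may differ.

```python
def lsbl(d_ab: float, d_ac: float, d_bc: float) -> tuple[float, float, float]:
    """Return locus-specific branch lengths for populations A, B and C.

    d_ab, d_ac, d_bc are pairwise genetic distances at a single locus
    (assumed here to be pairwise FST values).
    """
    branch_a = (d_ab + d_ac - d_bc) / 2
    branch_b = (d_ab + d_bc - d_ac) / 2
    branch_c = (d_ac + d_bc - d_ab) / 2
    return branch_a, branch_b, branch_c


# Hypothetical pairwise distances at one marker (not data from the study).
a, b, c = lsbl(d_ab=0.30, d_ac=0.10, d_bc=0.28)
print(f"A: {a:.2f}, B: {b:.2f}, C: {c:.2f}")
```

A marker with a long branch for one population and short branches for the other two is informative for that population, which is one way to read the phrase "AIMs optimized for each of the three main continental populations".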
Meng, Shan; He, Jianbo; Zhao, Tuanjie; Xing, Guangnan; Li, Yan; Yang, Shouping; Lu, Jiangjie; Wang, Yufeng; Gai, Junyi
2016-08-01
Utilizing an innovative GWAS in CSLRP, 44 QTL with 199 alleles, contributing 72.2% of the SIFC variation, were detected and organized into a QTL-allele matrix for cross design and gene annotation. The seed isoflavone content (SIFC) of soybeans is of great importance to health care. The Chinese soybean landrace population (CSLRP) as a genetic reservoir was studied for its whole-genome quantitative trait loci (QTL) system of the SIFC using an innovative restricted two-stage multi-locus genome-wide association study procedure (RTM-GWAS). A sample of 366 landraces was tested under four environments and sequenced using the RAD-seq (restriction-site-associated DNA sequencing) technique to obtain 116,769 single nucleotide polymorphisms (SNPs), which were then organized into 29,119 SNP linkage disequilibrium blocks (SNPLDBs) for GWAS. The detected 44 QTL with 199 alleles on 16 chromosomes (explaining 72.2% of the total phenotypic variation), together with the allele effects (92 positive and 107 negative) of the CSLRP, were organized into a QTL-allele matrix showing the SIFC population genetic structure. Additional differentiation among eco-regions due to the SIFC in addition to that of genome-wide markers was found. All accessions comprised both positive and negative alleles, implying a great potential for recombination within the population. The optimal crosses were predicted from the matrices, showing transgressive potentials in the CSLRP. From the detected QTL system, 55 candidate genes related to 11 biological processes were χ²-tested as an SIFC candidate gene system. The present study explored the genome-wide SIFC QTL/gene system with the innovative RTM-GWAS and found the potentials of the QTL-allele matrix in optimal cross design and population genetic and genomic studies, which may have provided a solution to match the breeding by design strategy at both QTL and gene levels in breeding programs.
Non-isotopic Method for In Situ LncRNA Visualization and Quantitation.
Maqsodi, Botoul; Nikoloff, Corina
2016-01-01
In mammals and other eukaryotes, most of the genome is transcribed in a developmentally regulated manner to produce large numbers of long noncoding RNAs (lncRNAs). Genome-wide studies have identified thousands of lncRNAs lacking protein-coding capacity. The RNA in situ hybridization technique is especially beneficial for the visualization of RNA (mRNA and lncRNA) expression in a heterogeneous population of cells/tissues; however, its utility has been hampered by complicated procedures typically developed and optimized for the detection of a specific gene and therefore not amenable to a wide variety of genes and tissues. Recently, bDNA has revolutionized RNA in situ detection with fully optimized, robust assays for the detection of any mRNA and lncRNA targets in formalin-fixed paraffin-embedded (FFPE) and fresh frozen tissue sections using manual processing.
Baude, Jessica; Vial, Ludovic; Villard, Camille; Campillo, Tony; Lavire, Céline; Nesme, Xavier
2016-01-01
ABSTRACT The rhizosphere-inhabiting species Agrobacterium fabrum (genomospecies G8 of the Agrobacterium tumefaciens species complex) is known to degrade hydroxycinnamic acids (HCAs), especially ferulic acid and p-coumaric acid, via the novel A. fabrum HCA degradation pathway. Gene expression profiles of A. fabrum strain C58 were investigated in the presence of HCAs, using a C58 whole-genome oligoarray. Both ferulic acid and p-coumaric acid caused variations in the expression of more than 10% of the C58 genes. Genes of the A. fabrum HCA degradation pathway, together with the genes involved in iron acquisition, were among the most highly induced in the presence of HCAs. Two operons coding for the biosynthesis of a particular siderophore, as well as genes of the A. fabrum HCA degradation pathway, have been described as being specific to the species. We demonstrate here their coordinated expression, emphasizing the interdependence between the iron concentration in the growth medium and the rate at which ferulic acid is degraded by cells. The coordinated expression of these functions may be advantageous in HCA-rich but iron-starved environments in which microorganisms have to compete for both iron and carbon sources, such as in plant roots. The present results confirm that there is cooperation between the A. fabrum-specific genes, defining a particular ecological niche. IMPORTANCE We previously identified seven genomic regions in Agrobacterium fabrum that were specifically present in all of the members of this species only. Here we demonstrated that two of these regions, encoding the hydroxycinnamic acid degradation pathway and the iron acquisition pathway, were regulated in a coordinated manner. The coexpression of these functions may be advantageous in hydroxycinnamic acid-rich but iron-starved environments in which microorganisms have to compete for both iron and carbon sources, such as in plant roots. 
These data support the view that bacterial genomic species emerged from a bacterial population by acquiring specific functions that allowed them to outcompete their closest relatives. In conclusion, bacterial species could be defined not only as genomic species but also as ecological species. PMID:27060117
Haraksingh, Rajini R; Abyzov, Alexej; Urban, Alexander Eckehart
2017-04-24
High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data. The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays, CNV detection varied widely in number of CNV calls (4-489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0-86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters. High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. 
This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.
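The benchmarking step described above (comparing each array's CNV calls against a gold standard and reporting the percentage of non-validated calls) can be illustrated with a small sketch. The matching rule used here, a 50% reciprocal overlap on the same chromosome, is an assumption chosen for illustration and is not stated in the abstract. The function names, the half-open coordinate convention, and the example intervals are likewise hypothetical.

```python
Interval = tuple[str, int, int]  # (chromosome, start, end), half-open, end > start


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smallest fraction of either interval covered by their intersection."""
    if a[0] != b[0]:
        return 0.0
    overlap = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def percent_non_validated(calls: list[Interval], gold: list[Interval],
                          min_ro: float = 0.5) -> float:
    """Percentage of calls with no gold-standard CNV at >= min_ro reciprocal overlap."""
    if not calls:
        return 0.0
    unvalidated = [
        c for c in calls
        if not any(reciprocal_overlap(c, g) >= min_ro for g in gold)
    ]
    return 100.0 * len(unvalidated) / len(calls)


# Made-up call set and gold standard.
calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
gold = [("chr1", 120, 210), ("chr2", 1000, 2000)]
print(f"{percent_non_validated(calls, gold):.1f}% of calls not validated")
```

A stricter or looser overlap threshold changes the validated fraction, which is one reason the abstract's recommendation to use multiple analysis algorithms and independent validation matters.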
designGG: an R-package and web tool for the optimal design of genetical genomics experiments.
Li, Yang; Swertz, Morris A; Vera, Gonzalo; Fu, Jingyuan; Breitling, Rainer; Jansen, Ritsert C
2009-06-18
High-dimensional biomolecular profiling of genetically different individuals in one or more environmental conditions is an increasingly popular strategy for exploring the functioning of complex biological systems. The optimal design of such genetical genomics experiments in a cost-efficient and effective way is not trivial. This paper presents designGG, an R package for designing optimal genetical genomics experiments. A web implementation for designGG is available at http://gbic.biol.rug.nl/designGG. All software, including source code and documentation, is freely available. DesignGG allows users to intelligently select and allocate individuals to experimental units and conditions such as drug treatment. The user can maximize the power and resolution of detecting genetic, environmental and interaction effects in a genome-wide or local mode by giving more weight to genome regions of special interest, such as previously detected phenotypic quantitative trait loci. This will help to achieve high power and more accurate estimates of the effects of interesting factors, and thus yield a more reliable biological interpretation of data. DesignGG is applicable to linkage analysis of experimental crosses, e.g. recombinant inbred lines, as well as to association analysis of natural populations.
Kujur, Alice; Bajaj, Deepak; Upadhyaya, Hari D.; Das, Shouvik; Ranjan, Rajeev; Shree, Tanima; Saxena, Maneesha S.; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.
2015-01-01
The genome-wide discovery and high-throughput genotyping of SNPs in chickpea natural germplasm lines is indispensable to extrapolate their natural allelic diversity, domestication, and linkage disequilibrium (LD) patterns leading to the genetic enhancement of this vital legume crop. We discovered 44,844 high-quality SNPs by sequencing of 93 diverse cultivated desi, kabuli, and wild chickpea accessions using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays that were physically mapped across eight chromosomes of desi and kabuli. Of these, 22,542 SNPs were structurally annotated in different coding and non-coding sequence components of genes. Genes with 3296 non-synonymous and 269 regulatory SNPs could functionally differentiate accessions based on their contrasting agronomic traits. A high experimental validation success rate (92%) and reproducibility (100%) along with strong sensitivity (93–96%) and specificity (99%) of GBS-based SNPs was observed. This implies the robustness of GBS as a high-throughput assay for rapid large-scale mining and genotyping of genome-wide SNPs in chickpea with sub-optimal use of resources. With 23,798 genome-wide SNPs, a relatively high intra-specific polymorphic potential (49.5%) and broader molecular diversity (13–89%)/functional allelic diversity (18–77%) was apparent among 93 chickpea accessions, suggesting their tremendous applicability in rapid selection of desirable diverse accessions/inter-specific hybrids in chickpea crossbred varietal improvement programs. The genome-wide SNPs revealed a complex admixed domestication pattern, extensive LD estimates (0.54–0.68) and extended LD decay (400–500 kb) in a structured population inclusive of 93 accessions. 
These findings reflect the utility of our identified SNPs for subsequent genome-wide association study (GWAS) and selective sweep-based domestication trait dissection analysis to identify potential genomic loci (gene-associated targets) specifically regulating important complex quantitative agronomic traits in chickpea. The numerous informative genome-wide SNPs, natural allelic diversity-led domestication pattern, and LD-based information generated in our study have multidimensional applicability with respect to chickpea genomics-assisted breeding. PMID:25873920
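The LD estimates quoted above are not tied to a specific statistic in the abstract. Assuming they are the usual squared correlation r² between pairs of biallelic SNPs, the sketch below shows how r² is computed from phased haplotypes coded 0/1. The function name and the example haplotypes are illustrative only.

```python
def ld_r2(hap_a: list[int], hap_b: list[int]) -> float:
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a and hap_b hold one 0/1 allele code per phased haplotype,
    in the same haplotype order.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal in length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(x * y for x, y in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / denom


# Eight made-up haplotypes at two SNPs.
snp1 = [1, 1, 1, 0, 0, 0, 1, 0]
snp2 = [1, 1, 0, 0, 0, 0, 1, 0]
print(f"r^2 = {ld_r2(snp1, snp2):.2f}")
```

LD decay, as reported in the abstract, is typically read off by plotting such pairwise values against the physical distance between the SNPs.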
Le Roch, K G; Chung, D-W D; Ponts, N
2012-01-01
The first draft of the human malaria parasite's genome was released in 2002. Since then, the malaria scientific community has witnessed a steady embrace of new and powerful functional genomic studies. Over the years, these approaches have slowly revolutionized malaria research and enabled the comprehensive, unbiased investigation of various aspects of the parasite's biology. These genome-wide analyses delivered a refined annotation of the parasite's genome, delivered a better knowledge of its RNA, proteins and metabolite derivatives, and fostered the discovery of new vaccine and drug targets. Despite the positive impacts of these genomic studies, most research and investment still focus on protein targets, drugs and vaccine candidates that were known before the publication of the parasite genome sequence. However, recent access to next-generation sequencing technologies, along with an increased number of genome-wide applications, is expanding the impact of the parasite genome on biomedical research, contributing to a paradigm shift in research activities that may possibly lead to new optimized diagnosis and treatments. This review provides an update of Plasmodium falciparum genome sequences and an overview of the rapid development of genomics and system biology applications that have an immense potential of creating powerful tools for a successful malaria eradication campaign. © 2011 Blackwell Publishing Ltd.
A MBD-seq protocol for large-scale methylome-wide studies with (very) low amounts of DNA.
Aberg, Karolina A; Chan, Robin F; Shabalin, Andrey A; Zhao, Min; Turecki, Gustavo; Staunstrup, Nicklas Heine; Starnawska, Anna; Mors, Ole; Xie, Lin Y; van den Oord, Edwin Jcg
2017-09-01
We recently showed that, after optimization, our methyl-CpG binding domain sequencing (MBD-seq) application approximates the methylome-wide coverage obtained with whole-genome bisulfite sequencing (WGB-seq), but at a cost that enables adequately powered large-scale association studies. A prior drawback of MBD-seq is the relatively large amount of genomic DNA (ideally >1 µg) required to obtain high-quality data. Biomaterials are typically expensive to collect, provide a finite amount of DNA, and may simply not yield sufficient starting material. The ability to use low amounts of DNA will increase the breadth and number of studies that can be conducted. Therefore, we further optimized the enrichment step. With this low starting material protocol, MBD-seq performed as well as, or better than, the protocol requiring ample starting material (>1 µg). Using only 15 ng of DNA as input, there is minimal loss in data quality, achieving 93% of the coverage of WGB-seq (with standard amounts of input DNA) at similar false-positive rates. Furthermore, across a large number of genomic features, the MBD-seq methylation profiles closely tracked those observed for WGB-seq with even slightly larger effect sizes. This suggests that MBD-seq provides similar information about the methylome and classifies methylation status somewhat more accurately. Performance decreases with <15 ng DNA as starting material but, even with as little as 5 ng, MBD-seq still achieves 90% of the coverage of WGB-seq with comparable genome-wide methylation profiles. Thus, the proposed protocol is an attractive option for adequately powered and cost-effective methylome-wide investigations using (very) low amounts of DNA.
Discovering time-lagged rules from microarray data using gene profile classifiers
2011-01-01
Background: Gene regulatory networks have an essential role in every process of life. Genome-wide time series data are becoming increasingly available, providing the opportunity to discover the time-delayed gene regulatory networks that govern the majority of these molecular processes. Results: This paper aims at reconstructing gene regulatory networks from multiple genome-wide microarray time series datasets. To this end, a new model-free algorithm called GRNCOP2 (Gene Regulatory Network inference by Combinatorial OPtimization 2), which is a significant evolution of the GRNCOP algorithm, was developed using combinatorial optimization of gene profile classifiers. The method is capable of inferring potential time-delay relationships with any span of time between genes from various time series datasets given as input. The proposed algorithm was applied to time series data composed of twenty yeast genes that are highly relevant for the cell-cycle study, and the results were compared against several related approaches. The outcomes have shown that GRNCOP2 outperforms the compared methods in terms of the proposed metrics, and that the results are consistent with previous biological knowledge. Additionally, a genome-wide study on multiple publicly available time series data was performed. In this case, the experiments demonstrated the soundness and scalability of the new method, which inferred highly related, statistically significant gene associations. Conclusions: A novel method for inferring time-delayed gene regulatory networks from genome-wide time series datasets is proposed in this paper. The method was carefully validated with several publicly available data sets. The results have demonstrated that the algorithm constitutes a usable model-free approach capable of predicting meaningful relationships between genes, revealing the time-trends of gene regulation. PMID:21524308
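To make the idea of a time-lagged rule concrete, the sketch below scores a simple rule of the form "the target gene's state at time t + lag matches the regulator's state at time t" on discretized expression profiles. This is not GRNCOP2: the thresholding step, the accuracy score, and all names are simplifying assumptions made here, and the expression values are invented.

```python
def discretize(series: list[float], threshold: float) -> list[int]:
    """Map expression values to 1 (active) or 0 (inactive) using a fixed threshold."""
    return [1 if value >= threshold else 0 for value in series]


def lagged_rule_accuracy(regulator: list[int], target: list[int], lag: int) -> float:
    """Fraction of time points where target[t + lag] equals regulator[t]."""
    pairs = list(zip(regulator[:len(regulator) - lag], target[lag:]))
    if not pairs:
        raise ValueError("lag is too large for the length of the series")
    return sum(r == t for r, t in pairs) / len(pairs)


def best_lag(regulator: list[int], target: list[int], max_lag: int) -> tuple[int, float]:
    """Return the lag in 0..max_lag with the highest rule accuracy, and that accuracy."""
    scores = {lag: lagged_rule_accuracy(regulator, target, lag)
              for lag in range(max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Invented profiles: the target follows the regulator two time points later.
regulator = discretize([0.1, 0.9, 0.8, 0.2, 0.1, 0.9, 0.2, 0.1], threshold=0.5)
target = discretize([0.0, 0.0, 0.1, 0.9, 0.8, 0.2, 0.1, 0.9], threshold=0.5)
print(best_lag(regulator, target, max_lag=3))
```

The published method additionally combines multiple datasets and optimizes the classifiers themselves, which this sketch does not attempt.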
Hoffmann, Thomas J; Zhan, Yiping; Kvale, Mark N; Hesselson, Stephanie E; Gollub, Jeremy; Iribarren, Carlos; Lu, Yontao; Mei, Gangwu; Purdy, Matthew M; Quesenberry, Charles; Rowell, Sarah; Shapero, Michael H; Smethurst, David; Somkin, Carol P; Van den Eeden, Stephen K; Walter, Larry; Webster, Teresa; Whitmer, Rachel A; Finn, Andrea; Schaefer, Catherine; Kwok, Pui-Yan; Risch, Neil
2011-12-01
Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies. Copyright © 2011 Elsevier Inc. All rights reserved.
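The selection strategy described above can be sketched as a greedy cover: repeatedly pick the candidate SNP that pairwise-tags the most still-uncovered target SNPs and, in the hybrid variant, also drop targets that become covered by imputation. The data structures, the `imputed_given` callback, and the toy inputs below are hypothetical stand-ins. The actual array design used LD and imputation computations that are not reproduced here.

```python
from typing import Callable, Optional


def select_snps(
    targets: set[str],
    tags_by_candidate: dict[str, set[str]],
    imputed_given: Optional[Callable[[list[str]], set[str]]] = None,
) -> tuple[list[str], set[str]]:
    """Greedy pairwise SNP selection with optional removal of imputed targets.

    tags_by_candidate maps each candidate SNP to the targets it tags pairwise.
    imputed_given (hypothetical) returns the targets covered by imputation
    once the SNPs chosen so far are on the array.
    Returns the chosen SNPs in order and any targets left uncovered.
    """
    uncovered = set(targets)
    chosen: list[str] = []
    while uncovered:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(tags_by_candidate),
                   key=lambda snp: len(tags_by_candidate[snp] & uncovered))
        gain = tags_by_candidate[best] & uncovered
        if not gain:
            break  # no candidate tags any remaining target
        chosen.append(best)
        uncovered -= gain
        if imputed_given is not None:
            uncovered -= imputed_given(chosen)
    return chosen, uncovered


# Toy example with made-up tagging relationships.
tags = {"s1": {"s1", "s2", "s3"}, "s2": {"s2"}, "s3": {"s3", "s4"},
        "s4": {"s4"}, "s5": {"s5"}}
targets = {"s1", "s2", "s3", "s4", "s5"}
print(select_snps(targets, tags))
print(select_snps(targets, tags,
                  imputed_given=lambda chosen: {"s4"} if "s1" in chosen else set()))
```

In the toy example the hybrid variant needs one fewer SNP because `s4` is treated as imputable once `s1` is selected, which mirrors the capacity saving the abstract attributes to imputation-based removal.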
Sunflower Hybrid Breeding: From Markers to Genomic Selection
Dimitrijevic, Aleksandra; Horn, Renate
2018-01-01
In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. 
Integrative approaches combining omic technologies (genomics, transcriptomics, proteomics, metabolomics and phenomics) using bioinformatic tools will facilitate the identification of target genes and markers for complex traits and will give a better insight into the mechanisms behind the traits. PMID:29387071
Plett, Jonathan M; Khachane, Amit; Ouassou, Malika; Sundberg, Björn; Kohler, Annegret; Martin, Francis
2014-04-01
The plant hormones ethylene, jasmonic acid and salicylic acid have interconnecting roles during the response of plant tissues to mutualistic and pathogenic symbionts. We used morphological studies of transgenic- or hormone-treated Populus roots as well as whole-genome oligoarrays to examine how these hormones affect root colonization by the mutualistic ectomycorrhizal fungus Laccaria bicolor S238N. We found that genes regulated by ethylene, jasmonic acid and salicylic acid were regulated in the late stages of the interaction between L. bicolor and poplar. Both ethylene and jasmonic acid treatments were found to impede fungal colonization of roots, and this effect was correlated to an increase in the expression of certain transcription factors (e.g. ETHYLENE RESPONSE FACTOR1) and a decrease in the expression of genes associated with microbial perception and cell wall modification. Further, we found that ethylene and jasmonic acid showed extensive transcriptional cross-talk, cross-talk that was opposed by salicylic acid signaling. We conclude that ethylene and jasmonic acid pathways are induced late in the colonization of root tissues in order to limit fungal growth within roots. This induction is probably an adaptive response by the plant such that its growth and vigor are not compromised by the fungus. © 2013 The Authors New Phytologist © 2013 New Phytologist Trust.
Optimized guide RNA structure for genome editing via Cas9
Xu, Jianyong; Lian, Wei; Jia, Yuning; Li, Lingyun; Huang, Zhong
2017-01-01
The genome editing tool Cas9-gRNA (guide RNA) has been successfully applied in different cell types and organisms with high efficiency. However, more effort is needed to enhance both efficiency and specificity. In the current study, we optimized the guide RNA structure of the Streptococcus pyogenes CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated) system to improve its genome editing efficiency. Compared with the original functional structure of guide RNA, which is composed of crRNA and tracrRNA, the widely used chimeric gRNA has shorter crRNA and tracrRNA sequences. The deleted RNA sequence could form an extra loop structure, which might enhance the stability of the guide RNA structure and subsequently the genome editing efficiency. Thus the genome editing efficiency of different forms of guide RNA was tested. We found that the chimeric structure of gRNA with the original full length of crRNA and tracrRNA showed higher genome editing efficiency than the conventional chimeric structure or other types of gRNA we tested. Therefore our data uncovered a new type of gRNA structure with higher genome editing efficiency. PMID:29212218
Comparative genomics meets topology: a novel view on genome median and halving problems.
Alexeev, Nikita; Avdeyev, Pavel; Alekseyev, Max A
2016-11-11
Genome median and genome halving are combinatorial optimization problems that aim at reconstruction of ancestral genomes by minimizing the number of evolutionary events between them and genomes of the extant species. While these problems have been widely studied in past decades, their solutions are often either not efficient or not biologically adequate. These shortcomings have been recently addressed by restricting the problems solution space. We show that the restricted variants of genome median and halving problems are, in fact, closely related. We demonstrate that these problems have a neat topological interpretation in terms of embedded graphs and polygon gluings. We illustrate how such interpretation can lead to solutions to these problems in particular cases. This study provides an unexpected link between comparative genomics and topology, and demonstrates advantages of solving genome median and halving problems within the topological framework.
Zhang, Qian; Jun, Se -Ran; Leuze, Michael; ...
2017-01-19
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Qian; Jun, Se -Ran; Leuze, Michael
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat
2017-01-01
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. PMID:28102365
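As a rough illustration of two ingredients named in this abstract, k-mers as genomic features and the Shannon diversity index, the following sketch counts overlapping k-mers in a sequence and computes the index over their frequencies. The function names and toy sequences are hypothetical, and how the study actually combines this index with the other two criteria to choose k is not reproduced here.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_diversity(counts):
    """Shannon diversity index H = -sum(p * ln p) over k-mer frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Toy sequence (not real viral data): diversity of the k-mer profile
# for a few feature lengths.
toy_seq = "ACGTACGTTGCA"
for k in (1, 2, 3):
    profile = kmer_counts(toy_seq, k)
    print(k, len(profile), round(shannon_diversity(profile), 3))
```

For a sequence in which all four bases are equally frequent, the 1-mer index equals ln 4, its maximum for a four-letter alphabet; a homopolymer gives zero.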
Population Stratification in the Context of Diverse Epidemiologic Surveys Sans Genome-Wide Data
Oetjens, Matthew T.; Brown-Gentry, Kristin; Goodloe, Robert; Dilks, Holli H.; Crawford, Dana C.
2016-01-01
Population stratification or confounding by genetic ancestry is a potential cause of false associations in genetic association studies. Estimation of and adjustment for genetic ancestry has become common practice thanks in part to the availability of ancestry informative markers on genome-wide association study (GWAS) arrays. While array data are now widespread, these data are not ubiquitous as several large epidemiologic and clinic-based studies lack genome-wide data. One such large epidemiologic-based study lacking genome-wide data accessible to investigators is the National Health and Nutrition Examination Surveys (NHANES), population-based cross-sectional surveys of Americans linked to demographic, health, and lifestyle data conducted by the Centers for Disease Control and Prevention. DNA samples (n = 14,998) were extracted from biospecimens from consented NHANES participants between 1991–1994 (NHANES III, phase 2) and 1999–2002 and represent three major self-identified racial/ethnic groups: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). We as the Epidemiologic Architecture for Genes Linked to Environment study genotyped candidate gene and GWAS-identified index variants in NHANES as part of the larger Population Architecture using Genomics and Epidemiology I study for collaborative genetic association studies. To enable basic quality control such as estimation of genetic ancestry to control for population stratification in NHANES sans genome-wide data, we outline here strategies that use limited genetic data to identify the markers optimal for characterizing genetic ancestry. From among 411 and 295 autosomal SNPs available in NHANES III and NHANES 1999–2002, we demonstrate that markers with ancestry information can be identified to estimate global ancestry. 
Despite limited resolution, global genetic ancestry is highly correlated with self-identified race for the majority of participants, although less so for ethnicity. Overall, the strategies outlined here for a large epidemiologic study can be applied to other datasets that are accessible for genotype–phenotype studies but are sans genome-wide data. PMID:27200085
Bohra, Abhishek; Singh, Narendra P
2015-08-01
Unprecedented developments in legume genomics over the last decade have resulted in the acquisition of a wide range of modern genomic resources to underpin genetic improvement of grain legumes. The genome enabled insights direct investigators in various ways that primarily include unearthing novel structural variations, retrieving the lost genetic diversity, introducing novel/exotic alleles from wider gene pools, finely resolving the complex quantitative traits and so forth. To this end, ready availability of cost-efficient and high-density genotyping assays allows genome wide prediction to be increasingly recognized as the key selection criterion in crop breeding. Further, the high-dimensional measurements of agronomically significant phenotypes obtained by using new-generation screening techniques will empower reference based resequencing as well as allele mining and trait mapping methods to comprehensively associate genome diversity with the phenome scale variation. Besides stimulating the forward genetic systems, accessibility to precisely delineated genomic segments reveals novel candidates for reverse genetic techniques like targeted genome editing. The shifting paradigm in plant genomics in turn necessitates optimization of crop breeding strategies to enable the most efficient integration of advanced omics knowledge and tools. We anticipate that the crop improvement schemes will be bolstered remarkably with rational deployment of these genome-guided approaches, ultimately resulting in expanded plant breeding capacities and improved crop performance.
Automated multiplex genome-scale engineering in yeast
Si, Tong; Chao, Ran; Min, Yuhao; Wu, Yuying; Ren, Wen; Zhao, Huimin
2017-01-01
Genome-scale engineering is indispensable in understanding and engineering microorganisms, but the current tools are mainly limited to bacterial systems. Here we report an automated platform for multiplex genome-scale engineering in Saccharomyces cerevisiae, an important eukaryotic model and widely used microbial cell factory. Standardized genetic parts encoding overexpression and knockdown mutations of >90% yeast genes are created in a single step from a full-length cDNA library. With the aid of CRISPR-Cas, these genetic parts are iteratively integrated into the repetitive genomic sequences in a modular manner using robotic automation. This system allows functional mapping and multiplex optimization on a genome scale for diverse phenotypes including cellulase expression, isobutanol production, glycerol utilization and acetic acid tolerance, and may greatly accelerate future genome-scale engineering endeavours in yeast. PMID:28469255
Comparing genomes with rearrangements and segmental duplications.
Shao, Mingfu; Moret, Bernard M E
2015-06-15
Large-scale evolutionary events such as genomic rearrangements and segmental duplications form an important part of the evolution of genomes and are widely studied from both biological and computational perspectives. A basic computational problem is to infer these events in the evolutionary history for given modern genomes, a task for which many algorithms have been proposed under various constraints. Algorithms that can handle both rearrangements and content-modifying events such as duplications and losses remain few and limited in their applicability. We study the comparison of two genomes under a model including general rearrangements (through double-cut-and-join) and segmental duplications. We formulate the comparison as an optimization problem and describe an exact algorithm to solve it by using an integer linear program. We also devise a sufficient condition and an efficient algorithm to identify optimal substructures, which can simplify the problem while preserving optimality. Using the optimal substructures with the integer linear program (ILP) formulation yields a practical and exact algorithm to solve the problem. We then apply our algorithm to assign in-paralogs and orthologs (a necessary step in handling duplications) and compare its performance with that of the state-of-the-art method MSOAR, using both simulations and real data. On simulated datasets, our method outperforms MSOAR by a significant margin, and on five well-annotated species, MSOAR achieves high accuracy, yet our method performs slightly better on each of the 10 pairwise comparisons. http://lcbb.epfl.ch/softwares/coser. © The Author 2015. Published by Oxford University Press.
Fernández, Jesús; Toro, Miguel Á; Sonesson, Anna K; Villanueva, Beatriz
2014-01-01
The success of an aquaculture breeding program critically depends on the way in which the base population of breeders is constructed since all the genetic variability for the traits included originally in the breeding goal as well as those to be included in the future is contained in the initial founders. Traditionally, base populations were created from a number of wild strains by sampling equal numbers from each strain. However, for some aquaculture species improved strains are already available and, therefore, mean phenotypic values for economically important traits can be used as a criterion to optimize the sampling when creating base populations. Also, the increasing availability of genome-wide genotype information in aquaculture species could help to refine the estimation of relationships within and between candidate strains and, thus, to optimize the percentage of individuals to be sampled from each strain. This study explores the advantages of using phenotypic and genome-wide information when constructing base populations for aquaculture breeding programs in terms of initial and subsequent trait performance and genetic diversity level. Results show that a compromise solution between diversity and performance can be found when creating base populations. Up to 6% higher levels of phenotypic performance can be achieved at the same level of global diversity in the base population by optimizing the selection of breeders instead of sampling equal numbers from each strain. The higher performance observed in the base population persisted during 10 generations of phenotypic selection applied in the subsequent breeding program.
Multi-instance multi-label distance metric learning for genome-wide protein function prediction.
Xu, Yonghui; Min, Huaqing; Song, Hengjie; Wu, Qingyao
2016-08-01
Multi-instance multi-label (MIML) learning has been proven to be effective for the genome-wide protein function prediction problems where each training example is associated with not only multiple instances but also multiple class labels. To find an appropriate MIML learning method for genome-wide protein function prediction, many studies in the literature attempted to optimize objective functions in which dissimilarity between instances is measured using the Euclidean distance. But in many real applications, Euclidean distance may be unable to capture the intrinsic similarity/dissimilarity in feature space and label space. Unlike other previous approaches, in this paper, we propose to learn a multi-instance multi-label distance metric learning framework (MIMLDML) for genome-wide protein function prediction. Specifically, we learn a Mahalanobis distance to preserve and utilize the intrinsic geometric information of both feature space and label space for MIML learning. In addition, we try to deal with the sparsely labeled data by giving weight to the labeled data. Extensive experiments on seven real-world organisms covering the biological three-domain system (i.e., archaea, bacteria, and eukaryote; Woese et al., 1990) show that the MIMLDML algorithm is superior to most state-of-the-art MIML learning algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
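The contrast this abstract draws between Euclidean and Mahalanobis distance can be made concrete with a short sketch. It only evaluates the distance for a given matrix M; learning M from MIML data, which is the paper's contribution, is not shown, and the function name and example matrices are assumptions for illustration.

```python
import math


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a symmetric PSD matrix M.

    x, y: equal-length sequences of floats
    M:    square matrix given as a list of rows (assumed positive semidefinite)
    """
    d = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ d, written out with the stdlib only.
    Md = [sum(m_ij * d_j for m_ij, d_j in zip(row, d)) for row in M]
    return math.sqrt(sum(d_i * md_i for d_i, md_i in zip(d, Md)))


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # hypothetical learned metric

# With the identity matrix the Mahalanobis distance is the Euclidean distance.
print(mahalanobis((0.0, 0.0), (3.0, 4.0), identity))
# A non-identity M reweights feature directions.
print(mahalanobis((0.0, 0.0), (3.0, 4.0), stretched))
```

The second call weights differences along the first feature four times as heavily, which is the kind of reshaping of the feature space a learned metric can express and a fixed Euclidean distance cannot.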
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kalamorz, Falk; Keis, Stefanie; Stanton, Jo-Ann
The genes and molecular machines that allow for a thermoalkaliphilic lifestyle have not been defined. To address this goal, we report on the improved high-quality draft genome sequence of Caldalkalibacillus thermarum strain TA2.A1, an obligately aerobic bacterium that grows optimally at pH 9.5 and 65 to 70 °C on a wide variety of carbon and energy sources.
De novo DNA methylation during monkey pre-implantation embryogenesis.
Gao, Fei; Niu, Yuyu; Sun, Yi Eve; Lu, Hanlin; Chen, Yongchang; Li, Siguang; Kang, Yu; Luo, Yuping; Si, Chenyang; Yu, Juehua; Li, Chang; Sun, Nianqin; Si, Wei; Wang, Hong; Ji, Weizhi; Tan, Tao
2017-04-01
Critical epigenetic regulation of primate embryogenesis entails DNA methylome changes. Here we report genome-wide composition, patterning, and stage-specific dynamics of DNA methylation in pre-implantation rhesus monkey embryos as well as male and female gametes studied using an optimized tagmentation-based whole-genome bisulfite sequencing method. We show that upon fertilization, both paternal and maternal genomes undergo active DNA demethylation, and genome-wide de novo DNA methylation is also initiated in the same period. By the 8-cell stage, remethylation becomes more pronounced than demethylation, resulting in an increase in global DNA methylation. Promoters of genes associated with oxidative phosphorylation are preferentially remethylated at the 8-cell stage, suggesting that this mode of energy metabolism may not be favored. Unlike in rodents, X chromosome inactivation is not observed during monkey pre-implantation development. Our study provides the first comprehensive illustration of the 'wax and wane' phases of DNA methylation dynamics. Most importantly, our DNA methyltransferase loss-of-function analysis indicates that DNA methylation influences early monkey embryogenesis.
De novo DNA methylation during monkey pre-implantation embryogenesis
Gao, Fei; Niu, Yuyu; Sun, Yi Eve; Lu, Hanlin; Chen, Yongchang; Li, Siguang; Kang, Yu; Luo, Yuping; Si, Chenyang; Yu, Juehua; Li, Chang; Sun, Nianqin; Si, Wei; Wang, Hong; Ji, Weizhi; Tan, Tao
2017-01-01
Critical epigenetic regulation of primate embryogenesis entails DNA methylome changes. Here we report genome-wide composition, patterning, and stage-specific dynamics of DNA methylation in pre-implantation rhesus monkey embryos as well as male and female gametes studied using an optimized tagmentation-based whole-genome bisulfite sequencing method. We show that upon fertilization, both paternal and maternal genomes undergo active DNA demethylation, and genome-wide de novo DNA methylation is also initiated in the same period. By the 8-cell stage, remethylation becomes more pronounced than demethylation, resulting in an increase in global DNA methylation. Promoters of genes associated with oxidative phosphorylation are preferentially remethylated at the 8-cell stage, suggesting that this mode of energy metabolism may not be favored. Unlike in rodents, X chromosome inactivation is not observed during monkey pre-implantation development. Our study provides the first comprehensive illustration of the 'wax and wane' phases of DNA methylation dynamics. Most importantly, our DNA methyltransferase loss-of-function analysis indicates that DNA methylation influences early monkey embryogenesis. PMID:28233770
Optimization of cDNA-AFLP experiments using genomic sequence data.
Kivioja, Teemu; Arvas, Mikko; Saloheimo, Markku; Penttilä, Merja; Ukkonen, Esko
2005-06-01
cDNA amplified fragment length polymorphism (cDNA-AFLP) is one of the few genome-wide level expression profiling methods capable of finding genes that have not yet been cloned or even predicted from sequence but have interesting expression patterns under the studied conditions. In cDNA-AFLP, a complex cDNA mixture is divided into small subsets using restriction enzymes and selective PCR. A large cDNA-AFLP experiment can require a substantial amount of resources, such as hundreds of PCR amplifications and gel electrophoresis runs, followed by manual cutting of a large number of bands from the gels. Our aim was to test whether this workload can be reduced by rational design of the experiment. We used the available genomic sequence information to optimize cDNA-AFLP experiments beforehand so that as many transcripts as possible could be profiled with a given amount of resources. Optimization of the selection of both restriction enzymes and selective primers for cDNA-AFLP experiments has not been performed previously. The in silico tests performed suggest that substantial amounts of resources can be saved by the optimization of cDNA-AFLP experiments.
A survey about methods dedicated to epistasis detection.
Niel, Clément; Sinoquet, Christine; Dina, Christian; Rocheleau, Ghislain
2015-01-01
During the past decade, findings of genome-wide association studies (GWAS) improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and a SNP taken individually via single-locus tests. However, geneticists admit this is an oversimplified approach to tackle the complexity of underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered. Unfortunately, epistasis detection gives rise to analytic challenges since analyzing every SNP combination is at present impractical at a genome-wide scale. In this review, we will present the main strategies recently proposed to detect epistatic interactions, along with their operating principle. Some of these methods are exhaustive, such as multifactor dimensionality reduction, likelihood ratio-based tests or receiver operating characteristic curve analysis; some are non-exhaustive, such as machine learning techniques (random forests, Bayesian networks) or combinatorial optimization approaches (ant colony optimization, computational evolution system).
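The claim that exhaustive analysis of SNP combinations is impractical at genome-wide scale follows from simple counting, sketched below. The figure of 500,000 SNPs is an assumed, illustrative array size and does not come from the survey.

```python
import math


def n_combinations(n_snps, order):
    """Number of distinct SNP sets of a given interaction order."""
    return math.comb(n_snps, order)


n_snps = 500_000  # assumed array size, for illustration only
for order in (2, 3):
    print(order, n_combinations(n_snps, order))
```

Under that assumption there are already about 1.25 x 10^11 SNP pairs to test, and the count grows by several more orders of magnitude for three-way interactions, which is why non-exhaustive strategies are attractive.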
Differential DNA Methylation Analysis without a Reference Genome.
Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph
2015-12-22
Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
A statistical approach for inferring the 3D structure of the genome.
Varoquaux, Nelle; Ay, Ferhat; Noble, William Stafford; Vert, Jean-Philippe
2014-06-15
Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate 3D models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely on multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies to spatial distance, and thereby may lead to incorrect structure reconstruction. We propose a novel approach to infer a consensus 3D structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distances between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data. We compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms (two metric MDS methods using different stress functions, a non-metric version of MDS and ChromSDE, a recently described, advanced MDS method) on a wide range of simulated datasets. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function. 
On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods when we use data generated using different restriction enzymes, and when we reconstruct structures at different resolutions. A Python implementation of the proposed method is available at http://cbio.ensmp.fr/pastis. © The Author 2014. Published by Oxford University Press.
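The statistical model described in this abstract, Poisson-distributed contact counts whose intensity decreases with spatial distance, can be sketched as a log-likelihood over a candidate structure. The power-law transfer function lambda = beta * d^alpha with alpha < 0, the parameter values, and all names below are assumptions chosen for illustration; this is not the PASTIS code, and no optimization over coordinates is performed.

```python
import math
from itertools import combinations


def poisson_loglik(coords, counts, alpha=-3.0, beta=1.0):
    """Log-likelihood of contact counts given 3D coordinates.

    coords: list of (x, y, z) positions, one per locus (assumed distinct)
    counts: dict mapping (i, j) with i < j to observed contact counts;
            missing pairs are treated as zero counts
    alpha, beta: parameters of the assumed transfer function
                 lambda_ij = beta * d_ij ** alpha (alpha < 0)
    """
    ll = 0.0
    for i, j in combinations(range(len(coords)), 2):
        c = counts.get((i, j), 0)
        d = math.dist(coords[i], coords[j])
        lam = beta * d ** alpha
        # Poisson log-pmf: c*log(lam) - lam - log(c!)
        ll += c * math.log(lam) - lam - math.lgamma(c + 1)
    return ll


# Toy contact map: loci 0-1 and 1-2 interact often, 0-2 rarely.
toy_counts = {(0, 1): 8, (1, 2): 8, (0, 2): 1}
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
swapped = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

ll_chain = poisson_loglik(chain, toy_counts, alpha=-3.0, beta=8.0)
ll_swapped = poisson_loglik(swapped, toy_counts, alpha=-3.0, beta=8.0)
print(ll_chain > ll_swapped)
```

The structure that places frequently contacting loci close together scores a higher likelihood. A full method would maximize this quantity over the coordinates and, in the variant the abstract highlights, over the transfer-function parameters as well.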
Sexton-Oates, Alexandra; Carmody, Jake; Ekinci, Elif I.; Dwyer, Karen M.; Saffery, Richard
2018-01-01
Aim To characterise the genomic DNA (gDNA) yield from urine and quality of derived methylation data generated from the widely used Illumina Infinium MethylationEPIC (HM850K) platform and compare this with buffy coat samples. Background DNA methylation is the most widely studied epigenetic mark and variations in DNA methylation profile have been implicated in diabetes which affects approximately 415 million people worldwide. Methods QIAamp Viral RNA Mini Kit and QIAamp DNA micro kit were used to extract DNA from frozen and fresh urine samples as well as increasing volumes of fresh urine. Matched buffy coats to the frozen urine were also obtained and DNA was extracted from the buffy coats using the QIAamp DNA Mini Kit. Genomic DNA samples with concentrations greater than 20μg/ml were used for methylation analysis using the HM850K array. Results Irrespective of extraction technique or the use of fresh versus frozen urine samples, limited genomic DNA was obtained using a starting sample volume of 5ml (0–0.86μg/mL). In order to optimize the yield, we increased starting volumes to 50ml fresh urine, which yielded only 0–9.66μg/mL. A different kit, QIAamp DNA Micro Kit, was trialled in six fresh urine samples and ten frozen urine samples with inadequate DNA yields from 0–17.7μg/mL and 0–1.6μg/mL respectively. Sufficient genomic DNA was obtained from only 4 of the initial 41 frozen urine samples (10%) for DNA methylation profiling. In comparison, all four buffy coat samples (100%) provided sufficient genomic DNA. Conclusion High quality data can be obtained provided a sufficient yield of genomic DNA is isolated. Despite optimizing various extraction methodologies, the modest amount of genomic DNA derived from urine may limit the generalisability of this approach for the identification of DNA methylation biomarkers of chronic diabetic kidney disease. PMID:29462136
Lecamwasam, Ashani; Sexton-Oates, Alexandra; Carmody, Jake; Ekinci, Elif I; Dwyer, Karen M; Saffery, Richard
2018-01-01
To characterise the genomic DNA (gDNA) yield from urine and the quality of derived methylation data generated from the widely used Illumina Infinium MethylationEPIC (HM850K) platform, and to compare this with buffy coat samples. DNA methylation is the most widely studied epigenetic mark, and variations in DNA methylation profile have been implicated in diabetes, which affects approximately 415 million people worldwide. QIAamp Viral RNA Mini Kit and QIAamp DNA micro kit were used to extract DNA from frozen and fresh urine samples as well as increasing volumes of fresh urine. Buffy coats matched to the frozen urine were also obtained, and DNA was extracted from the buffy coats using the QIAamp DNA Mini Kit. Genomic DNA samples with a concentration greater than 20 μg/mL were used for methylation analysis using the HM850K array. Irrespective of extraction technique or the use of fresh versus frozen urine samples, limited genomic DNA was obtained using a starting sample volume of 5 mL (0-0.86 μg/mL). In order to optimize the yield, we increased starting volumes to 50 mL fresh urine, which yielded only 0-9.66 μg/mL. A different kit, QIAamp DNA Micro Kit, was trialled in six fresh urine samples and ten frozen urine samples, with inadequate DNA yields of 0-17.7 μg/mL and 0-1.6 μg/mL, respectively. Sufficient genomic DNA was obtained from only 4 of the initial 41 frozen urine samples (10%) for DNA methylation profiling. In comparison, all four buffy coat samples (100%) provided sufficient genomic DNA. High quality data can be obtained provided a sufficient yield of genomic DNA is isolated. Despite optimizing various extraction methodologies, the modest amount of genomic DNA derived from urine may limit the generalisability of this approach for the identification of DNA methylation biomarkers of chronic diabetic kidney disease.
Genetic screens and functional genomics using CRISPR/Cas9 technology.
Hartenian, Ella; Doench, John G
2015-04-01
Functional genomics attempts to understand the genome by perturbing the flow of information from DNA to RNA to protein, in order to learn how gene dysfunction leads to disease. CRISPR/Cas9 technology is the newest tool in the geneticist's toolbox, allowing researchers to edit DNA with unprecedented ease, speed and accuracy, and representing a novel means to perform genome-wide genetic screens to discover gene function. In this review, we first summarize the discovery and characterization of CRISPR/Cas9, and then compare it to other genome engineering technologies. We discuss its initial use in screening applications, with a focus on optimizing on-target activity and minimizing off-target effects. Finally, we comment on future challenges and opportunities afforded by this technology. © 2015 FEBS.
Structure-seq2: sensitive and accurate genome-wide profiling of RNA structure in vivo
Ritchey, Laura E.; Su, Zhao; Tang, Yin; Tack, David C.
2017-01-01
RNA serves many functions in biology such as splicing, temperature sensing, and innate immunity. These functions are often determined by the structure of RNA. There is thus a pressing need to understand RNA structure and how it changes during diverse biological processes both in vivo and genome-wide. Here, we present Structure-seq2, which provides nucleotide-resolution RNA structural information in vivo and genome-wide. This optimized version of our original Structure-seq method increases sensitivity by at least 4-fold and improves data quality by minimizing formation of a deleterious by-product, reducing ligation bias, and improving read coverage. We also present a variation of Structure-seq2 in which a biotinylated nucleotide is incorporated during reverse transcription, which greatly facilitates the protocol by eliminating two PAGE purification steps. We benchmark Structure-seq2 on both mRNA and rRNA structure in rice (Oryza sativa). We demonstrate that Structure-seq2 can lead to new biological insights. Our Structure-seq2 datasets uncover hidden breaks in chloroplast rRNA and identify a previously unreported N1-methyladenosine (m1A) in a nuclear-encoded Oryza sativa rRNA. Overall, Structure-seq2 is a rapid, sensitive, and unbiased method to probe RNA in vivo and genome-wide that facilitates new insights into RNA biology. PMID:28637286
USDA-ARS's Scientific Manuscript database
Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...
2014-01-01
Background Genome-wide microarrays have been useful for predicting chemical-genetic interactions at the gene level. However, interpreting genome-wide microarray results can be overwhelming due to the vast output of gene expression data combined with off-target transcriptional responses many times induced by a drug treatment. This study demonstrates how experimental and computational methods can interact with each other, to arrive at more accurate predictions of drug-induced perturbations. We present a two-stage strategy that links microarray experimental testing and network training conditions to predict gene perturbations for a drug with a known mechanism of action in a well-studied organism. Results S. cerevisiae cells were treated with the antifungal, fluconazole, and expression profiling was conducted under different biological conditions using Affymetrix genome-wide microarrays. Transcripts were filtered with a formal network-based method, sparse simultaneous equation models and Lasso regression (SSEM-Lasso), under different network training conditions. Gene expression results were evaluated using both gene set and single gene target analyses, and the drug’s transcriptional effects were narrowed first by pathway and then by individual genes. Variables included: (i) Testing conditions – exposure time and concentration and (ii) Network training conditions – training compendium modifications. Two analyses of SSEM-Lasso output – gene set and single gene – were conducted to gain a better understanding of how SSEM-Lasso predicts perturbation targets. Conclusions This study demonstrates that genome-wide microarrays can be optimized using a two-stage strategy for a more in-depth understanding of how a cell manifests biological reactions to a drug treatment at the transcription level. Additionally, a more detailed understanding of how the statistical model, SSEM-Lasso, propagates perturbations through a network of gene regulatory interactions is achieved. PMID:24444313
Genome-Wide Tuning of Protein Expression Levels to Rapidly Engineer Microbial Traits.
Freed, Emily F; Winkler, James D; Weiss, Sophie J; Garst, Andrew D; Mutalik, Vivek K; Arkin, Adam P; Knight, Rob; Gill, Ryan T
2015-11-20
The reliable engineering of biological systems requires quantitative mapping of predictable and context-independent expression over a broad range of protein expression levels. However, current techniques for modifying expression levels are cumbersome and are not amenable to high-throughput approaches. Here we present major improvements to current techniques through the design and construction of E. coli genome-wide libraries using synthetic DNA cassettes that can tune expression over a ∼10^4 range. The cassettes also contain molecular barcodes that are optimized for next-generation sequencing, enabling rapid and quantitative tracking of alleles that have the highest fitness advantage. We show these libraries can be used to determine which genes and expression levels confer greater fitness to E. coli under different growth conditions.
Optimized gene editing technology for Drosophila melanogaster using germ line-specific Cas9.
Ren, Xingjie; Sun, Jin; Housden, Benjamin E; Hu, Yanhui; Roesel, Charles; Lin, Shuailiang; Liu, Lu-Ping; Yang, Zhihao; Mao, Decai; Sun, Lingzhu; Wu, Qujie; Ji, Jun-Yuan; Xi, Jianzhong; Mohr, Stephanie E; Xu, Jiang; Perrimon, Norbert; Ni, Jian-Quan
2013-11-19
The ability to engineer genomes in a specific, systematic, and cost-effective way is critical for functional genomic studies. Recent advances using the CRISPR-associated single-guide RNA system (Cas9/sgRNA) illustrate the potential of this simple system for genome engineering in a number of organisms. Here we report an effective and inexpensive method for genome DNA editing in Drosophila melanogaster whereby plasmid DNAs encoding short sgRNAs under the control of the U6b promoter are injected into transgenic flies in which Cas9 is specifically expressed in the germ line via the nanos promoter. We evaluate the off-targets associated with the method and establish a Web-based resource, along with a searchable, genome-wide database of predicted sgRNAs appropriate for genome engineering in flies. Finally, we discuss the advantages of our method in comparison with other recently published approaches.
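The idea of a genome-wide catalogue of predicted sgRNAs can be illustrated with a small scan for candidate target sites. This is a minimal sketch, assuming the canonical Streptococcus pyogenes Cas9 requirement of a 20-nt protospacer immediately followed by an NGG PAM, and looking at the forward strand only. The function name and the toy sequence are hypothetical, and the sketch does not reproduce the authors' resource, which also evaluates off-targets.

```python
import re

def find_sgrna_sites(seq: str, protospacer_len: int = 20):
    """Yield (start, protospacer, pam) for each forward-strand site followed by NGG."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    for match in pattern.finditer(seq):
        yield match.start(), match.group(1), match.group(2)

toy = "ACGT" * 5 + "AGG" + "TTT"  # made-up sequence, not taken from any genome
sites = list(find_sgrna_sites(toy))
print(sites)
```

A fuller scan would also search the reverse complement and score each candidate for off-target matches elsewhere in the genome.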
Optimization of multi-environment trials for genomic selection based on crop models.
Rincent, R; Kuhn, E; Monod, H; Oury, F-X; Rousset, M; Allard, V; Le Gouis, J
2017-08-01
We propose a statistical criterion to optimize multi-environment trials to predict genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models. Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS), which refers to the use of genome-wide information for predicting breeding values of selection candidates, need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling using crop growth models (CGM) that incorporate genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing the MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined for this purpose and was evaluated on simulated and real data, with the example of wheat phenology. The MET defined with OptiMET allowed the genetic parameters to be estimated with lower error, leading to higher QTL detection power and higher prediction accuracies. The MET defined with OptiMET was on average more efficient than a random MET composed of twice as many environments, in terms of quality of the parameter estimates. OptiMET is thus a valuable tool to determine optimal experimental conditions to best exploit METs and the phenotyping tools that are currently being developed.
Doekes, Harmen P; Veerkamp, Roel F; Bijma, Piter; Hiemstra, Sipke J; Windig, Jack J
2018-04-11
In recent decades, Holstein-Friesian (HF) selection schemes have undergone profound changes, including the introduction of optimal contribution selection (OCS; around 2000), a major shift in breeding goal composition (around 2000) and the implementation of genomic selection (GS; around 2010). These changes are expected to have influenced genetic diversity trends. Our aim was to evaluate genome-wide and region-specific diversity in HF artificial insemination (AI) bulls in the Dutch-Flemish breeding program from 1986 to 2015. Pedigree and genotype data (~ 75.5 k) of 6280 AI-bulls were used to estimate rates of genome-wide inbreeding and kinship and corresponding effective population sizes. Region-specific inbreeding trends were evaluated using regions of homozygosity (ROH). Changes in observed allele frequencies were compared to those expected under pure drift to identify putative regions under selection. We also investigated the direction of changes in allele frequency over time. Effective population size estimates for the 1986-2015 period ranged from 69 to 102. Two major breakpoints were observed in genome-wide inbreeding and kinship trends. Around 2000, inbreeding and kinship levels temporarily dropped. From 2010 onwards, they steeply increased, with pedigree-based, ROH-based and marker-based inbreeding rates as high as 1.8, 2.1 and 2.8% per generation, respectively. Accumulation of inbreeding varied substantially across the genome. A considerable fraction of markers showed changes in allele frequency that were greater than expected under pure drift. Putative selected regions harboured many quantitative trait loci (QTL) associated to a wide range of traits. In consecutive 5-year periods, allele frequencies changed more often in the same direction than in opposite directions, except when comparing the 1996-2000 and 2001-2005 periods. Genome-wide and region-specific diversity trends reflect major changes in the Dutch-Flemish HF breeding program. 
Introduction of OCS and the shift in breeding goal were followed by a drop in inbreeding and kinship and a shift in the direction of changes in allele frequency. After introduction of GS, rates of inbreeding and kinship increased substantially while allele frequencies continued to change in the same direction as before GS. These results provide insight into the effect of breeding practices on genomic diversity and emphasize the need for efficient management of genetic diversity in GS schemes.
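The link between a per-generation rate of inbreeding and an effective population size can be made concrete with the standard relation Ne = 1 / (2ΔF). The sketch below assumes that relation applies; the function name is hypothetical. The inputs are the post-2010 rates quoted in the abstract, so the printed values are illustrative arithmetic only and are not the estimates of 69 to 102 that the authors report for the whole 1986-2015 period.

```python
def effective_population_size(delta_f: float) -> float:
    """Return Ne from the per-generation rate of inbreeding, Ne = 1 / (2 * dF)."""
    if delta_f <= 0:
        raise ValueError("delta_f must be positive")
    return 1.0 / (2.0 * delta_f)

# Post-2010 inbreeding rates from the abstract, expressed as fractions per generation.
rates = {"pedigree": 0.018, "ROH": 0.021, "marker": 0.028}
for label, delta_f in rates.items():
    print(f"{label}: Ne ~ {effective_population_size(delta_f):.1f}")
```

Because Ne is inversely proportional to ΔF, a modest rise in the inbreeding rate translates into a sharp fall in effective population size.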
Next-generation genome-scale models for metabolic engineering.
King, Zachary A; Lloyd, Colton J; Feist, Adam M; Palsson, Bernhard O
2015-12-01
Constraint-based reconstruction and analysis (COBRA) methods have become widely used tools for metabolic engineering in both academic and industrial laboratories. By employing a genome-scale in silico representation of the metabolic network of a host organism, COBRA methods can be used to predict optimal genetic modifications that improve the rate and yield of chemical production. A new generation of COBRA models and methods is now being developed, encompassing many biological processes and simulation strategies, and next-generation models enable new types of predictions. Here, three key examples of applying COBRA methods to strain optimization are presented and discussed. Then, an outlook is provided on the next generation of COBRA models and the new types of predictions they will enable for systems metabolic engineering. Copyright © 2014 Elsevier Ltd. All rights reserved.
Rozman, Vita; Kunej, Tanja
2018-05-10
Harnessing the genomics big data requires innovation in how we extract and interpret biologically relevant variants. Currently, there is no established catalog of prioritized missense variants associated with deleterious protein function phenotypes. We report in this study, to the best of our knowledge, the first genome-wide prioritization of sequence variants with the most deleterious effect on protein function (potentially deleterious variants [pDelVars]) in nine vertebrate species: human, cattle, horse, sheep, pig, dog, rat, mouse, and zebrafish. The analysis was conducted using the Ensembl/BioMart tool. Genes comprising pDelVars in the highest number of examined species were identified using a Python script. Multiple genomic alignments of the selected genes were built to identify interspecies orthologous potentially deleterious variants, which we defined as the "ortho-pDelVars." Genome-wide prioritization revealed that in humans, 0.12% of the known variants are predicted to be deleterious. In seven out of nine examined vertebrate species, the genes encoding the multiple PDZ domain crumbs cell polarity complex component (MPDZ) and the transforming acidic coiled-coil containing protein 2 (TACC2) comprise pDelVars. Five interspecies ortho-pDelVars were identified in three genes. These findings offer new ways to harness genomics big data by facilitating the identification of functional polymorphisms in humans and animal models and thus provide a future basis for optimization of protocols for whole genome prioritization of pDelVars and screening of orthologous sequence variants. The approach presented here can inform various postgenomic applications such as personalized medicine and multiomics study of health interventions (iatromics).
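The step of identifying genes that carry pDelVars in the highest number of species is essentially a grouping and counting task. The following is a minimal sketch of that idea, not the authors' script: the function name, the (species, gene) record layout and the placeholder gene names are all assumptions made for illustration.

```python
from collections import defaultdict

def genes_by_species_count(variants):
    """Rank genes by the number of distinct species in which they carry a pDelVar.

    `variants` is an iterable of (species, gene) pairs, one per variant.
    """
    species_per_gene = defaultdict(set)
    for species, gene in variants:
        species_per_gene[gene].add(species)
    # Most widely shared genes first; ties are broken alphabetically by gene name.
    return sorted(species_per_gene.items(), key=lambda kv: (-len(kv[1]), kv[0]))

# Hypothetical records; GENE_A etc. are placeholders, not results from the study.
records = [
    ("human", "GENE_A"), ("mouse", "GENE_A"), ("pig", "GENE_A"),
    ("human", "GENE_B"), ("dog", "GENE_B"),
    ("rat", "GENE_C"), ("rat", "GENE_C"),
]
ranking = genes_by_species_count(records)
for gene, species in ranking:
    print(gene, len(species), sorted(species))
```

Using a set per gene means that several variants in the same gene and species are counted once, which matches a "number of species" criterion rather than a "number of variants" one.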
Xavier, Alencar; Jarquin, Diego; Howard, Reka; Ramasubramanian, Vishnu; Specht, James E; Graef, George L; Beavis, William D; Diers, Brian W; Song, Qijian; Cregan, Perry B; Nelson, Randall; Mian, Rouf; Shannon, J Grover; McHale, Leah; Wang, Dechun; Schapaugh, William; Lorenz, Aaron J; Xu, Shizhong; Muir, William M; Rainey, Katy M
2018-02-02
Genetic improvement toward optimized and stable agronomic performance of soybean genotypes is desirable for food security. Understanding how genotypes perform in different environmental conditions helps breeders develop sustainable cultivars adapted to target regions. Complex traits of importance are known to be controlled by a large number of genomic regions with small effects whose magnitude and direction are modulated by environmental factors. Knowledge of the constraints and undesirable effects resulting from genotype by environmental interactions is a key objective in improving selection procedures in soybean breeding programs. In this study, the genetic basis of soybean grain yield responsiveness to environmental factors was examined in a large soybean nested association population. For this, a genome-wide association to performance stability estimates generated from a Finlay-Wilkinson analysis and the inclusion of the interaction between marker genotypes and environmental factors was implemented. Genomic footprints were investigated by analysis and meta-analysis using a recently published multiparent model. Results indicated that specific soybean genomic regions were associated with stability, and that multiplicative interactions were present between environments and genetic background. Seven genomic regions in six chromosomes were identified as being associated with genotype-by-environment interactions. This study provides insight into genomic assisted breeding aimed at achieving a more stable agronomic performance of soybean, and documented opportunities to exploit genomic regions that were specifically associated with interactions involving environments and subpopulations. Copyright © 2018 Xavier et al.
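A Finlay-Wilkinson analysis summarises each genotype's responsiveness as the slope of its yield regressed on an environmental index, where the index is the mean yield of each environment centred on the grand mean. The sketch below is a simple two-step ordinary least squares version for a balanced table, offered as an assumption-laden illustration rather than the model used in the study. The function name and the toy yields are hypothetical.

```python
from statistics import mean

def finlay_wilkinson_slopes(yields):
    """Return {genotype: slope of yield on the environmental index}.

    `yields` maps each genotype to a list of yields, one per environment,
    with every genotype observed in every environment (balanced data).
    """
    n_env = len(next(iter(yields.values())))
    env_means = [mean(row[j] for row in yields.values()) for j in range(n_env)]
    grand_mean = mean(env_means)
    index = [m - grand_mean for m in env_means]  # environmental index, sums to zero
    ss_index = sum(h * h for h in index)
    slopes = {}
    for genotype, row in yields.items():
        genotype_mean = mean(row)
        cross = sum(h * (y - genotype_mean) for h, y in zip(index, row))
        slopes[genotype] = cross / ss_index
    return slopes

# Made-up yields for two genotypes across three environments.
toy = {"stable": [3.0, 3.0, 3.0], "responsive": [2.0, 3.0, 4.0]}
slopes = finlay_wilkinson_slopes(toy)
print(slopes)
```

A slope near zero indicates a genotype whose yield barely tracks environmental quality, while a slope above one indicates above-average responsiveness. Slope estimates of this kind are what a genome-wide association on stability would use as the phenotype.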
Smith, Andrew H.; Jensen, Kevin P.; Li, Jin; Nunez, Yaira; Farrer, Lindsay A.; Hakonarson, Hakon; Cook-Sather, Scott D.; Kranzler, Henry R.; Gelernter, Joel
2017-01-01
Opioids are very effective analgesics, but they are also highly addictive. Methadone is used to treat opioid dependence (OD), acting as a selective agonist at the μ-opioid receptor encoded by the gene OPRM1. Determining the optimal methadone maintenance dose is time-consuming; currently, no biomarkers are available to guide treatment. In methadone-treated OD subjects drawn from a case and control sample, we conducted a genome-wide association study (GWAS) of usual daily methadone dose. In African-American (AA) OD subjects (n = 383), we identified a genome-wide significant association between therapeutic methadone dose (mean = 68.0 mg, standard deviation (SD) = 30.1 mg) and rs73568641 (P = 2.8 × 10^−8), the nearest gene (306 kilobases) being OPRM1. Each minor (C) allele corresponded to an additional ~20 mg/day of oral methadone, an effect specific to AAs. In European-Americans (EAs) (n = 1,027), no genome-wide significant associations with methadone dose (mean = 77.8 mg, SD = 33.9 mg) were observed. In an independent set of opioid-naïve AA children being treated for surgical pain, rs73568641-C was associated with a higher required dose of morphine (n = 241, P = 3.9 × 10^−2). Similarly, independent genomic loci previously shown to associate with higher opioid analgesic dose were associated with higher methadone dose in the OD sample (AA and EA: n = 1,410, genetic score P = 1.3 × 10^−3). The present results in AAs indicate that genetic variants influencing opioid sensitivity across different clinical settings could contribute to precision pharmacotherapy for pain and addiction. PMID:28115739
Systems metabolic engineering: genome-scale models and beyond.
Blazeck, John; Alper, Hal
2010-07-01
The advent of high throughput genome-scale bioinformatics has led to an exponential increase in available cellular system data. Systems metabolic engineering attempts to use data-driven approaches--based on the data collected with high throughput technologies--to identify gene targets and optimize phenotypical properties on a systems level. Current systems metabolic engineering tools are limited for predicting and defining complex phenotypes such as chemical tolerances and other global, multigenic traits. The most pragmatic systems-based tool for metabolic engineering to arise is the in silico genome-scale metabolic reconstruction. This tool has seen wide adoption for modeling cell growth and predicting beneficial gene knockouts, and we examine here how this approach can be expanded for novel organisms. This review will highlight advances of the systems metabolic engineering approach with a focus on de novo development and use of genome-scale metabolic reconstructions for metabolic engineering applications. We will then discuss the challenges and prospects for this emerging field to enable model-based metabolic engineering. Specifically, we argue that current state-of-the-art systems metabolic engineering techniques represent a viable first step for improving product yield that still must be followed by combinatorial techniques or random strain mutagenesis to achieve optimal cellular systems.
Nana-Djeunga, Hugues C.; Kengne-Ouafo, Jonas A.; Pion, Sébastien D. S.; Bopda, Jean; Kamgno, Joseph; Wanji, Samuel; Che, Hua; Kuesel, Annette C.; Walker, Martin; Basáñez, Maria-Gloria; Boakye, Daniel A.; Osei-Atweneboana, Mike Y.; Boussinesq, Michel; Prichard, Roger K.; Grant, Warwick N.
2017-01-01
Background Treatment of onchocerciasis using mass ivermectin administration has reduced morbidity and transmission throughout Africa and Central/South America. Mass drug administration is likely to exert selection pressure on parasites, and phenotypic and genetic changes in several Onchocerca volvulus populations from Cameroon and Ghana—exposed to more than a decade of regular ivermectin treatment—have raised concern that sub-optimal responses to ivermectin's anti-fecundity effect are becoming more frequent and may spread. Methodology/Principal findings Pooled next generation sequencing (Pool-seq) was used to characterise genetic diversity within and between 108 adult female worms differing in ivermectin treatment history and response. Genome-wide analyses revealed genetic variation that significantly differentiated good responder (GR) and sub-optimal responder (SOR) parasites. These variants were not randomly distributed but clustered in ~31 quantitative trait loci (QTLs), with little overlap in putative QTL position and gene content between the two countries. Published candidate ivermectin SOR genes were largely absent in these regions; QTLs differentiating GR and SOR worms were enriched for genes in molecular pathways associated with neurotransmission, development, and stress responses. Finally, single worm genotyping demonstrated that geographic isolation and genetic change over time (in the presence of drug exposure) had a significantly greater role in shaping genetic diversity than the evolution of SOR. Conclusions/Significance This study is one of the first genome-wide association analyses in a parasitic nematode, and provides insight into the genomics of ivermectin response and population structure of O. volvulus. 
We argue that ivermectin response is a polygenically-determined quantitative trait (QT) whereby identical or related molecular pathways but not necessarily individual genes are likely to determine the extent of ivermectin response in different parasite populations. Furthermore, we propose that genetic drift rather than genetic selection of SOR is the underlying driver of population differentiation, which has significant implications for the emergence and potential spread of SOR within and between these parasite populations. PMID:28746337
Tripathi, Charu; Mishra, Harshita; Khurana, Himani; Dwivedi, Vatsala; Kamra, Komal; Negi, Ram K.; Lal, Rup
2017-01-01
Thermophilic environments represent an interesting niche. Among thermophiles, the genus Thermus is among the most studied genera. In this study, we have sequenced the genome of Thermus parvatiensis strain RL, a thermophile isolated from Himalayan hot water springs (temperature >96°C), using the PacBio RSII SMRT technique. The small genome (2.01 Mbp) comprises a chromosome (1.87 Mbp) and a plasmid (143 Kbp), designated in this study as pTP143. Annotation revealed a high number of repair genes and a compact genome, but a highly plastic plasmid containing transposases, integrases, mobile elements and hypothetical proteins (44%). We performed a comparative genomic study of the genus Thermus with the aim of analysing the phylogenetic relatedness as well as niche-specific attributes prevalent among the group. We compared the reference genome RL with 16 Thermus genomes to assess their phylogenetic relationships based on 16S rRNA gene sequences, average nucleotide identity (ANI), conserved marker genes (31 and 400), pan genome and tetranucleotide frequency. The core genome of the analyzed genomes contained 1,177 core genes and many singleton genes were detected in individual genomes, reflecting a conserved core but adaptive pan repertoire. We demonstrated the presence of metagenomic islands (chromosome: 5, plasmid: 5) by recruiting raw metagenomic data (from the same niche) against the genomic replicons of T. parvatiensis. We also dissected the CRISPR loci across all genomes and found widespread presence of this system across Thermus genomes. Additionally, we performed a comparative analysis of competence loci across Thermus genomes and found evidence for recent horizontal acquisition of the locus and continued dispersal among members, suggesting that natural competence is a beneficial survival trait among Thermus members and that its acquisition reflects ongoing evolution towards optimal fitness. PMID:28798737
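Tetranucleotide frequency, one of the genome signatures listed above, is the relative abundance of each 4-mer in a sequence. A minimal sketch follows, assuming a single strand and overlapping windows; the function name and the toy sequence are hypothetical, and this is not the tool used in the study.

```python
from collections import Counter

def tetranucleotide_frequencies(seq: str) -> dict:
    """Return the relative frequency of every overlapping 4-mer made of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")  # skip windows containing N or other symbols
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

freqs = tetranucleotide_frequencies("ACGTACGT")  # toy sequence
print(freqs)
```

Published tetranucleotide comparisons commonly pool each 4-mer with its reverse complement and compare genomes through correlations of normalised scores. Those steps are omitted here for brevity.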
Nair, Shalima S; Luu, Phuc-Loi; Qu, Wenjia; Maddugoda, Madhavi; Huschtscha, Lily; Reddel, Roger; Chenevix-Trench, Georgia; Toso, Martina; Kench, James G; Horvath, Lisa G; Hayes, Vanessa M; Stricker, Phillip D; Hughes, Timothy P; White, Deborah L; Rasko, John E J; Wong, Justin J-L; Clark, Susan J
2018-05-28
Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single base resolution. However, the high sequencing cost to achieve the optimal depth of coverage limits its application in both basic and clinical research. Achieving 15× coverage of the human methylome using WGBS requires approximately three lanes of 100-bp paired-end Illumina HiSeq 2500 sequencing. It is important, therefore, for advances in sequencing technologies to be developed to enable cost-effective high-coverage sequencing. In this study, we provide an optimised WGBS methodology, from library preparation to sequencing and data processing, to enable 16-20× genome-wide coverage per single lane of HiSeq X Ten, HCS 3.3.76. To process and analyse the data, we developed a WGBS pipeline (METH10X) that is fast and can call SNPs. We performed WGBS on both high-quality intact DNA and degraded DNA from formalin-fixed paraffin-embedded tissue. First, we compared different library preparation methods on the HiSeq 2500 platform to identify the best method for sequencing on the HiSeq X Ten. Second, we optimised the PhiX and genome spike-ins to achieve higher quality and coverage of WGBS data on the HiSeq X Ten. Third, we performed integrated whole genome sequencing (WGS) and WGBS of the same DNA sample in a single lane of HiSeq X Ten to improve data output. Finally, we compared methylation data from the HiSeq 2500 and HiSeq X Ten and found high concordance (Pearson r > 0.9). Together we provide a systematic, efficient and complete approach to perform and analyse WGBS on the HiSeq X Ten. Our protocol allows for large-scale WGBS studies at reasonable processing time and cost on the HiSeq X Ten platform.
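The coverage figures above follow from simple arithmetic: mean depth is the number of sequenced bases divided by the genome size. The sketch below inverts that relation to estimate how many read pairs a target depth needs. It assumes an approximate human genome size of 3.1 Gb and ignores duplicates, unmapped reads and spike-ins, so it gives a lower bound rather than a lane-planning figure. The function and constant names are hypothetical.

```python
def read_pairs_for_depth(depth: float, genome_size: float, read_len: int = 100) -> float:
    """Read pairs needed so that (2 * read_len * pairs) / genome_size equals `depth`."""
    return depth * genome_size / (2 * read_len)

HUMAN_GENOME_BP = 3.1e9  # assumed approximate genome size, for illustration only

pairs = read_pairs_for_depth(15, HUMAN_GENOME_BP)
print(f"{pairs:.3e} read pairs for 15x at 2 x 100 bp")
```

In practice the usable depth is lower than this raw estimate, because bisulphite-converted reads map less efficiently and some sequencing capacity is spent on spike-ins.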
Jaiswal, Alok; Peddinti, Gopal; Akimov, Yevhen; Wennerberg, Krister; Kuznetsov, Sergey; Tang, Jing; Aittokallio, Tero
2017-06-01
Genome-wide loss-of-function profiling is widely used for systematic identification of genetic dependencies in cancer cells; however, the poor reproducibility of RNA interference (RNAi) screens has been a major concern due to frequent off-target effects. Currently, a detailed understanding of the key factors contributing to the sub-optimal consistency is still lacking, especially on how to improve the reliability of future RNAi screens by controlling for factors that determine their off-target propensity. We performed a systematic, quantitative analysis of the consistency between two genome-wide shRNA screens conducted on a compendium of cancer cell lines, and also compared several gene summarization methods for inferring gene essentiality from shRNA level data. We then devised novel concepts of seed essentiality and shRNA family, based on seed region sequences of shRNAs, to study in depth the contribution of seed-mediated off-target effects to the consistency of the two screens. We further investigated two seed-sequence properties, seed pairing stability and target abundance, in terms of their capability to minimize the off-target effects in post-screening data analysis. Finally, we applied this novel methodology to identify genetic interactions and synthetic lethal partners of cancer drivers, and confirmed differential essentiality phenotypes by detailed CRISPR/Cas9 experiments. Using the novel concepts of seed essentiality and shRNA family, we demonstrate how genome-wide loss-of-function profiling of a common set of cancer cell lines can actually be made fairly reproducible when considering seed-mediated off-target effects. Importantly, by excluding shRNAs having higher propensity for off-target effects, based on their seed-sequence properties, one can remove noise from the genome-wide shRNA datasets. 
As a translational application case, we demonstrate enhanced reproducibility of genetic interaction partners of common cancer drivers, as well as identify novel synthetic lethal partners of a major oncogenic driver, PIK3CA, supported by a complementary CRISPR/Cas9 experiment. We provide practical guidelines for improved design and analysis of genome-wide loss-of-function profiling and demonstrate how this novel strategy can be applied towards improved mapping of genetic dependencies of cancer cells to aid development of targeted anticancer treatments.
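The notion of an shRNA family, meaning shRNAs that share a seed region and may therefore share seed-mediated off-target effects, can be sketched as a grouping by seed sequence. The example below assumes the seed is nucleotides 2 to 8 of the guide strand, a common convention that the abstract itself does not specify. The function name, shRNA identifiers and sequences are all hypothetical.

```python
from collections import defaultdict

def shrna_families(guides, seed_start=1, seed_end=8):
    """Group shRNA identifiers by seed sequence.

    The default slice [1:8] corresponds to guide positions 2-8 (1-based, inclusive).
    """
    families = defaultdict(list)
    for name, guide in guides.items():
        families[guide.upper()[seed_start:seed_end]].append(name)
    return dict(families)

# Made-up guide strands; sh1 and sh2 share a seed despite different overall sequences.
guides = {
    "sh1": "UACGUACGGAUUCCGAUAGCU",
    "sh2": "GACGUACGCCAUAGGCAUUAC",
    "sh3": "UUUGGCAACGAUUCCGAUAGC",
}
families = shrna_families(guides)
print(families)
```

If shRNAs in one family show similar phenotypes regardless of their intended target genes, that pattern points towards a seed-driven effect rather than an on-target one.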
Bonomo, Maria Grazia; Sico, Maria Anna; Grieco, Simona; Salzano, Giovanni
2009-01-01
Lactobacillus sakei is widely used as a starter in the production of Italian fermented sausages, and its growth and survival are affected by various factors. We studied the differential expression of the genome in response to different stresses using the fluorescent differential display (FDD) technique. This study resulted in the development and optimization of an innovative technique, with a high level of reproducibility and quality, which allows the identification of gene expression changes associated with different microbial behaviours under different growth conditions. PMID:22253979
New Applications for Phage Integrases
Fogg, Paul C.M.; Colloms, Sean; Rosser, Susan; Stark, Marshall; Smith, Margaret C.M.
2014-01-01
Within the last 25 years, bacteriophage integrases have rapidly risen to prominence as genetic tools for a wide range of applications from basic cloning to genome engineering. Serine integrases such as that from ϕC31 and its relatives have found an especially wide range of applications within diverse micro-organisms right through to multi-cellular eukaryotes. Here, we review the mechanisms of the two major families of integrases, the tyrosine and serine integrases, and the advantages and disadvantages of each type as they are applied in genome engineering and synthetic biology. In particular, we focus on the new areas of metabolic pathway construction and optimization, biocomputing, heterologous expression and multiplexed assembly techniques. Integrases are versatile and efficient tools that can be used in conjunction with the various extant molecular biology tools to streamline the synthetic biology production line. PMID:24857859
Optimal selection of markers for validation or replication from genome-wide association studies.
Greenwood, Celia M T; Rangrej, Jagadish; Sun, Lei
2007-07-01
With reductions in genotyping costs and the fast pace of improvements in genotyping technology, it is not uncommon for the individuals in a single study to undergo genotyping using several different platforms, where each platform may contain different numbers of markers selected via different criteria. For example, a set of cases and controls may be genotyped at markers in a small set of carefully selected candidate genes, and shortly thereafter, the same cases and controls may be used for a genome-wide single nucleotide polymorphism (SNP) association study. After such initial investigations, often, a subset of "interesting" markers is selected for validation or replication. Specifically, by validation, we refer to the investigation of associations between the selected subset of markers and the disease in independent data. However, it is not obvious how to choose the best set of markers for this validation. There may be a prior expectation that some sets of genotyping data are more likely to contain real associations. For example, it may be more likely for markers in plausible candidate genes to show disease associations than markers in a genome-wide scan. Hence, it would be desirable to select proportionally more markers from the candidate gene set. When a fixed number of markers are selected for validation, we propose an approach for identifying an optimal marker-selection configuration by basing the approach on minimizing the stratified false discovery rate. We illustrate this approach using a case-control study of colorectal cancer from Ontario, Canada, and we show that this approach leads to substantial reductions in the estimated false discovery rates in the Ontario dataset for the selected markers, as well as reductions in the expected false discovery rates for the proposed validation dataset. Copyright 2007 Wiley-Liss, Inc.
SNP selection and classification of genome-wide SNP data using stratified sampling random forests.
Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K
2012-09-01
For high-dimensional genome-wide association (GWA) case-control data of complex disease, a large portion of single-nucleotide polymorphisms (SNPs) are usually irrelevant to the disease. A simple random sampling method in random forest, using the default mtry parameter to choose the feature subspace, will select too many subspaces without informative SNPs. An exhaustive search for an optimal mtry is often required in order to include useful and relevant SNPs and discard the vast number of non-informative SNPs. However, such a search is too time-consuming and is not practical for high-dimensional GWA data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for high-dimensional GWA data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure is that it ensures each subspace contains enough useful SNPs, while avoiding the very high computational cost of an exhaustive search for an optimal mtry and maintaining the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprising 408 803 SNPs and Alzheimer case-control data comprising 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and that it can generate a better random forest, with higher accuracy and a lower error bound, than Breiman's random forest generation method. For the Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders and merit further biological investigation.
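Illustrative note: the subspace selection step described in this abstract (equal-width discretization of an informativeness score, then drawing the same number of SNPs from each group) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `stratified_subspace`, its parameters, and the use of a generic per-SNP score list are all hypothetical.

```python
import random

def stratified_subspace(scores, n_groups, per_group, rng):
    """Pick feature indices by sampling equally from equal-width score bins.

    scores    : per-SNP informativeness values (hypothetical input)
    n_groups  : number of equal-width bins
    per_group : how many SNPs to draw from each bin
    rng       : a random.Random instance, so results are reproducible
    """
    lo, hi = min(scores), max(scores)
    # Fall back to width 1.0 when every score is identical (zero range).
    width = (hi - lo) / n_groups or 1.0
    groups = [[] for _ in range(n_groups)]
    for idx, s in enumerate(scores):
        # Equal-width discretization; the maximum score goes in the last bin.
        g = min(int((s - lo) / width), n_groups - 1)
        groups[g].append(idx)
    subspace = []
    for members in groups:
        # Same number from each group, or the whole group if it is small.
        subspace.extend(rng.sample(members, min(per_group, len(members))))
    return subspace

if __name__ == "__main__":
    demo_scores = [0.0, 0.05, 0.1, 0.5, 0.55, 0.9, 1.0]
    print(stratified_subspace(demo_scores, 2, 2, random.Random(0)))
```

Each decision tree in the forest would be grown on one such subspace; drawing from every bin is what keeps informative SNPs represented without tuning mtry.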
Badoni, Saurabh; Das, Sweta; Sayal, Yogesh K.; Gopalakrishnan, S.; Singh, Ashok K.; Rao, Atmakuri R.; Agarwal, Pinky; Parida, Swarup K.; Tyagi, Akhilesh K.
2016-01-01
We developed genome-wide 84634 ISM (intron-spanning marker) and 16510 InDel-fragment length polymorphism-based ILP (intron-length polymorphism) markers from genes physically mapped on 12 rice chromosomes. These genic markers revealed much higher amplification-efficiency (80%) and polymorphic-potential (66%) among rice accessions even by a cost-effective agarose gel-based assay. A wider level of functional molecular diversity (17–79%) and well-defined precise admixed genetic structure was assayed by 3052 genome-wide markers in a structured population of indica, japonica, aromatic and wild rice. Six major grain weight QTLs (11.9–21.6% phenotypic variation explained) were mapped on five rice chromosomes of a high-density (inter-marker distance: 0.98 cM) genetic linkage map (IR 64 x Sonasal) anchored with 2785 known/candidate gene-derived ISM and ILP markers. The designing of multiple ISM and ILP markers (2 to 4 markers/gene) in an individual gene will broaden the user-preference to select suitable primer combination for efficient assaying of functional allelic variation/diversity and realistic estimation of differential gene expression profiles among rice accessions. The genomic information generated in our study is made publicly accessible through a user-friendly web-resource, “Oryza ISM-ILP marker” database. The known/candidate gene-derived ISM and ILP markers can be enormously deployed to identify functionally relevant trait-associated molecular tags by optimal-resource expenses, leading towards genomics-assisted crop improvement in rice. PMID:27032371
Viral genome structures, charge, and sequences are optimal for capsid assembly
NASA Astrophysics Data System (ADS)
Hagan, Michael
2014-03-01
For many viruses, the spontaneous assembly of a capsid shell around the nucleic acid (NA) genome is an essential step in the viral life cycle. Capsid formation is a multicomponent, out-of-equilibrium assembly process for which kinetic effects and thermodynamic constraints compete to determine the outcome. Understanding how viral components drive highly efficient assembly under these constraints could promote biomedical efforts to block viral propagation, and would elucidate the factors controlling assembly in a wide range of systems containing proteins and polyelectrolytes. This talk will describe coarse-grained models of capsid proteins and NAs with which we investigate the dynamics and thermodynamics of virus assembly. In contrast to recent theoretical models, we find that capsids spontaneously 'overcharge'; that is, the NA length which is kinetically and thermodynamically optimal possesses a negative charge greater than the positive charge of the capsid. When applied to specific virus capsids, the calculated optimal NA lengths closely correspond to the natural viral genome lengths. These results suggest that the features included in this model (i.e. electrostatics, excluded volume, and NA tertiary structure) play key roles in determining assembly thermodynamics and consequently exert selective pressure on viral evolution. I will then discuss mechanisms by which sequence-specific interactions between NAs and capsid proteins promote selective encapsidation of the viral genome. This work was supported by NIH R01GM108021 and the Brandeis MRSEC NSF-MRSEC-0820492.
Genome size of Alexandrium catenella and Gracilariopsis lemaneiformis estimated by flow cytometry
NASA Astrophysics Data System (ADS)
Du, Qingwei; Sui, Zhenghong; Chang, Lianpeng; Wei, Huihui; Liu, Yuan; Mi, Ping; Shang, Erlei; Zeeshan, Niaz; Que, Zhou
2016-08-01
The flow cytometry (FCM) technique has been widely applied to estimating the genome size of various higher plants. However, there are few reports about its application in algae. In this study, an optimized FCM procedure was used to estimate the genome size of two eukaryotic algae. For analyzing Alexandrium catenella, an important red tide species, the whole cell instead of the isolated nucleus was studied, and chicken erythrocytes were used as an internal reference. The genome size of A. catenella was estimated to be 56.48 ± 4.14 Gb (1C), approximately nineteen times larger than that of the human genome. For analyzing Gracilariopsis lemaneiformis, an economically important red alga, the purified nucleus was employed, and Arabidopsis thaliana and Chondrus crispus were each used as internal references. The genome size of Gp. lemaneiformis was 97.35 ± 2.58 Mb (1C) and 112.73 ± 14.00 Mb (1C), respectively, depending on the internal reference. The results of this research will promote related studies on the genomics and evolution of these two species.
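Illustrative note: genome size estimation with an internal reference is commonly computed by scaling the known reference genome size by the ratio of fluorescence peak positions. The sketch below shows only that proportionality; the function name and the numbers in the usage line are hypothetical and are not taken from the study.

```python
def genome_size(sample_peak, reference_peak, reference_size):
    """Estimate genome size from flow cytometry peak positions.

    sample_peak    : mean fluorescence of the sample's G1 peak
    reference_peak : mean fluorescence of the internal reference's G1 peak
    reference_size : known genome size of the reference (any unit)

    The result is in the same unit as reference_size.
    """
    if reference_peak <= 0:
        raise ValueError("reference_peak must be positive")
    return reference_size * sample_peak / reference_peak

if __name__ == "__main__":
    # Hypothetical values: a sample peak twice as bright as the reference.
    print(genome_size(sample_peak=200.0, reference_peak=100.0, reference_size=1.25))
```

Because the estimate scales linearly with the reference size, two different internal references can give different results, as the abstract reports for Gp. lemaneiformis.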
Yao, Shi; Guo, Yan; Dong, Shan-Shan; Hao, Ruo-Han; Chen, Xiao-Feng; Chen, Yi-Xiao; Chen, Jia-Bin; Tian, Qing; Deng, Hong-Wen; Yang, Tie-Lin
2017-08-01
Although genome-wide association studies (GWASs) have identified many susceptibility genes for osteoporosis, a large part of the missing heritability remains to be discovered. Integrating regulatory information and GWASs could offer new insights into the biological link between the susceptibility SNPs and osteoporosis. We generated five machine learning classifiers with osteoporosis-associated variants and regulatory features data. We selected the optimal classifier and used it to predict genome-wide SNPs in order to discover susceptibility regulatory variants. We further utilized Genetic Factors for Osteoporosis Consortium (GEFOS) and three in-house GWAS samples to validate the associations for predicted positive SNPs. The random forest classifier performed best among all machine learning methods, with an F1 score of 0.8871. Using the optimized model, we predicted 37,584 candidate SNPs for osteoporosis. According to the meta-analysis results, a list of regulatory variants was significantly associated with osteoporosis after multiple testing corrections and contributed to the expression of known osteoporosis-associated protein-coding genes. In summary, combining GWASs and regulatory elements through machine learning could provide additional information for understanding the mechanism of osteoporosis. The regulatory variants we predicted will provide novel targets for etiology research and treatment of osteoporosis.
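Illustrative note: the F1 score used to rank the classifiers is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the function name and the example counts are hypothetical and do not reproduce the study's data.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall).

    tp, fp, fn are counts of true positives, false positives and
    false negatives. Returns 0.0 when the score is undefined.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f1_score(tp=8, fp=2, fn=2))
```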
Multichromosomal median and halving problems under different genomic distances
Tannier, Eric; Zheng, Chunfang; Sankoff, David
2009-01-01
Background Genome median and genome halving are combinatorial optimization problems that aim at reconstructing ancestral genomes as well as the evolutionary events leading from the ancestor to extant species. Exploring complexity issues is a first step towards devising efficient algorithms. The complexity of the median problem for unichromosomal genomes (permutations) has been settled for both the breakpoint distance and the reversal distance. Although the multichromosomal case has often been assumed to be a simple generalization of the unichromosomal case, it is also a relaxation so that complexity in this context does not follow from existing results, and is open for all distances. Results We settle here the complexity of several genome median and halving problems, including a surprising polynomial result for the breakpoint median and guided halving problems in genomes with circular and linear chromosomes, showing that the multichromosomal problem is actually easier than the unichromosomal problem. Still other variants of these problems are NP-complete, including the DCJ double distance problem, previously mentioned as an open question. We list the remaining open problems. Conclusion This theoretical study clears up a wide swathe of the algorithmical study of genome rearrangements with multiple multichromosomal genomes. PMID:19386099
A robust TALENs system for highly efficient mammalian genome editing.
Feng, Yuanxi; Zhang, Siliang; Huang, Xin
2014-01-10
Recently, transcription activator-like effector nucleases (TALENs) have emerged as a highly effective tool for genomic editing. A pair of TALENs binds to two DNA recognition sites separated by a spacer sequence, and the dimerized FokI nucleases at the C terminal then cleave DNA in the spacer. Because of its modular design and capacity to precisely target almost any desired genomic locus, TALEN is a technology that can revolutionize the entire biomedical research field. Currently, for genomic editing in cultured cells, two plasmids encoding a pair of TALENs are co-transfected, followed by limited dilution to isolate cell colonies with the intended genomic manipulation. However, uncertain transfection efficiency becomes a bottleneck, especially in hard-to-transfect cells, reducing the overall efficiency of genome editing. We have developed a robust TALENs system in which each TALEN plasmid also encodes a fluorescence protein. Thus, cells transfected with both TALEN plasmids, a prerequisite for genomic editing, can be isolated by fluorescence-activated cell sorting. Our improved TALENs system can be applied to all cultured cells to achieve highly efficient genomic editing. Furthermore, an optimized procedure for genomic editing using TALENs is also presented. We expect our system to be widely adopted by the scientific community.
Long Read Alignment with Parallel MapReduce Cloud Platform
Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki
2015-01-01
Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms. PMID:26839887
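Illustrative note: the Smith-Waterman algorithm mentioned in this abstract finds the best-scoring local alignment between two sequences by dynamic programming. The sketch below is the textbook recurrence with a linear gap penalty, kept to two rows of memory. It is not BWA-SW and not the optimized version the authors describe; the scoring values are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

if __name__ == "__main__":
    print(smith_waterman_score("xxACGTyy", "ACGT"))
```

The running time is proportional to the product of the two sequence lengths, which is why long-read aligners pair this recurrence with heuristics or parallel execution.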
Haplotype assembly in polyploid genomes and identical by descent shared tracts.
Aguiar, Derek; Istrail, Sorin
2013-07-01
Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups interested in the genomics of disease, phylogenetics, botany and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction. In this work, we present a number of results, extensions and generalizations of compass graphs and our HapCompass framework. We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. Furthermore, we present graph theory-based algorithms for the problem of haplotype assembly using our previously developed HapCompass framework for (i) novel implementations of haplotype assembly optimizations (minimum error correction), (ii) assembly of a pair of individuals sharing a haplotype tract identical by descent and (iii) assembly of polyploid genomes. We evaluate our methods on 1000 Genomes Project, Pacific Biosciences and simulated sequence data. HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/. Supplementary data are available at Bioinformatics online.
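Illustrative note: the minimum error correction (MEC) objective named in this abstract counts how many read alleles must be flipped so that every read agrees with one of the haplotypes. The sketch below scores a candidate diploid haplotype under simplifying assumptions (biallelic heterozygous sites coded 0/1, the second haplotype being the complement of the first). The data layout and function name are hypothetical and are not taken from HapCompass.

```python
def mec_score(reads, hap):
    """Minimum error correction score of a diploid haplotype pair.

    reads : list of dicts mapping site index -> observed allele (0 or 1)
    hap   : list of 0/1 alleles; the other haplotype is its complement
    """
    total = 0
    for read in reads:
        # Mismatches if the read came from `hap`.
        d0 = sum(1 for pos, allele in read.items() if allele != hap[pos])
        # Mismatches if it came from the complementary haplotype.
        d1 = len(read) - d0
        # Assign the read to whichever haplotype needs fewer corrections.
        total += min(d0, d1)
    return total

if __name__ == "__main__":
    reads = [{0: 0, 1: 1}, {1: 0, 2: 1, 3: 0}, {2: 0, 3: 0}]
    print(mec_score(reads, [0, 1, 0, 1]))
```

Scoring a given haplotype is cheap; the hard part, as the abstract notes, is searching for the haplotype that minimizes this score, which motivates heuristics.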
Genomics DNA Profiling in Elite Professional Soccer Players: A Pilot Study
Kambouris, M; Del Buono, A; Maffulli, N
2014-01-01
Functional variants in exonic regions have been associated with development of cardiovascular disease, diabetes and cancer. Athletic performance can be considered a multi-factorial complex phenotype. Genomic DNA was extracted from buccal swabs of seven soccer players from the Fulham football team. Single nucleotide polymorphism (SNP) genotyping was undertaken. To achieve optimal athletic performance, predictive genomics DNA profiling for sports performance can be used to aid in sport selection and elaboration of personalized training and nutrition programs. Predictive DNA profiling may help to detect athletes with potential or frank injuries, to screen and select future athletes, and to help them maximize utilization of their potential and improve performance in sports. The aim of this study is to provide a broad picture of the specific genomic variants that an athlete carries, to inform which measures should be taken to maximize the athlete’s potential. PMID:24809029
Das, Shouvik; Upadhyaya, Hari D.; Bajaj, Deepak; Kujur, Alice; Badoni, Saurabh; Laxmi; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. Laxmipathi; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.
2015-01-01
A rapid high-resolution genome-wide strategy for molecular mapping of major QTL(s)/gene(s) regulating important agronomic traits is vital for in-depth dissection of complex quantitative traits and genetic enhancement in chickpea. The present study for the first time employed a NGS-based whole-genome QTL-seq strategy to identify one major genomic region harbouring a robust 100-seed weight QTL using an intra-specific 221 chickpea mapping population (desi cv. ICC 7184 × desi cv. ICC 15061). The QTL-seq-derived major SW QTL (CaqSW1.1) was further validated by single-nucleotide polymorphism (SNP) and simple sequence repeat (SSR) marker-based traditional QTL mapping (47.6% R2 at higher LOD >19). This reflects the reliability and efficacy of QTL-seq as a strategy for rapid genome-wide scanning and fine mapping of major trait regulatory QTLs in chickpea. The use of QTL-seq and classical QTL mapping in combination narrowed down the 1.37 Mb (comprising 177 genes) major SW QTL (CaqSW1.1) region into a 35 kb genomic interval on desi chickpea chromosome 1 containing six genes. One coding SNP (G/A)-carrying constitutive photomorphogenic9 (COP9) signalosome complex subunit 8 (CSN8) gene of these exhibited seed-specific expression, including pronounced differential up-/down-regulation in low and high seed weight mapping parents and homozygous individuals during seed development. The coding SNP mined in this potential seed weight-governing candidate CSN8 gene was found to be present exclusively in all cultivated species/genotypes, but not in any wild species/genotypes of primary, secondary and tertiary gene pools. This indicates the effect of strong artificial and/or natural selection pressure on target SW locus during chickpea domestication. The proposed QTL-seq-driven integrated genome-wide strategy has potential to delineate major candidate gene(s) harbouring a robust trait regulatory QTL rapidly with optimal use of resources. This will further assist us to extrapolate the molecular mechanism underlying complex quantitative traits at a genome-wide scale leading to fast-paced marker-assisted genetic improvement in diverse crop plants, including chickpea. PMID:25922536
Snyder-Mackler, Noah; Majoros, William H.; Yuan, Michael L.; Shaver, Amanda O.; Gordon, Jacob B.; Kopp, Gisela H.; Schlebusch, Stephen A.; Wall, Jeffrey D.; Alberts, Susan C.; Mukherjee, Sayan; Zhou, Xiang; Tung, Jenny
2016-01-01
Research on the genetics of natural populations was revolutionized in the 1990s by methods for genotyping noninvasively collected samples. However, these methods have remained largely unchanged for the past 20 years and lag far behind the genomics era. To close this gap, here we report an optimized laboratory protocol for genome-wide capture of endogenous DNA from noninvasively collected samples, coupled with a novel computational approach to reconstruct pedigree links from the resulting low-coverage data. We validated both methods using fecal samples from 62 wild baboons, including 48 from an independently constructed extended pedigree. We enriched fecal-derived DNA samples up to 40-fold for endogenous baboon DNA and reconstructed near-perfect pedigree relationships even with extremely low-coverage sequencing. We anticipate that these methods will be broadly applicable to the many research systems for which only noninvasive samples are available. The lab protocol and software (“WHODAD”) are freely available at www.tung-lab.org/protocols-and-software.html and www.xzlab.org/software.html, respectively. PMID:27098910
Daya, Michelle; van der Merwe, Lize; Galal, Ushma; Möller, Marlo; Salie, Muneeb; Chimusa, Emile R.; Galanter, Joshua M.; van Helden, Paul D.; Henn, Brenna M.; Gignoux, Chris R.; Hoal, Eileen
2013-01-01
Admixture is a well-known confounder in genetic association studies. If genome-wide data is not available, as would be the case for candidate gene studies, ancestry informative markers (AIMs) are required in order to adjust for admixture. The predominant population group in the Western Cape, South Africa, is the admixed group known as the South African Coloured (SAC). A small set of AIMs that is optimized to distinguish between the five source populations of this population (African San, African non-San, European, South Asian, and East Asian) will enable researchers to cost-effectively reduce false-positive findings resulting from ignoring admixture in genetic association studies of the population. Using genome-wide data to find SNPs with large allele frequency differences between the source populations of the SAC, as quantified by Rosenberg et al.'s -statistic, we developed a panel of AIMs by experimenting with various selection strategies. Subsets of different sizes were evaluated by measuring the correlation between ancestry proportions estimated by each AIM subset and ancestry proportions estimated using genome-wide data. We show that a panel of 96 AIMs can be used to assess ancestry proportions and to adjust for the confounding effect of the complex five-way admixture that occurred in the South African Coloured population. PMID:24376522
Boland, PM; Ruth, K; Matro, JM; Rainey, KL; Fang, CY; Wong, YN; Daly, MB; Hall, MJ
2014-01-01
Genomic tests are increasingly complex, less expensive, and more widely available with the advent of next-generation sequencing (NGS). We assessed knowledge and perceptions among genetic counselors pertaining to NGS genomic testing via an online survey. Associations between selected characteristics and perceptions were examined. Recent education on NGS testing was common, but practical experience limited. Perceived understanding of clinical NGS was modest, specifically concerning tumor testing. Greater perceived understanding of clinical NGS testing correlated with more time spent in cancer-related counseling, exposure to NGS testing, and NGS-focused education. Substantial disagreement about the role of counseling for tumor-based testing was seen. Finally, a majority of counselors agreed with the need for more education about clinical NGS testing, supporting this approach to optimizing implementation. PMID:25523111
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lian, Jiazhang; Mishra, Shekhar; Zhao, Huimin
2018-04-25
Metabolic engineering aims to develop efficient cell factories by rewiring cellular metabolism. As one of the most commonly used cell factories, Saccharomyces cerevisiae has been extensively engineered to produce a wide variety of products at high levels from various feedstocks. In this paper, we summarize the recent development of metabolic engineering approaches to modulate yeast metabolism with representative examples. Particularly, we highlight new tools for biosynthetic pathway optimization (i.e. combinatorial transcriptional engineering and dynamic metabolic flux control) and genome engineering (i.e. clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR associated (Cas) system based genome engineering and RNA interference assisted genome evolution) to advance metabolic engineering in yeast. Lastly, we also discuss the challenges and perspectives for high throughput metabolic engineering.
Genome Editing and Its Applications in Model Organisms.
Ma, Dongyuan; Liu, Feng
2015-12-01
Technological advances are important for innovative biological research. Development of molecular tools for DNA manipulation, such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the clustered regularly-interspaced short palindromic repeat (CRISPR)/CRISPR-associated (Cas) system, has revolutionized genome editing. These approaches can be used to develop potential therapeutic strategies to effectively treat heritable diseases. In the last few years, substantial progress has been made in CRISPR/Cas technology, including technical improvements and wide application in many model systems. This review describes recent advancements in genome editing with a particular focus on CRISPR/Cas, covering the underlying principles, technological optimization, and its application in zebrafish and other model organisms, disease modeling, and gene therapy used for personalized medicine. Copyright © 2016 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
Guide RNA selection for CRISPR-Cas9 transfections in Plasmodium falciparum.
Ribeiro, Jose M; Garriga, Meera; Potchen, Nicole; Crater, Anna K; Gupta, Ankit; Ito, Daisuke; Desai, Sanjay A
2018-06-12
CRISPR-Cas9 mediated genome editing is addressing key limitations in the transfection of malaria parasites. While this method has already simplified the needed molecular cloning and reduced the time required to generate mutants in the human pathogen Plasmodium falciparum, optimal selection of required guide RNAs and guidelines for successful transfections have not been well characterized, leading workers to use time-consuming trial and error approaches. We used a genome-wide computational approach to create a comprehensive and publicly accessible database of possible guide RNA sequences in the P. falciparum genome. For each guide, we report on-target efficiency and specificity scores as well as information about the genomic site relevant to optimal design of CRISPR-Cas9 transfections to modify, disrupt, or conditionally knockdown any gene. As many antimalarial drug and vaccine targets are encoded by multigene families, we also developed a new paralog specificity score that should facilitate modification of either a single family member of interest or multiple paralogs that serve overlapping roles. Finally, we tabulated features of successful transfections in our laboratory, providing broadly useful guidelines for parasite transfections. Molecular studies aimed at understanding parasite biology or characterizing drug and vaccine targets in P. falciparum should be facilitated by this comprehensive database. Published by Elsevier Ltd.
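Illustrative note: a genome-wide guide RNA catalogue starts from an enumeration of candidate target sites. The sketch below lists 20-nt protospacers followed by an NGG motif on the forward strand only. The NGG PAM (as used by the commonly employed SpCas9) and the 20-nt guide length are assumptions that the abstract does not state, and the function name is hypothetical. The on-target, specificity and paralog scores described in the abstract are not modelled here.

```python
import re

def find_guides(seq, guide_len=20):
    """List (start, guide) pairs for forward-strand sites followed by NGG."""
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]

if __name__ == "__main__":
    demo = "C" + "A" * 20 + "AGG"
    print(find_guides(demo))
```

A fuller scan would also search the reverse complement so that sites on the opposite strand are included.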
Challenges and Opportunities in Genome-Wide Environmental Interaction (GWEI) studies
Aschard, Hugues; Lutz, Sharon; Maus, Bärbel; Duell, Eric J.; Fingerlin, Tasha; Chatterjee, Nilanjan; Kraft, Peter; Van Steen, Kristel
2012-01-01
Interest in performing gene-environment interaction studies has increased significantly with the advance of molecular genetics techniques. In practice, it has become possible to investigate the role of environmental factors in disease risk and hence to investigate their role as genetic effect modifiers. The understanding that genetics is important in the uptake and metabolism of toxic substances is an example of how genetic profiles can modify important environmental risk factors for disease. Several rationales exist to set up gene-environment interaction studies, and the technical challenges related to these studies – when the number of environmental or genetic risk factors is relatively small – have been described before. In the post-genomic era, it is now possible to study thousands of genes and their interaction with the environment. This brings along a whole range of new challenges and opportunities. Despite a continuing effort in developing efficient methods and optimal bioinformatics infrastructures to deal with the available wealth of data, the challenge remains how best to present and analyze Genome-Wide Environmental Interaction (GWEI) studies involving multiple genetic and environmental factors. Since GWEIs are performed at the intersection of statistical genetics, bioinformatics and epidemiology, problems similar to those of Genome-Wide Association gene-gene Interaction (GWAI) studies usually need to be dealt with. However, additional complexities need to be considered, which are typical for large-scale epidemiological studies but are also related to “joining” two heterogeneous types of data in explaining complex disease trait variation or for prediction purposes. PMID:22760307
Machine learning derived risk prediction of anorexia nervosa.
Guo, Yiran; Wei, Zhi; Keating, Brendan J; Hakonarson, Hakon
2016-01-20
Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict risk of diseases in which genetics play an important role. In this study, we collected whole genome genotyping data on 3940 AN cases and 9266 controls from the Genetic Consortium for Anorexia Nervosa (GCAN), the Wellcome Trust Case Control Consortium 3 (WTCCC3), Price Foundation Collaborative Group and the Children's Hospital of Philadelphia (CHOP), and applied machine learning methods for predicting AN disease risk. The prediction performance is measured by area under the receiver operating characteristic curve (AUC), indicating how well the model distinguishes cases from unaffected control subjects. A logistic regression model with the lasso penalty generated an AUC of 0.693, while Support Vector Machines and Gradient Boosted Trees reached AUCs of 0.691 and 0.623, respectively. Using different sample sizes, our results suggest that larger datasets are required to optimize the machine learning models and achieve higher AUC values. To our knowledge, this is the first attempt to assess AN risk based on genome wide genotype level data. Future integration of genomic, environmental and family-based information is likely to improve the AN risk evaluation process, eventually benefitting AN patients and families in the clinical setting.
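The AUC used above as the performance measure can be illustrated with a minimal sketch. It relies on the standard identity that the AUC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The function name, scores and labels are made up for illustration and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise case/control comparisons.

    scores: predicted risk scores (higher means more likely a case).
    labels: 1 for a case, 0 for a control (illustrative encoding).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # ties count as half
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): three cases and three controls.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of 0.62 to 0.69 in context.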
Feltus, F Alex
2014-06-01
Understanding the control of any trait optimally requires the detection of causal genes, gene interaction, and mechanism of action to discover and model the biochemical pathways underlying the expressed phenotype. Functional genomics techniques, including RNA expression profiling via microarray and high-throughput DNA sequencing, allow for the precise genome localization of biological information. Powerful genetic approaches, including quantitative trait locus (QTL) and genome-wide association study mapping, link phenotype with genome positions, yet genetics is less precise in localizing the relevant mechanistic information encoded in DNA. The coupling of salient functional genomic signals with genetically mapped positions is an appealing approach to discover meaningful gene-phenotype relationships. Techniques used to define this genetic-genomic convergence comprise the field of systems genetics. This short review will address an application of systems genetics where RNA profiles are associated with genetically mapped genome positions of individual genes (eQTL mapping) or as gene sets (co-expression network modules). Both approaches can be applied for knowledge independent selection of candidate genes (and possible control mechanisms) underlying complex traits where multiple, likely unlinked, genomic regions might control specific complex traits. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Kuhn, Alexandre; Ong, Yao Min; Quake, Stephen R; Burkholder, William F
2015-07-08
Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches for genotyping insertion-site polymorphisms based on targeted or whole-genome sequencing remain very expensive and can lack accuracy, hence new large-scale genotyping methods are needed. We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and read count-based genotype calls. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate. This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.
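The idea of a read count-based genotype call can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the calling rule, so the allele-fraction thresholds, the minimum depth, the function name and the genotype labels are all hypothetical.

```python
def call_genotype(insertion_reads, empty_site_reads,
                  min_depth=10, het_low=0.2, het_high=0.8):
    """Call a biallelic insertion genotype from two read counts.

    insertion_reads:  reads from the breakpoint amplicon supporting the insertion.
    empty_site_reads: reads from the amplicon supporting the empty (reference) site.
    min_depth, het_low, het_high: hypothetical thresholds, not from the paper.
    """
    depth = insertion_reads + empty_site_reads
    if depth < min_depth:
        return "no_call"          # too few reads to call reliably
    fraction = insertion_reads / depth
    if fraction < het_low:
        return "0/0"              # homozygous for the empty site
    if fraction > het_high:
        return "1/1"              # homozygous for the insertion
    return "0/1"                  # heterozygous


# Toy read counts (hypothetical).
calls = [call_genotype(0, 50), call_genotype(25, 25),
         call_genotype(48, 2), call_genotype(3, 2)]
```

In practice the thresholds would be tuned per assay, since amplification efficiency can differ between the two amplicons.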
Schmidt, Martin; Van Bel, Michiel; Woloszynska, Magdalena; Slabbinck, Bram; Martens, Cindy; De Block, Marc; Coppens, Frederik; Van Lijsebettens, Mieke
2017-07-06
Cytosine methylation in plant genomes is important for the regulation of gene transcription and transposon activity. Genome-wide methylomes are studied upon mutation of the DNA methyltransferases, adaptation to environmental stresses or during development. However, from basic biology to breeding programs, there is a need to monitor multiple samples to determine transgenerational methylation inheritance or differential cytosine methylation. Methylome data obtained by sodium hydrogen sulfite (bisulfite)-conversion and next-generation sequencing (NGS) provide genome-wide information on cytosine methylation. However, a profiling method that detects cytosine methylation state dispersed over the genome would allow high-throughput analysis of multiple plant samples with distinct epigenetic signatures. We use specific restriction endonucleases to enrich for cytosine coverage in a bisulfite and NGS-based profiling method, which was compared to whole-genome bisulfite sequencing of the same plant material. We established an effective methylome profiling method in plants, termed plant-reduced representation bisulfite sequencing (plant-RRBS), using optimized double restriction endonuclease digestion, fragment end repair, adapter ligation, followed by bisulfite conversion, PCR amplification and NGS. We report a performant laboratory protocol and a straightforward bioinformatics data analysis pipeline for plant-RRBS, applicable for any reference-sequenced plant species. As a proof of concept, methylome profiling was performed using an Oryza sativa ssp. indica pure breeding line and a derived epigenetically altered line (epiline). Plant-RRBS detects methylation levels at tens of millions of cytosine positions deduced from bisulfite conversion in multiple samples. To evaluate the method, the coverage of cytosine positions, the intra-line similarity and the differential cytosine methylation levels between the pure breeding line and the epiline were determined. 
Plant-RRBS reproducibly covers commonly up to one fourth of the cytosine positions in the rice genome when using MspI-DpnII within a group of five biological replicates of a line. The method predominantly detects cytosine methylation in putative promoter regions and not-annotated regions in rice. Plant-RRBS offers high-throughput and broad, genome-dispersed methylation detection by effective read number generation obtained from reproducibly covered genome fractions using optimized endonuclease combinations, facilitating comparative analyses of multi-sample studies for cytosine methylation and transgenerational stability in experimental material and plant breeding populations.
CRISPR editing in biological and biomedical investigation.
Huang, Jiaojiao; Wang, Yanfang; Zhao, Jianguo
2018-05-01
Recently, clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing technologies have armed researchers with powerful new tools for biological and biomedical investigations. To further improve and expand their functionality, natural and engineered CRISPR-associated protein 9 variants (Cas9s) have been investigated, various CRISPR delivery strategies have been tested and optimized, and multiple schemes have been developed to ensure precise mammalian genome editing. Benefiting from this in-depth understanding and further development of CRISPR, versatile CRISPR-based platforms for genome editing have been rapidly developed to advance investigations in biology and biomedicine. In biological research, CRISPR has been widely adopted in both fundamental and applied fields, for example in accurate base editing, transcriptional regulation, and genome-wide screening. In biomedical research, CRISPR has also shown extensive applicability in establishing animal models of genetic disorders, especially large-animal and non-human primate models, and in gene therapy to combat viral infectious diseases and to correct monogenic disorders in vivo or in pluripotent cells. In this prospect article, after highlighting recent developments of CRISPR systems, we outline different applications and current limitations of CRISPR use in biological and biomedical investigation. Finally, we provide a perspective for future development and potential risks of this multifunctional technology. © 2017 Wiley Periodicals, Inc.
Computer vision and machine learning for robust phenotyping in genome-wide studies
Zhang, Jiaoping; Naik, Hsiang Sing; Assefa, Teshale; Sarkar, Soumik; Reddy, R. V. Chowda; Singh, Arti; Ganapathysubramanian, Baskar; Singh, Asheesh K.
2017-01-01
Traditional evaluation of crop biotic and abiotic stresses is time-consuming and labor-intensive, limiting the ability to dissect the genetic basis of quantitative traits. A machine learning (ML)-enabled image-phenotyping pipeline for genetic studies of the abiotic stress iron deficiency chlorosis (IDC) in soybean is reported. IDC classification and severity for an association panel of 461 diverse plant-introduction accessions were evaluated using an end-to-end phenotyping workflow. The workflow consisted of a multi-stage procedure including: (1) optimized protocols for consistent image capture across plant canopies, (2) canopy identification and registration from cluttered backgrounds, (3) extraction of domain expert informed features from the processed images to accurately represent IDC expression, and (4) supervised ML-based classifiers that linked the automatically extracted features with expert-rating equivalent IDC scores. ML-generated phenotypic data were subsequently utilized for the genome-wide association study and genomic prediction. The results illustrate the reliability and advantage of the ML-enabled image-phenotyping pipeline by identifying a previously reported locus and a novel locus harboring a gene homolog involved in iron acquisition. This study demonstrates a promising path for integrating the phenotyping pipeline into genomic prediction, and provides a systematic framework enabling robust and quicker phenotyping through ground-based systems. PMID:28272456
Tong, Yunxia; Chen, Qiang; Nichols, Thomas E.; Rasetti, Roberta; Callicott, Joseph H.; Berman, Karen F.; Weinberger, Daniel R.; Mattay, Venkata S.
2016-01-01
A data-driven hypothesis-free genome-wide association (GWA) approach in imaging genetics studies allows screening the entire genome to discover novel genes that modulate brain structure, chemistry, and function. However, a whole brain voxel-wise analysis approach in such genome-wide based imaging genetic studies can be computationally intense and also likely has low statistical power since a stringent multiple comparisons correction is needed for searching over the entire genome and brain. In imaging genetics with functional magnetic resonance imaging (fMRI) phenotypes, since many experimental paradigms activate focal regions that can be pre-specified based on a priori knowledge, reducing the voxel-wise search to single-value summary measures within a priori regions of interest (ROIs) could prove efficient and promising. The goal of this investigation is to evaluate the sensitivity and reliability of different single-value ROI summary measures and provide guidance for future work. Four different fMRI databases were tested and comparisons across different groups (patients with schizophrenia, their siblings, vs. normal control subjects; across genotype groups) were conducted. Our results show that four of these measures, particularly those that represent values from the top most-activated voxels within an ROI, are more powerful at reliably detecting group differences and generate greater effect sizes than the others. PMID:26974435
Johnston, Susan E; Orell, Panu; Pritchard, Victoria L; Kent, Matthew P; Lien, Sigbjørn; Niemelä, Eero; Erkinaro, Jaakko; Primmer, Craig R
2014-07-01
Delaying sexual maturation can lead to larger body size and higher reproductive success, but carries an increased risk of death before reproducing. Classical life history theory predicts that trade-offs between reproductive success and survival should lead to the evolution of an optimal strategy in a given population. However, variation in mating strategies generally persists, and in general, there remains a poor understanding of genetic and physiological mechanisms underlying this variation. One extreme case of this is in the Atlantic salmon (Salmo salar), which can show variation in the age at which they return from their marine migration to spawn (i.e. their 'sea age'). This results in large size differences between strategies, with direct implications for individual fitness. Here, we used an Illumina Infinium SNP array to identify regions of the genome associated with variation in sea age in a large population of Atlantic salmon in Northern Europe, implementing individual-based genome-wide association studies (GWAS) and population-based FST outlier analyses. We identified several regions of the genome which vary in association with phenotype and/or selection between sea ages, with nearby genes having functions related to muscle development, metabolism, immune response and mate choice. In addition, we found that individuals of different sea ages belong to different, yet sympatric populations in this system, indicating that reproductive isolation may be driven by divergence between stable strategies. Overall, this study demonstrates how genome-wide methodologies can be integrated with samples collected from wild, structured populations to understand their ecology and evolution in a natural context. © 2014 John Wiley & Sons Ltd.
Sánchez-Sevilla, José F.; Horvath, Aniko; Botella, Miguel A.; Gaston, Amèlia; Folta, Kevin; Kilian, Andrzej; Denoyes, Beatrice; Amaya, Iraida
2015-01-01
Cultivated strawberry (Fragaria × ananassa) is a genetically complex allo-octoploid crop with 28 pairs of chromosomes (2n = 8x = 56) for which a genome sequence is not yet available. The diploid Fragaria vesca is considered the donor species of one of the octoploid sub-genomes and its available genome sequence can be used as a reference for genomic studies. A large number of strawberry cultivars are stored in ex situ germplasm collections world-wide but a number of previous studies have addressed the genetic diversity present within a limited number of these collections. Here, we report the development and application of two platforms based on the implementation of Diversity Array Technology (DArT) markers for high-throughput genotyping in strawberry. The first, a DArT microarray, was used to evaluate the genetic diversity of 62 strawberry cultivars that represent a wide range of variation based on phenotype, geographical and temporal origin and pedigrees. A total of 603 DArT markers were used to evaluate the diversity and structure of the population, and cluster analyses revealed that these markers were highly efficient in classifying the accessions into groups based on historical, geographical and pedigree-based cues. The second platform, DArTseq, took advantage of the complexity reduction method optimized for strawberry and the development of next generation sequencing technologies. The strawberry DArTseq was used to generate a total of 9,386 SNP markers in the previously developed ‘232’ × ‘1392’ mapping population, of which 4,242 high quality markers were further selected to saturate this map after several filtering steps. The high-throughput platforms developed here for genotyping strawberry will facilitate genome-wide characterizations of large accession sets and complement other available options. PMID:26675207
Identification of two integration sites in favor of transgene expression in Trichoderma reesei.
Qin, Lina; Jiang, Xianzhang; Dong, Zhiyang; Huang, Jianzhong; Chen, Xiuzhen
2018-01-01
The ascomycete fungus Trichoderma reesei is widely used as a biotechnological workhorse for production of cellulases and recombinant proteins due to its large capacity for protein secretion. Transgenesis by random integration of a gene of interest (GOI) into the genome of T. reesei can generate a series of strains that express different levels of the indicated transgene. The insertion site of the GOI plays an important role in the ultimate production of the targeted proteins. However, so far no systematic studies have been made to identify transgene integration loci for optimal expression of the GOI in T. reesei. Currently, only the locus of the exocellobiohydrolase I encoding gene (cbh1) is widely used as a promising integration site leading to a high expression level of the GOI. No additional sites associated with efficient gene expression have been characterized. To search for gene integration sites that benefit secreted expression of the GOI, the foot-and-mouth disease virus 2A protein was applied for co-expression of an Aspergillus niger lipA gene and the Discosoma sp. DsRed1 gene in T. reesei, by random integration of the expression cassette into the genome. We demonstrated that the fluorescent intensity of RFP (red fluorescent protein) inside the cell was well correlated with the secreted lipase yields, based on which we successfully developed a high-throughput screening method to screen strains with relatively higher secreted expression of the GOI (in this study, lipase). The copy number and the insertion sites of the transgene were investigated among the selected highly expressing strains. Eventually, in addition to the cbh1 gene locus, two other genome insertion loci that efficiently facilitate gene expression in T. reesei were identified. We have successfully developed a high-throughput screening method to screen strains with optimal expression of the indicated secreted proteins in T. reesei.
Moreover, we identified two optimal genome loci for transgene expression, which could provide a new approach to modulating gene expression levels while retaining the indicated promoter and culture conditions.
Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens
Hart, Traver; Tong, Amy Hin Yan; Chan, Katie; Van Leeuwen, Jolanda; Seetharaman, Ashwin; Aregger, Michael; Chandrashekhar, Megha; Hustedt, Nicole; Seth, Sahil; Noonan, Avery; Habsid, Andrea; Sizova, Olga; Nedyalkova, Lyudmila; Climie, Ryan; Tworzyanski, Leanne; Lawson, Keith; Sartori, Maria Augusta; Alibeh, Sabriyeh; Tieu, David; Masud, Sanna; Mero, Patricia; Weiss, Alexander; Brown, Kevin R.; Usaj, Matej; Billmann, Maximilian; Rahman, Mahfuzur; Costanzo, Michael; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Durocher, Daniel; Moffat, Jason
2017-01-01
The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provides an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines. PMID:28655737
Genome-Wide Association of the Laboratory-Based Nicotine Metabolite Ratio in Three Ancestries.
Baurley, James W; Edlund, Christopher K; Pardamean, Carissa I; Conti, David V; Krasnow, Ruth; Javitz, Harold S; Hops, Hyman; Swan, Gary E; Benowitz, Neal L; Bergen, Andrew W
2016-09-01
Metabolic enzyme variation and other patient and environmental characteristics influence smoking behaviors, treatment success, and risk of related disease. Population-specific variation in metabolic genes contributes to challenges in developing and optimizing pharmacogenetic interventions. We applied a custom genome-wide genotyping array for addiction research (Smokescreen), to three laboratory-based studies of nicotine metabolism with oral or venous administration of labeled nicotine and cotinine, to model nicotine metabolism in multiple populations. The trans-3'-hydroxycotinine/cotinine ratio, the nicotine metabolite ratio (NMR), was the nicotine metabolism measure analyzed. Three hundred twelve individuals of self-identified European, African, and Asian American ancestry were genotyped and included in ancestry-specific genome-wide association scans (GWAS) and a meta-GWAS analysis of the NMR. We modeled natural-log transformed NMR with covariates: principal components of genetic ancestry, age, sex, body mass index, and smoking status. African and Asian American NMRs were statistically significantly (P values ≤ 5E-5) lower than European American NMRs. Meta-GWAS analysis identified 36 genome-wide significant variants over a 43 kilobase pair region at CYP2A6 with minimum P = 2.46E-18 at rs12459249, proximal to CYP2A6. Additional minima were located in intron 4 (rs56113850, P = 6.61E-18) and in the CYP2A6-CYP2A7 intergenic region (rs34226463, P = 1.45E-12). Most (34/36) genome-wide significant variants suggested reduced CYP2A6 activity; functional mechanisms were identified and tested in knowledge-bases. Conditional analysis resulted in intergenic variants of possible interest (P values < 5E-5). This meta-GWAS of the NMR identifies CYP2A6 variants, replicates the top-ranked single nucleotide polymorphism from a recent Finnish meta-GWAS of the NMR, identifies functional mechanisms, and provides pan-continental population biomarkers for nicotine metabolism. 
This multiple ancestry meta-GWAS of the laboratory study-based NMR provides novel evidence and replication for genome-wide association of CYP2A6 single nucleotide and insertion-deletion polymorphisms. We identify three regions of genome-wide significance: proximal, intronic, and distal to CYP2A6. We replicate the top-ranking single nucleotide polymorphism from a recent GWAS of the NMR in Finnish smokers, identify a functional mechanism for this intronic variant from in silico analyses of RNA-seq data that is consistent with CYP2A6 expression measured in postmortem lung and liver, and provide additional support for the intergenic region between CYP2A6 and CYP2A7. © The Author 2016. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco.
Genome-Wide Association of the Laboratory-Based Nicotine Metabolite Ratio in Three Ancestries
Baurley, James W.; Edlund, Christopher K.; Pardamean, Carissa I.; Conti, David V.; Krasnow, Ruth; Javitz, Harold S.; Hops, Hyman; Swan, Gary E.; Benowitz, Neal L.
2016-01-01
Introduction: Metabolic enzyme variation and other patient and environmental characteristics influence smoking behaviors, treatment success, and risk of related disease. Population-specific variation in metabolic genes contributes to challenges in developing and optimizing pharmacogenetic interventions. We applied a custom genome-wide genotyping array for addiction research (Smokescreen), to three laboratory-based studies of nicotine metabolism with oral or venous administration of labeled nicotine and cotinine, to model nicotine metabolism in multiple populations. The trans-3′-hydroxycotinine/cotinine ratio, the nicotine metabolite ratio (NMR), was the nicotine metabolism measure analyzed. Methods: Three hundred twelve individuals of self-identified European, African, and Asian American ancestry were genotyped and included in ancestry-specific genome-wide association scans (GWAS) and a meta-GWAS analysis of the NMR. We modeled natural-log transformed NMR with covariates: principal components of genetic ancestry, age, sex, body mass index, and smoking status. Results: African and Asian American NMRs were statistically significantly (P values ≤ 5E-5) lower than European American NMRs. Meta-GWAS analysis identified 36 genome-wide significant variants over a 43 kilobase pair region at CYP2A6 with minimum P = 2.46E-18 at rs12459249, proximal to CYP2A6. Additional minima were located in intron 4 (rs56113850, P = 6.61E-18) and in the CYP2A6-CYP2A7 intergenic region (rs34226463, P = 1.45E-12). Most (34/36) genome-wide significant variants suggested reduced CYP2A6 activity; functional mechanisms were identified and tested in knowledge-bases. Conditional analysis resulted in intergenic variants of possible interest (P values < 5E-5). 
Conclusions: This meta-GWAS of the NMR identifies CYP2A6 variants, replicates the top-ranked single nucleotide polymorphism from a recent Finnish meta-GWAS of the NMR, identifies functional mechanisms, and provides pan-continental population biomarkers for nicotine metabolism. Implications: This multiple ancestry meta-GWAS of the laboratory study-based NMR provides novel evidence and replication for genome-wide association of CYP2A6 single nucleotide and insertion–deletion polymorphisms. We identify three regions of genome-wide significance: proximal, intronic, and distal to CYP2A6. We replicate the top-ranking single nucleotide polymorphism from a recent GWAS of the NMR in Finnish smokers, identify a functional mechanism for this intronic variant from in silico analyses of RNA-seq data that is consistent with CYP2A6 expression measured in postmortem lung and liver, and provide additional support for the intergenic region between CYP2A6 and CYP2A7. PMID:27113016
Efficient CRISPR/Cas9-based genome editing in carrot cells.
Klimek-Chodacka, Magdalena; Oleszkiewicz, Tomasz; Lowder, Levi G; Qi, Yiping; Baranski, Rafal
2018-04-01
The first report presenting successful and efficient carrot genome editing using the CRISPR/Cas9 system. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) is a powerful genome editing tool that has recently been widely adopted in model organisms, but has not been used in carrot, a model species for in vitro culture studies and an important health-promoting crop grown worldwide. In this study, for the first time, we report application of the CRISPR/Cas9 system for efficient targeted mutagenesis of the carrot genome. Multiplexing CRISPR/Cas9 vectors expressing two single-guide RNAs (gRNAs) targeting the carrot flavanone-3-hydroxylase (F3H) gene were tested for blockage of anthocyanin biosynthesis in a model purple-colored callus using Agrobacterium-mediated genetic transformation. This approach allowed fast and visual comparison of three codon-optimized Cas9 genes and revealed that the most efficient one in generating F3H mutants was the Arabidopsis codon-optimized AteCas9 gene, with up to 90% efficiency. Knockout of the F3H gene resulted in the discoloration of calli, validating the functional role of this gene in anthocyanin biosynthesis in carrot as well as providing a visual marker for screening successfully edited events. Most resulting mutations were small indels, but long chromosome fragment deletions of 116-119 nt were also generated with simultaneous cleavage mediated by two gRNAs. The results demonstrate successful site-directed mutagenesis in carrot with CRISPR/Cas9 and the usefulness of a model callus culture to validate genome editing systems. Given that the carrot genome has been sequenced recently, our timely study sheds light on the promising application of genome editing tools for boosting basic and translational research in this important vegetable crop.
Li, Ting; Wang, Wei; Gong, Shunyou; Sun, Honghong; Zhang, Huqin; Yang, An-Gang; Chen, Youhai H; Li, Xinyuan
2018-05-19
The interplay between inflammation and metabolism is widely recognized, yet the underlying molecular mechanisms remain poorly characterized. Using experimental database mining and genome-wide gene expression profiling methods, we found that in contrast to other TNFAIP8 family members, TNFAIP8L2 (TIPE2) was preferentially expressed in human myeloid cell types. In addition, Tnfaip8l2 expression drastically decreased in lipopolysaccharide (LPS)-stimulated macrophages. Consequently, Tnfaip8l2 deficiency led to heightened expression of genes that were enriched for leukocyte activation and lipid biosynthesis pathways. Furthermore, mitochondrial respiration rate was increased in Tnfaip8l2-deficient macrophages, as measured by Seahorse metabolic analyzer. Taken together, these results indicate that Tnfaip8l2 serves as a "brake" for immunometabolism, which needs to be released for optimized metabolic reprogramming as well as mounting effective inflammatory responses. The unique anti-inflammatory and metabolic-modulatory function of TNFAIP8L2 renders it a novel therapeutic target for cardiovascular diseases and cancer. Copyright © 2018 Elsevier Ltd. All rights reserved.
Trescher, Saskia; Münchmeyer, Jannes; Leser, Ulf
2017-03-27
Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Elucidation of regulatory mechanisms can be approached by a multitude of experimental methods, yet integration of the resulting heterogeneous, large, and noisy data sets into comprehensive and tissue or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed which model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes and other factors. Subsequent optimization finds those parameters that minimize the divergence of predicted and measured expression intensities. In various settings, these methods produced promising results in terms of estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they vastly differ in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem and the data sets used for evaluation. Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of the presented methods based on publicly available data sets. The results show that all methods seem to find biologically relevant information. However, we also observe that the mutual result overlaps are very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, yet also to identify common shortcomings and necessary extensions enabling focused research on the critical points.
Cheng, Han; Koning, Katie; O'Hearn, Aileen; Wang, Minxiu; Rumschlag-Booms, Emily; Varhegyi, Elizabeth; Rong, Lijun
2015-11-24
Genome-wide RNAi screening has been widely used to identify host proteins involved in replication and infection of different viruses, and numerous host factors are implicated in the replication cycles of these viruses, demonstrating the power of this approach. However, discrepancies on target identification of the same viruses by different groups suggest that high throughput RNAi screening strategies need to be carefully designed, developed and optimized prior to the large scale screening. Two genome-wide RNAi screens were performed in parallel against the entry of pseudotyped Marburg viruses and avian influenza virus H5N1 utilizing an HIV-1 based surrogate system, to identify host factors which are important for virus entry. A comparative analysis approach was employed in data analysis, which alleviated systematic positional effects and reduced the false positive number of virus-specific hits. The parallel nature of the strategy allows us to easily identify the host factors for a specific virus with a greatly reduced number of false positives in the initial screen, which is one of the major problems with high throughput screening. The power of this strategy is illustrated by a genome-wide RNAi screen for identifying the host factors important for Marburg virus and/or avian influenza virus H5N1 as described in this study. This strategy is particularly useful for highly pathogenic viruses since pseudotyping allows us to perform high throughput screens in the biosafety level 2 (BSL-2) containment instead of the BSL-3 or BSL-4 for the infectious viruses, with alleviated safety concerns. The screening strategy together with the unique comparative analysis approach makes the data more suitable for hit selection and enables us to identify virus-specific hits with a much lower false positive rate.
Optimal False Discovery Rate Control for Dependent Data
Xie, Jichun; Cai, T. Tony; Maris, John; Li, Hongzhe
2013-01-01
This paper considers the problem of optimal false discovery rate control when the test statistics are dependent. An optimal joint oracle procedure, which minimizes the false non-discovery rate subject to a constraint on the false discovery rate is developed. A data-driven marginal plug-in procedure is then proposed to approximate the optimal joint procedure for multivariate normal data. It is shown that the marginal procedure is asymptotically optimal for multivariate normal data with a short-range dependent covariance structure. Numerical results show that the marginal procedure controls false discovery rate and leads to a smaller false non-discovery rate than several commonly used p-value based false discovery rate controlling methods. The procedure is illustrated by an application to a genome-wide association study of neuroblastoma and it identifies a few more genetic variants that are potentially associated with neuroblastoma than several p-value-based false discovery rate controlling procedures. PMID:23378870
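For orientation, the kind of p-value-based baseline mentioned above can be sketched with the Benjamini-Hochberg step-up procedure. This is an illustrative stand-in only: the abstract does not name the procedures that were compared, and the code below is not the paper's joint oracle or marginal plug-in procedure. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg step-up rule (illustrative baseline,
    not the procedure proposed in the abstract above).
    """
    m = len(p_values)
    # Indices of the hypotheses sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

The rule treats each p-value marginally and ignores the dependence between test statistics, which is the setting the abstract addresses.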
FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption
2015-01-01
Background: The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy has been a widespread concern when sharing sensitive information in a cloud environment. Methods: We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. Results: The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides a significant performance gain without missing any significant SNPs in the chi-square association test using the aforementioned datasets. Conclusions: Unlike many existing HME-based studies, in which final results need to be computed by the data owner due to the lack of a secure division operation, the proposed FORESEE framework supports complete outsourcing to the cloud and outputs the final encrypted chi-square statistics. PMID:26733391
FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption.
Zhang, Yuchen; Dai, Wenrui; Jiang, Xiaoqian; Xiong, Hongkai; Wang, Shuang
2015-01-01
The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy has been a widespread concern when sharing sensitive information in a cloud environment. We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides a significant performance gain without missing any significant SNPs in the chi-square association test using the aforementioned datasets. Unlike many existing HME-based studies, in which final results need to be computed by the data owner due to the lack of a secure division operation, the proposed FORESEE framework supports complete outsourcing to the cloud and outputs the final encrypted chi-square statistics.
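To make clear why a division step matters for the chi-square computation described above, here is a plaintext sketch of the Pearson chi-square statistic for a 2x2 case-control allele-count table. It involves no encryption and is not the FORESEE protocol. The 2x2 allelic layout, the function name, and the counts are assumptions made for illustration.

```python
def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table.

    Rows are cases and controls; columns are counts of two alleles.
    Plaintext illustration only: no homomorphic encryption is involved.
    """
    n = case_a + case_b + control_a + control_b
    # Product of the two row totals and the two column totals.
    denominator = ((case_a + case_b) * (control_a + control_b)
                   * (case_a + control_a) * (case_b + control_b))
    if denominator == 0:
        raise ValueError("a row or column total is zero; statistic undefined")
    numerator = n * (case_a * control_b - case_b * control_a) ** 2
    # This final division is the operation that is hard to carry out
    # directly on encrypted values.
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    print(allelic_chi_square(10, 20, 30, 40))
```

The numerator and denominator need only additions and multiplications, while the final quotient needs a division, which is the step the abstract's two secure division protocols address.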
Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang
2018-01-05
DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang
2018-01-01
DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html PMID:29416743
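The two performance measures quoted above, the Matthews correlation coefficient (MCC) and accuracy, can both be computed from a binary confusion matrix. The sketch below uses the standard definitions. The function names and the counts are hypothetical and are not taken from the DHSpred evaluation.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    tp, tn, fp, fn = 90, 85, 15, 10
    print(accuracy(tp, tn, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix symmetrically, so it stays informative when the classes are imbalanced.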
Hou, Liping; Bergen, Sarah E.; Akula, Nirmala; Song, Jie; Hultman, Christina M.; Landén, Mikael; Adli, Mazda; Alda, Martin; Ardau, Raffaella; Arias, Bárbara; Aubry, Jean-Michel; Backlund, Lena; Badner, Judith A.; Barrett, Thomas B.; Bauer, Michael; Baune, Bernhard T.; Bellivier, Frank; Benabarre, Antonio; Bengesser, Susanne; Berrettini, Wade H.; Bhattacharjee, Abesh Kumar; Biernacka, Joanna M.; Birner, Armin; Bloss, Cinnamon S.; Brichant-Petitjean, Clara; Bui, Elise T.; Byerley, William; Cervantes, Pablo; Chillotti, Caterina; Cichon, Sven; Colom, Francesc; Coryell, William; Craig, David W.; Cruceanu, Cristiana; Czerski, Piotr M.; Davis, Tony; Dayer, Alexandre; Degenhardt, Franziska; Del Zompo, Maria; DePaulo, J. Raymond; Edenberg, Howard J.; Étain, Bruno; Falkai, Peter; Foroud, Tatiana; Forstner, Andreas J.; Frisén, Louise; Frye, Mark A.; Fullerton, Janice M.; Gard, Sébastien; Garnham, Julie S.; Gershon, Elliot S.; Goes, Fernando S.; Greenwood, Tiffany A.; Grigoroiu-Serbanescu, Maria; Hauser, Joanna; Heilbronner, Urs; Heilmann-Heimbach, Stefanie; Herms, Stefan; Hipolito, Maria; Hitturlingappa, Shashi; Hoffmann, Per; Hofmann, Andrea; Jamain, Stephane; Jiménez, Esther; Kahn, Jean-Pierre; Kassem, Layla; Kelsoe, John R.; Kittel-Schneider, Sarah; Kliwicki, Sebastian; Koller, Daniel L.; König, Barbara; Lackner, Nina; Laje, Gonzalo; Lang, Maren; Lavebratt, Catharina; Lawson, William B.; Leboyer, Marion; Leckband, Susan G.; Liu, Chunyu; Maaser, Anna; Mahon, Pamela B.; Maier, Wolfgang; Maj, Mario; Manchia, Mirko; Martinsson, Lina; McCarthy, Michael J.; McElroy, Susan L.; McInnis, Melvin G.; McKinney, Rebecca; Mitchell, Philip B.; Mitjans, Marina; Mondimore, Francis M.; Monteleone, Palmiero; Mühleisen, Thomas W.; Nievergelt, Caroline M.; Nöthen, Markus M.; Novák, Tomas; Nurnberger, John I.; Nwulia, Evaristus A.; Ösby, Urban; Pfennig, Andrea; Potash, James B.; Propping, Peter; Reif, Andreas; Reininghaus, Eva; Rice, John; Rietschel, Marcella; Rouleau, Guy A.; Rybakowski, 
Janusz K.; Schalling, Martin; Scheftner, William A.; Schofield, Peter R.; Schork, Nicholas J.; Schulze, Thomas G.; Schumacher, Johannes; Schweizer, Barbara W.; Severino, Giovanni; Shekhtman, Tatyana; Shilling, Paul D.; Simhandl, Christian; Slaney, Claire M.; Smith, Erin N.; Squassina, Alessio; Stamm, Thomas; Stopkova, Pavla; Streit, Fabian; Strohmaier, Jana; Szelinger, Szabolcs; Tighe, Sarah K.; Tortorella, Alfonso; Turecki, Gustavo; Vieta, Eduard; Volkert, Julia; Witt, Stephanie H.; Wright, Adam; Zandi, Peter P.; Zhang, Peng; Zollner, Sebastian; McMahon, Francis J.
2016-01-01
Bipolar disorder (BD) is a genetically complex mental illness characterized by severe oscillations of mood and behaviour. Genome-wide association studies (GWAS) have identified several risk loci that together account for a small portion of the heritability. To identify additional risk loci, we performed a two-stage meta-analysis of >9 million genetic variants in 9,784 bipolar disorder patients and 30,471 controls, the largest GWAS of BD to date. In this study, to increase power we used ∼2,000 lithium-treated cases with a long-term diagnosis of BD from the Consortium on Lithium Genetics, excess controls, and analytic methods optimized for markers on the X-chromosome. In addition to four known loci, results revealed genome-wide significant associations at two novel loci: an intergenic region on 9p21.3 (rs12553324, P = 5.87 × 10⁻⁹; odds ratio (OR) = 1.12) and markers within ERBB2 (rs2517959, P = 4.53 × 10⁻⁹; OR = 1.13). No significant X-chromosome associations were detected and X-linked markers explained very little BD heritability. The results add to a growing list of common autosomal variants involved in BD and illustrate the power of comparing well-characterized cases to an excess of controls in GWAS. PMID:27329760
Jayakody, Lahiru N; Tsuge, Keisuke; Suzuki, Akihiro; Shimoi, Hitoshi; Kitagaki, Hiroshi
2013-01-01
Because of the growing market for sports drinks, prevention of yeast contamination of these beverages is of significant concern. This research was performed to gain insight into the physiology of yeast growing in sports drinks through a genome-wide approach, with the aim of preventing microbial spoilage of sports drinks. The genome-wide gene expression profile of Saccharomyces cerevisiae growing in a representative sports drink was investigated. Genes relevant to the sulphate ion starvation response were upregulated in the yeast cells growing in the drink. These results suggest that yeast cells suffer from a deficiency of extracellular sulphate ions during growth in the sports drink. Indeed, the concentration of sulphate ions was far lower in the sports drink than in a medium that allows the optimal growth of yeast. To confirm the sulphate ion starvation of the yeast, several ions were added to the beverage and their effects were investigated. The addition of sulphate ions, but not chloride ions or sodium ions, to the beverage stimulated yeast growth in the beverage in a dose-dependent manner. Moreover, the addition of sulphate ions to the sports drink increased the biosynthesis of sulphur-containing amino acids in yeast cells and hydrogen sulphide in the beverage. These results indicate that sulphate ion concentration should be regulated to prevent microbial spoilage of sports drinks.
Haplotag: Software for Haplotype-Based Genotyping-by-Sequencing Analysis
Tinker, Nicholas A.; Bekele, Wubishet A.; Hattori, Jiro
2016-01-01
Genotyping-by-sequencing (GBS), and related methods, are based on high-throughput short-read sequencing of genomic complexity reductions followed by discovery of single nucleotide polymorphisms (SNPs) within sequence tags. This provides a powerful and economical approach to whole-genome genotyping, facilitating applications in genomics, diversity analysis, and molecular breeding. However, due to the complexity of analyzing large data sets, applications of GBS may require substantial time, expertise, and computational resources. Haplotag, the novel GBS software described here, is freely available, and operates with minimal user-investment on widely available computer platforms. Haplotag is unique in fulfilling the following set of criteria: (1) operates without a reference genome; (2) can be used in a polyploid species; (3) provides a discovery mode, and a production mode; (4) discovers polymorphisms based on a model of tag-level haplotypes within sequenced tags; (5) reports SNPs as well as haplotype-based genotypes; and (6) provides an intuitive visual “passport” for each inferred locus. Haplotag is optimized for use in a self-pollinating plant species. PMID:26818073
Cyanobacterial Biofuels: Strategies and Developments on Network and Modeling.
Klanchui, Amornpan; Raethong, Nachon; Prommeenate, Peerada; Vongsangnak, Wanwipa; Meechai, Asawin
Cyanobacteria, the phototrophic microorganisms, have attracted much attention recently as a promising source for environmentally sustainable biofuels production. However, the main barrier to commercial markets for cyanobacteria-based biofuels is economic feasibility. Miscellaneous strategies for improving the production performance of cyanobacteria have thus been developed. Among these, simple ad hoc strategies, which fail to fully optimize cell growth coupled with desired product yield, are explored. With the advancement of genomics and systems biology, a new paradigm toward systems metabolic engineering has been recognized. In particular, genome-scale metabolic network reconstruction and modeling is a crucial systems-based tool for whole-cell-wide investigation and prediction. In this review, the cyanobacterial genome-scale metabolic models, which offer a system-level understanding of cyanobacterial metabolism, are described. The main processes of metabolic network reconstruction and modeling of cyanobacteria are summarized. Strategies and developments on genome-scale network and modeling through the systems metabolic engineering approach are advanced and employed for efficient cyanobacterial-based biofuels production.
Sugano, Shigeo S; Suzuki, Hiroko; Shimokita, Eisuke; Chiba, Hirofumi; Noji, Sumihare; Osakabe, Yuriko; Osakabe, Keishi
2017-04-28
Mushroom-forming basidiomycetes produce a wide range of metabolites and have great value not only as food but also as an important global natural resource. Here, we demonstrate CRISPR/Cas9-based genome editing in the model species Coprinopsis cinerea. Using a high-throughput reporter assay with cryopreserved protoplasts, we identified a novel promoter, CcDED1pro, with seven times stronger activity in this assay than the conventional promoter GPD2. To develop highly efficient genome editing using CRISPR/Cas9 in C. cinerea, we used CcDED1pro to express Cas9 and a U6-snRNA promoter from C. cinerea to express gRNA. Finally, CRISPR/Cas9-mediated GFP mutagenesis was performed in a stable GFP expression line. Individual genome-edited lines were isolated, and loss of GFP function was detected in hyphae and fruiting body primordia. This novel method of high-throughput CRISPR/Cas9-based genome editing using cryopreserved protoplasts should be a powerful tool in the study of edible mushrooms.
Repurposing CRISPR/Cas9 for in situ functional assays.
Malina, Abba; Mills, John R; Cencic, Regina; Yan, Yifei; Fraser, James; Schippers, Laura M; Paquet, Marilène; Dostie, Josée; Pelletier, Jerry
2013-12-01
RNAi combined with next-generation sequencing has proven to be a powerful and cost-effective genetic screening platform in mammalian cells. Still, this technology has its limitations and is incompatible with in situ mutagenesis screens on a genome-wide scale. Using p53 as a proof-of-principle target, we readapted the CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 (CRISPR associated 9) genome-editing system to demonstrate the feasibility of this methodology for targeted gene disruption positive selection assays. By using novel "all-in-one" lentiviral and retroviral delivery vectors heterologously expressing both a codon-optimized Cas9 and its synthetic guide RNA (sgRNA), we show robust selection for the CRISPR-modified Trp53 locus following drug treatment. Furthermore, by linking Cas9 expression to GFP fluorescence, we use an "all-in-one" system to track disrupted Trp53 in chemoresistant lymphomas in the Eμ-myc mouse model. Deep sequencing analysis of the tumor-derived endogenous Cas9-modified Trp53 locus revealed a wide spectrum of mutants that were enriched with seemingly limited off-target effects. Taken together, these results establish Cas9 genome editing as a powerful and practical approach for positive in situ genetic screens.
Repurposing CRISPR/Cas9 for in situ functional assays
Malina, Abba; Mills, John R.; Cencic, Regina; Yan, Yifei; Fraser, James; Schippers, Laura M.; Paquet, Marilène; Dostie, Josée; Pelletier, Jerry
2013-01-01
RNAi combined with next-generation sequencing has proven to be a powerful and cost-effective genetic screening platform in mammalian cells. Still, this technology has its limitations and is incompatible with in situ mutagenesis screens on a genome-wide scale. Using p53 as a proof-of-principle target, we readapted the CRISPR (clustered regularly interspaced short palindromic repeats)/Cas9 (CRISPR associated 9) genome-editing system to demonstrate the feasibility of this methodology for targeted gene disruption positive selection assays. By using novel “all-in-one” lentiviral and retroviral delivery vectors heterologously expressing both a codon-optimized Cas9 and its synthetic guide RNA (sgRNA), we show robust selection for the CRISPR-modified Trp53 locus following drug treatment. Furthermore, by linking Cas9 expression to GFP fluorescence, we use an “all-in-one” system to track disrupted Trp53 in chemoresistant lymphomas in the Eμ-myc mouse model. Deep sequencing analysis of the tumor-derived endogenous Cas9-modified Trp53 locus revealed a wide spectrum of mutants that were enriched with seemingly limited off-target effects. Taken together, these results establish Cas9 genome editing as a powerful and practical approach for positive in situ genetic screens. PMID:24298059
Zou, Meng; Liu, Zhaoqi; Zhang, Xiang-Sun; Wang, Yong
2015-10-15
In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by censored, small-sample-size, but high-dimensional genomic profiles or clinical data. Therefore, sophisticated models and algorithms are urgently needed. In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification named Nearest Centroid Classifier for AUC optimization (NCC-AUC). Our method is motivated by the connection between the AUC score for classification accuracy evaluation and Harrell's concordance index in survival analysis. This connection allows us to convert the survival time regression problem to a binary classification problem. Then an optimization model is formulated to directly maximize AUC while minimizing the number of selected features to construct a predictor in the nearest centroid classifier framework. NCC-AUC performs well in validation on both genomic data of breast cancer and clinical data of stage IB Non-Small-Cell Lung Cancer (NSCLC). For the genomic data, NCC-AUC outperforms Support Vector Machine (SVM) and Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meanings. NCC-AUC also separates low- and high-risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and L1-Cox model (L1 penalized in Cox model). These performance gains of NCC-AUC are quite robust across 5 subtypes of breast cancer. Further, on an independent clinical data set, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox model and L1-Cox model in grouping patients into high- and low-risk categories. 
In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panel from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis. NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC. ywang@amss.ac.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
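The connection between AUC and Harrell's concordance index that the abstract relies on can be seen by writing both as pairwise-ordering statistics. The sketch below computes each from its textbook definition. It is not the NCC-AUC optimization model, and the function names and example data are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties in score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def concordance_index(risk, time, event):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter time than subject j. The pair is concordant when
    the subject with the shorter time has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored subjects cannot be the earlier member of a pair
        for j in range(len(risk)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
    print(concordance_index([0.9, 0.3, 0.4, 0.2], [2, 4, 6, 8], [1, 1, 0, 1]))
```

Both functions count correctly ordered pairs, which is why a survival ranking problem can be recast as binary classification over pairs of patients.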
Pradeepkiran, Jangampalli Adi; Sainath, Sri Bhashyam; Kumar, Konidala Kranthi; Bhaskar, Matcha
2015-01-01
Brucella melitensis 16M is a Gram-negative coccobacillus that infects both animals and humans. It causes a disease known as brucellosis, which is characterized by acute febrile illness in humans and causes abortions in livestock. To prevent and control brucellosis, identification of putative drug targets is crucial. The present study aimed to identify drug targets in B. melitensis 16M by using a subtractive genomic approach. We used available database repositories (Database of Essential Genes, Kyoto Encyclopedia of Genes and Genomes Automatic Annotation Server, and Kyoto Encyclopedia of Genes and Genomes) to identify putative genes that are nonhomologous to humans and essential for pathogen B. melitensis 16M. The results revealed that among 3 Mb genome size of pathogen, 53 putative characterized and 13 uncharacterized hypothetical genes were identified; further, from Basic Local Alignment Search Tool protein analysis, one hypothetical protein showed a close resemblance (50%) to Silicibacter pomeroyi DUF1285 family protein (2RE3). A further homology model of the target was constructed using MODELLER 9.12 and optimized through variable target function method by molecular dynamics optimization with simulating annealing. The stereochemical quality of the restrained model was evaluated by PROCHECK, VERIFY-3D, ERRAT, and WHATIF servers. Furthermore, structure-based virtual screening was carried out against the predicted active site of the respective protein using the glycerol structural analogs from the PubChem database. We identified five best inhibitors with strong affinities, stable interactions, and also with reliable drug-like properties. Hence, these leads might be used as the most effective inhibitors of modeled protein. The outcome of the present work of virtual screening of putative gene targets might facilitate design of potential drugs for better treatment against brucellosis. PMID:25834405
Boland, P M; Ruth, K; Matro, J M; Rainey, K L; Fang, C Y; Wong, Y N; Daly, M B; Hall, M J
2015-12-01
Genomic tests are increasingly complex, less expensive, and more widely available with the advent of next-generation sequencing (NGS). We assessed knowledge and perceptions among genetic counselors pertaining to NGS genomic testing via an online survey. Associations between selected characteristics and perceptions were examined. Recent education on NGS testing was common, but practical experience limited. Perceived understanding of clinical NGS was modest, specifically concerning tumor testing. Greater perceived understanding of clinical NGS testing correlated with more time spent in cancer-related counseling, exposure to NGS testing, and NGS-focused education. Substantial disagreement about the role of counseling for tumor-based testing was seen. Finally, a majority of counselors agreed with the need for more education about clinical NGS testing, supporting this approach to optimizing implementation. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Infrastructure for Personalized Medicine at Partners HealthCare
Weiss, Scott T.; Shin, Meini Sumbada
2016-01-01
Partners HealthCare Personalized Medicine (PPM) is a center within the Partners HealthCare system (founded by Massachusetts General Hospital and Brigham and Women’s Hospital) whose mission is to utilize genetics and genomics to improve the care of patients in a cost-effective manner. PPM consists of five interconnected components: (1) Laboratory for Molecular Medicine (LMM), a CLIA laboratory performing genetic testing for patients world-wide; (2) Translational Genomics Core (TGC), a core laboratory providing genomic platforms for Partners investigators; (3) Partners Biobank, a biobank of samples (DNA, plasma and serum) for 50,000 consented Partners patients; (4) Biobank Portal, an IT infrastructure and viewer that brings genotypes, samples, and phenotypes (validated diagnoses, radiology, and clinical chemistry) from the electronic medical record to Partners investigators; and (5) a common IT system that unites these components and brings researchers, clinicians, and patients together for optimal research and patient care. PMID:26927187
Goya, Stephanie; Valinotto, Laura E; Tittarelli, Estefania; Rojo, Gabriel L; Nabaes Jodar, Mercedes S; Greninger, Alexander L; Zaiat, Jonathan J; Marti, Marcelo A; Mistchenko, Alicia S; Viegas, Mariana
2018-01-01
Over the last decade, the number of viral genome sequences deposited in available databases has grown exponentially. However, sequencing methodologies vary widely and many published works have relied on viral enrichment by viral culture or nucleic acid amplification with specific primers rather than through unbiased techniques such as metagenomics. The genome of RNA viruses is highly variable and these enrichment methodologies may be difficult to achieve or may bias the results. In order to obtain genomic sequences of human respiratory syncytial virus (HRSV) from positive nasopharyngeal aspirates, diverse methodologies were evaluated and compared. A total of 29 nearly complete and complete viral genomes were obtained. The best performance was achieved with a DNase I treatment of the RNA directly extracted from the nasopharyngeal aspirate (NPA), sequence-independent single-primer amplification (SISPA) and library preparation performed with Nextera XT DNA Library Prep Kit with manual normalization. An average of 633,789 and 1,674,845 filtered reads per library were obtained with MiSeq and NextSeq 500 platforms, respectively. The higher output of NextSeq 500 was accompanied by an increase in the percentage of duplicated reads generated during SISPA (from an average of 1.5% duplicated viral reads in MiSeq to an average of 74% in NextSeq 500). HRSV genome recovery was not affected by the presence or absence of duplicated reads but the computational demand during the analysis was increased. Considering that only samples with viral load ≥ E+06 copies/ml NPA were tested, no correlation was observed between sample viral loads and the number of total filtered reads, nor with the mapped viral reads. The HRSV genomes showed a mean coverage of 98.46% with the best methodology. In addition, genomes of human metapneumovirus (HMPV), human rhinovirus (HRV) and human parainfluenza virus types 1-3 (HPIV1-3) were also obtained with the selected optimal methodology.
Garcia-Vallvé, Santiago; Guasch, Laura; Tomas-Hernández, Sarah; del Bas, Josep Maria; Ollendorff, Vincent; Arola, Lluís; Pujadas, Gerard; Mulero, Miquel
2015-07-23
Thiazolidinediones (TZDs), such as rosiglitazone and pioglitazone, are peroxisome proliferator-activated receptor γ (PPARγ) full agonists that have been widely used in the treatment of type 2 diabetes mellitus. Despite the demonstrated beneficial effect of reducing glucose levels in the plasma, TZDs also induce several adverse effects. Consequently, the search for new compounds with potent antidiabetic effects but fewer undesired effects is an active field of research. Interestingly, the novel proposed mechanisms for the antidiabetic activity of PPARγ agonists, consisting of PPARγ Ser273 phosphorylation inhibition, ligand and receptor mutual dynamics, and the presence of an alternate binding site, have recently changed the view regarding the optimal characteristics for the screening of novel PPARγ ligands. Furthermore, transcriptional genomics could bring essential information about the genome-wide effects of PPARγ ligands. Consequently, facing the new mechanistic scenario proposed for these compounds is essential for resolving the paradoxes among their agonistic function, antidiabetic activities, and side effects and should allow the rational development of better and safer PPARγ-mediated antidiabetic drugs.
Next-generation libraries for robust RNA interference-based genome-wide screens
Kampmann, Martin; Horlbeck, Max A.; Chen, Yuwen; Tsai, Jordan C.; Bassik, Michael C.; Gilbert, Luke A.; Villalta, Jacqueline E.; Kwon, S. Chul; Chang, Hyeshik; Kim, V. Narry; Weissman, Jonathan S.
2015-01-01
Genetic screening based on loss-of-function phenotypes is a powerful discovery tool in biology. Although the recent development of clustered regularly interspaced short palindromic repeats (CRISPR)-based screening approaches in mammalian cell culture has enormous potential, RNA interference (RNAi)-based screening remains the method of choice in several biological contexts. We previously demonstrated that ultracomplex pooled short-hairpin RNA (shRNA) libraries can largely overcome the problem of RNAi off-target effects in genome-wide screens. Here, we systematically optimize several aspects of our shRNA library, including the promoter and microRNA context for shRNA expression, selection of guide strands, and features relevant for postscreen sample preparation for deep sequencing. We present next-generation high-complexity libraries targeting human and mouse protein-coding genes, which we grouped into 12 sublibraries based on biological function. A pilot screen suggests that our next-generation RNAi library performs comparably to current CRISPR interference (CRISPRi)-based approaches and can yield complementary results with high sensitivity and high specificity. PMID:26080438
Lenz, Tobias L.; Mueller, Birte; Trillmich, Fritz; Wolf, Jochen B. W.
2013-01-01
It is still debated whether main individual fitness differences in natural populations can be attributed to genome-wide effects or to particular loci of outstanding functional importance such as the major histocompatibility complex (MHC). In a long-term monitoring project on Galápagos sea lions (Zalophus wollebaeki), we collected comprehensive fitness and mating data for a total of 506 individuals. Controlling for genome-wide inbreeding, we find strong associations between the MHC locus and nearly all fitness traits. The effect was mainly attributable to MHC sequence divergence and could be decomposed into contributions of own and maternal genotypes. In consequence, the population seems to have evolved a pool of highly divergent alleles conveying near-optimal MHC divergence even by random mating. Our results demonstrate that a single locus can significantly contribute to fitness in the wild and provide conclusive evidence for the ‘divergent allele advantage’ hypothesis, a special form of balancing selection with interesting evolutionary implications. PMID:23677346
Evaluation of whole genome amplified DNA to decrease material expenditure and increase quality.
Bækvad-Hansen, Marie; Bybjerg-Grauholm, Jonas; Poulsen, Jesper B; Hansen, Christine S; Hougaard, David M; Hollegaard, Mads V
2017-06-01
The overall aim of this study is to evaluate whole genome amplification of DNA extracted from dried blood spot samples. We wish to explore ways of optimizing the amplification process, while decreasing the amount of input material and inherently the cost. Our primary focus of optimization is on the amount of input material, the amplification reaction volume, the number of replicates and amplification time and temperature. Increasing the quality of the amplified DNA and the subsequent results of array genotyping is a secondary aim of this project. This study is based on DNA extracted from dried blood spot samples. The extracted DNA was subsequently whole genome amplified using the REPLIg kit and genotyped on the PsychArray BeadChip (assessing > 570,000 SNPs genome wide). We used Genome Studio to evaluate the quality of the genotype data by call rates and log R ratios. The whole genome amplification process is robust and does not vary between replicates. Altering amplification time, temperature or number of replicates did not affect our results. We found that spot size, i.e. the amount of input material, could be reduced without compromising the quality of the array genotyping data. We also showed that whole genome amplification reaction volumes can be reduced by a factor of 4 without compromising the DNA quality. Whole genome amplified DNA samples from dried blood spots are well suited for array genotyping and produce robust and reliable genotype data. However, the amplification process introduces additional noise to the data, making detection of structural variants such as copy number variants difficult. With this study, we explore ways of optimizing the amplification protocol in order to reduce noise and increase data quality. We found that the amplification process was very robust, and that changes in amplification time or temperature did not alter the genotyping calls or quality of the array data. 
Adding additional replicates of each sample also led to insignificant changes in the array data. Thus, the amount of noise introduced by the amplification process was consistent regardless of changes made to the amplification protocol. We also explored ways of decreasing material expenditure by reducing the spot size or the amplification reaction volume. The reduction did not affect the quality of the genotyping data.
Optimal use of tandem biotin and V5 tags in ChIP assays
Kolodziej, Katarzyna E; Pourfarzad, Farzin; de Boer, Ernie; Krpic, Sanja; Grosveld, Frank; Strouboulis, John
2009-01-01
Background: Chromatin immunoprecipitation (ChIP) assays coupled to genome arrays (ChIP-on-chip) or massive parallel sequencing (ChIP-seq) lead to the genome wide identification of binding sites of chromatin associated proteins. However, the highly variable quality of antibodies and the availability of epitopes in crosslinked chromatin can compromise genomic ChIP outcomes. Epitope tags have often been used as more reliable alternatives. In addition, we have employed protein in vivo biotinylation tagging as a very high affinity alternative to antibodies. In this paper we describe the optimization of biotinylation tagging for ChIP and its coupling to a known epitope tag in providing a reliable and efficient alternative to antibodies. Results: Using the biotin tagged erythroid transcription factor GATA-1 as example, we describe several optimization steps for the application of the high affinity biotin streptavidin system in ChIP. We find that the omission of SDS during sonication, the use of fish skin gelatin as blocking agent and choice of streptavidin beads can lead to significantly improved ChIP enrichments and lower background compared to antibodies. We also show that the V5 epitope tag performs equally well under the conditions worked out for streptavidin ChIP and that it may suffer less from the effects of formaldehyde crosslinking. Conclusion: The combined use of the very high affinity biotin tag with the less sensitive to crosslinking V5 tag provides for a flexible ChIP platform with potential implications in ChIP sequencing outcomes. PMID:19196479
Weng, Ziqing; Wolc, Anna; Shen, Xia; Fernando, Rohan L; Dekkers, Jack C M; Arango, Jesus; Settar, Petek; Fulton, Janet E; O'Sullivan, Neil P; Garrick, Dorian J
2016-03-19
Genomic estimated breeding values (GEBV) based on single nucleotide polymorphism (SNP) genotypes are widely used in animal improvement programs. It is typically assumed that the larger the number of animals is in the training set, the higher is the prediction accuracy of GEBV. The aim of this study was to quantify genomic prediction accuracy depending on the number of ancestral generations included in the training set, and to determine the optimal number of training generations for different traits in an elite layer breeding line. Phenotypic records for 16 traits on 17,793 birds were used. All parents and some selection candidates from nine non-overlapping generations were genotyped for 23,098 segregating SNPs. An animal model with pedigree relationships (PBLUP) and the BayesB genomic prediction model were applied to predict EBV or GEBV at each validation generation (progeny of the most recent training generation) based on varying numbers of immediately preceding ancestral generations. Prediction accuracy of EBV or GEBV was assessed as the correlation between EBV and phenotypes adjusted for fixed effects, divided by the square root of trait heritability. The optimal number of training generations that resulted in the greatest prediction accuracy of GEBV was determined for each trait. The relationship between optimal number of training generations and heritability was investigated. On average, accuracies were higher with the BayesB model than with PBLUP. Prediction accuracies of GEBV increased as the number of closely-related ancestral generations included in the training set increased, but reached an asymptote or slightly decreased when distant ancestral generations were used in the training set. The optimal number of training generations was 4 or more for high heritability traits but less than that for low heritability traits. 
For less heritable traits, limiting the training datasets to individuals closely related to the validation population resulted in the best predictions. The effect of adding distant ancestral generations in the training set on prediction accuracy differed between traits and the optimal number of necessary training generations is associated with the heritability of traits.
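The accuracy measure described in this abstract (correlation between EBV and adjusted phenotypes, divided by the square root of heritability) can be sketched as follows. This is only an illustration: the function names and the toy numbers are assumptions and do not come from the study, and the adjustment of phenotypes for fixed effects is assumed to have been done beforehand.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def prediction_accuracy(ebv, adjusted_phenotypes, heritability):
    """Correlation of (G)EBV with adjusted phenotypes, scaled by sqrt(h^2)."""
    if not 0 < heritability <= 1:
        raise ValueError("heritability must be in (0, 1]")
    return pearson(ebv, adjusted_phenotypes) / math.sqrt(heritability)


# Toy values, purely hypothetical
gebv = [0.2, -0.1, 0.4, 0.0, -0.3]
phen = [1.1, 0.4, 1.6, 0.9, 0.1]
print(prediction_accuracy(gebv, phen, heritability=0.4))
```

Dividing by the square root of heritability rescales the correlation with phenotypes towards a correlation with the unobserved true breeding value, which is why the same raw correlation yields a higher accuracy for a less heritable trait.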
The Glyphosate-Based Herbicide Roundup Does not Elevate Genome-Wide Mutagenesis of Escherichia coli.
Tincher, Clayton; Long, Hongan; Behringer, Megan; Walker, Noah; Lynch, Michael
2017-10-05
Mutations induced by pollutants may promote pathogen evolution, for example by accelerating mutations conferring antibiotic resistance. Generally, evaluating the genome-wide mutagenic effects of long-term sublethal pollutant exposure at single-nucleotide resolution is extremely difficult. To overcome this technical barrier, we use the mutation accumulation/whole-genome sequencing (MA/WGS) method as a mutagenicity test, to quantitatively evaluate genome-wide mutagenesis of Escherichia coli after long-term exposure to a wide gradient of the glyphosate-based herbicide (GBH) Roundup Concentrate Plus. The genome-wide mutation rate decreases as GBH concentration increases, suggesting that even long-term GBH exposure does not compromise the genome stability of bacteria. Copyright © 2017 Tincher et al.
Dillon, Marcus M; Sung, Way; Sebra, Robert; Lynch, Michael; Cooper, Vaughn S
2017-01-01
The vast diversity in nucleotide composition and architecture among bacterial genomes may be partly explained by inherent biases in the rates and spectra of spontaneous mutations. Bacterial genomes with multiple chromosomes are relatively unusual but some are relevant to human health, none more so than the causative agent of cholera, Vibrio cholerae. Here, we present the genome-wide mutation spectra in wild-type and mismatch repair (MMR) defective backgrounds of two Vibrio species, the low-%GC squid symbiont V. fischeri and the pathogen V. cholerae, collected under conditions that greatly minimize the efficiency of natural selection. In apparent contrast to their high diversity in nature, both wild-type V. fischeri and V. cholerae have among the lowest rates for base-substitution mutations (bpsms) and insertion-deletion mutations (indels) that have been measured, below 10^-3/genome/generation. Vibrio fischeri and V. cholerae have distinct mutation spectra, but both are AT-biased and produce a surprising number of multi-nucleotide indels. Furthermore, the loss of a functional MMR system caused the mutation spectra of these species to converge, implying that the MMR system itself contributes to species-specific mutation patterns. Bpsm and indel rates varied among genome regions, but do not explain the more rapid evolutionary rates of genes on chromosome 2, which likely result from weaker purifying selection. More generally, the very low mutation rates of Vibrio species correlate inversely with their immense population sizes and suggest that selection may not only have maximized replication fidelity but also optimized other polygenic traits relative to the constraints of genetic drift. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Genome-wide screening and identification of antigens for rickettsial vaccine development
USDA-ARS's Scientific Manuscript database
The capacity to identify immunogens for vaccine development by genome-wide screening has been markedly enhanced by the availability of complete microbial genome sequences coupled to rapid proteomic and bioinformatic analysis. Critical to this genome-wide screening is in vivo testing in the context o...
Bouvet, J-M; Makouanzi, G; Cros, D; Vigneron, Ph
2016-01-01
Hybrids are broadly used in plant breeding and accurate estimation of variance components is crucial for optimizing genetic gain. Genome-wide information may be used to explore models designed to assess the extent of additive and non-additive variance and test their prediction accuracy for the genomic selection. Ten linear mixed models, involving pedigree- and marker-based relationship matrices among parents, were developed to estimate additive (A), dominance (D) and epistatic (AA, AD and DD) effects. Five complementary models, involving the gametic phase to estimate marker-based relationships among hybrid progenies, were developed to assess the same effects. The models were compared using tree height and 3303 single-nucleotide polymorphism markers from 1130 cloned individuals obtained via controlled crosses of 13 Eucalyptus urophylla females with 9 Eucalyptus grandis males. Akaike information criterion (AIC), variance ratios, asymptotic correlation matrices of estimates, goodness-of-fit, prediction accuracy and mean square error (MSE) were used for the comparisons. The variance components and variance ratios differed according to the model. Models with a parent marker-based relationship matrix performed better than those that were pedigree-based, that is, an absence of singularities, lower AIC, higher goodness-of-fit and accuracy and smaller MSE. However, AD and DD variances were estimated with high s.es. Using the same criteria, progeny gametic phase-based models performed better in fitting the observations and predicting genetic values. However, DD variance could not be separated from the dominance variance and null estimates were obtained for AA and AD effects. This study highlighted the advantages of progeny models using genome-wide information. PMID:26328760
Deshmukh, Rupesh K.; Sonah, Humira; Bélanger, Richard R.
2016-01-01
Aquaporins (AQPs) are channel-forming integral membrane proteins that facilitate the movement of water and many other small molecules. Compared to animals, plants contain a much higher number of AQPs in their genome. Homology-based identification of AQPs in sequenced species is feasible because of the high level of conservation of protein sequences across plant species. Genome-wide characterization of AQPs has highlighted several important aspects such as distribution, genetic organization, evolution and conserved features governing solute specificity. From a functional point of view, the understanding of AQP transport system has expanded rapidly with the help of transcriptomics and proteomics data. The efficient analysis of enormous amounts of data generated through omic scale studies has been facilitated through computational advancements. Prediction of protein tertiary structures, pore architecture, cavities, phosphorylation sites, heterodimerization, and co-expression networks has become more sophisticated and accurate with increasing computational tools and pipelines. However, the effectiveness of computational approaches is based on the understanding of physiological and biochemical properties, transport kinetics, solute specificity, molecular interactions, sequence variations, phylogeny and evolution of aquaporins. For this purpose, tools like Xenopus oocyte assays, yeast expression systems, artificial proteoliposomes, and lipid membranes have been efficiently exploited to study the many facets that influence solute transport by AQPs. In the present review, we discuss genome-wide identification of AQPs in plants in relation with recent advancements in analytical tools, and their availability and technological challenges as they apply to AQPs. An exhaustive review of omics resources available for AQP research is also provided in order to optimize their efficient utilization. 
Finally, a detailed catalog of computational tools and analytical pipelines is offered as a resource for AQP research. PMID:28066459
Walsh, Christopher T
2017-07-01
Antibiotics are a therapeutic class that, once deployed, select for resistant bacterial pathogens and so shorten their useful life cycles. As a consequence new versions of antibiotics are constantly needed. Among the antibiotic natural products, morphed peptide scaffolds, converting conformationally mobile, short-lived linear peptides into compact, rigidified small molecule frameworks, act on a wide range of bacterial targets. Advances in bacterial genome mining, biosynthetic gene cluster prediction and expression, and mass spectroscopic structure analysis suggest many more peptides, modified both in side chains and peptide backbones, await discovery. Such molecules may turn up new bacterial targets and be starting points for combinatorial or semisynthetic manipulations to optimize activity and pharmacology parameters.
Talkowski, Michael E; Ernst, Carl; Heilbut, Adrian; Chiang, Colby; Hanscom, Carrie; Lindgren, Amelia; Kirby, Andrew; Liu, Shangtao; Muddukrishna, Bhavana; Ohsumi, Toshiro K; Shen, Yiping; Borowsky, Mark; Daly, Mark J; Morton, Cynthia C; Gusella, James F
2011-04-08
The contribution of balanced chromosomal rearrangements to complex disorders remains unclear because they are not detected routinely by genome-wide microarrays and clinical localization is imprecise. Failure to consider these events bypasses a potentially powerful complement to single nucleotide polymorphism and copy-number association approaches to complex disorders, where much of the heritability remains unexplained. To capitalize on this genetic resource, we have applied optimized sequencing and analysis strategies to test whether these potentially high-impact variants can be mapped at reasonable cost and throughput. By using a whole-genome multiplexing strategy, rearrangement breakpoints could be delineated at a fraction of the cost of standard sequencing. For rearrangements already mapped regionally by karyotyping and fluorescence in situ hybridization, a targeted approach enabled capture and sequencing of multiple breakpoints simultaneously. Importantly, this strategy permitted capture and unique alignment of up to 97% of repeat-masked sequences in the targeted regions. Genome-wide analyses estimate that only 3.7% of bases should be routinely omitted from genomic DNA capture experiments. Illustrating the power of these approaches, the rearrangement breakpoints were rapidly defined to base pair resolution and revealed unexpected sequence complexity, such as co-occurrence of inversion and translocation as an underlying feature of karyotypically balanced alterations. These findings have implications ranging from genome annotation to de novo assemblies and could enable sequencing screens for structural variations at a cost comparable to that of microarrays in standard clinical practice. Copyright © 2011 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
2011-09-01
Almasy, L, Blangero, J. (2009) Human QTL linkage mapping. Genetica 136:333-340. Amos, CI. (2007) Successful design and conduct of genome-wide...quantitative trait loci. Genetica 136:237-243. Skol AD, Scott LJ, Abecasis GR, Boehnke M. (2006) Joint analysis is more efficient than replication
Dittmar, John C.; Pierce, Steven; Rothstein, Rodney; Reid, Robert J. D.
2013-01-01
Genome-wide experiments often measure quantitative differences between treated and untreated cells to identify affected strains. For these studies, statistical models are typically used to determine significance cutoffs. We developed a method termed “CLIK” (Cutoff Linked to Interaction Knowledge) that overlays biological knowledge from the interactome on screen results to derive a cutoff. The method takes advantage of the fact that groups of functionally related interacting genes often respond similarly to experimental conditions and, thus, cluster in a ranked list of screen results. We applied CLIK analysis to five screens of the yeast gene disruption library and found that it defined a significance cutoff that differed from traditional statistics. Importantly, verification experiments revealed that the CLIK cutoff correlated with the position in the rank order where the rate of true positives drops off significantly. In addition, the gene sets defined by CLIK analysis often provide further biological perspectives. For example, applying CLIK analysis retrospectively to a screen for cisplatin sensitivity allowed us to identify the importance of the Hrq1 helicase in DNA crosslink repair. Furthermore, we demonstrate the utility of CLIK to determine optimal treatment conditions by analyzing genome-wide screens at multiple rapamycin concentrations. We show that CLIK is an extremely useful tool for evaluating screen quality, determining screen cutoffs, and comparing results between screens. Furthermore, because CLIK uses previously annotated interaction data to determine biologically informed cutoffs, it provides additional insights into screen results, which supplement traditional statistical approaches. PMID:23589890
Implementation of Quality Management in Core Service Laboratories
Creavalle, T.; Haque, K.; Raley, C.; Subleski, M.; Smith, M.W.; Hicks, B.
2010-01-01
CF-28 The Genetics and Genomics group of the Advanced Technology Program of SAIC-Frederick exists to bring innovative genomic expertise, tools and analysis to NCI and the scientific community. The Sequencing Facility (SF) provides next generation short read (Illumina) sequencing capacity to investigators using a streamlined production approach. The Laboratory of Molecular Technology (LMT) offers a wide range of genomics core services including microarray expression analysis, miRNA analysis, array comparative genome hybridization, long read (Roche) next generation sequencing, quantitative real time PCR, transgenic genotyping, Sanger sequencing, and clinical mutation detection services to investigators from across the NIH. As the technology supporting this genomic research becomes more complex, the need for basic quality processes within all aspects of the core service groups becomes critical. The Quality Management group works alongside members of these labs to establish or improve processes supporting operations control (equipment, reagent and materials management), process improvement (reengineering/optimization, automation, acceptance criteria for new technologies and tech transfer), and quality assurance and customer support (controlled documentation/SOPs, training, service deficiencies and continual improvement efforts). Implementation and expansion of quality programs within unregulated environments demonstrates SAIC-Frederick's dedication to providing the highest quality products and services to the NIH community.
Vertebrate codon bias indicates a highly GC-rich ancestral genome.
Nabiyouni, Maryam; Prakash, Ashwin; Fedorov, Alexei
2013-04-25
Two factors are thought to have contributed to the origin of codon usage bias in eukaryotes: 1) genome-wide mutational forces that shape overall GC-content and create context-dependent nucleotide bias, and 2) positive selection for codons that maximize efficient and accurate translation. Particularly in vertebrates, these two explanations contradict each other and cloud the origin of codon bias in the taxon. On the one hand, mutational forces fail to explain GC-richness (~60%) of third codon positions, given the GC-poor overall genomic composition among vertebrates (~40%). On the other hand, positive selection cannot easily explain strict regularities in codon preferences. Large-scale bioinformatic assessment of the nucleotide composition of coding and non-coding sequences in vertebrates and other taxa suggests a simple possible resolution for this contradiction. Specifically, we propose that the last common vertebrate ancestor had a GC-rich genome (~65% GC). The data suggest that whole-genome mutational bias is the major driving force for generating codon bias. As the bias becomes prominent, it begins to affect translation and can result in positive selection for optimal codons. The positive selection can, in turn, significantly modulate codon preferences. Copyright © 2013 Elsevier B.V. All rights reserved.
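The contrast drawn in this abstract between overall GC content and GC content at third codon positions (often called GC3) can be made concrete with a small sketch. The function names and the toy sequence are assumptions for illustration only; the code is not the authors' pipeline, and it assumes an in-frame coding sequence.

```python
def gc_fraction(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_positions = cds[2:usable:3]
    return gc_fraction(third_positions)


# Hypothetical toy coding sequence: ATG GCC AAG TAA
toy_cds = "ATGGCCAAGTAA"
print(gc_fraction(toy_cds), gc3_fraction(toy_cds))
```

Comparing the two values per gene, and against the GC fraction of flanking non-coding sequence, is one simple way to see whether third positions are GC-enriched relative to the genomic background.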
Yu, Joon-Ho; Jamal, Seema M; Tabor, Holly K; Bamshad, Michael J
2013-09-01
Researchers and clinicians face the practical and ethical challenge of whether and how to offer for return the wide and varied scope of results available from individual exome sequencing and whole-genome sequencing. We argue that, rather than viewing individual exome sequencing and whole-genome sequencing as a test for which results need to be "returned," the technology should instead be framed as a dynamic resource of information from which results should be "managed" over the lifetime of an individual. We further suggest that individual exome sequencing and whole-genome sequencing results management is optimized using a self-guided approach that enables individuals to self-select among results offered for return in a convenient, confidential, personalized context that is responsive to their value system. This approach respects autonomy, allows individuals to maximize potential benefits of genomic information (beneficence) and minimize potential harms (nonmaleficence), and also preserves their right to an open future to the extent they desire or think is appropriate. We describe key challenges and advantages of such a self-guided management system and offer guidance on implementation using an information systems approach.
Brumm, Phillip J.; Gowda, Krishne; Robb, Frank T.; Mead, David A.
2016-01-01
Here we report the complete genome sequence of the chemoorganotrophic, extremely thermophilic bacterium, Dictyoglomus turgidum, which is a Gram negative, strictly anaerobic bacterium. D. turgidum and D. thermophilum together form the Dictyoglomi phylum. The two Dictyoglomus genomes are highly syntenic, and both are distantly related to Caldicellulosiruptor spp. D. turgidum is able to grow on a wide variety of polysaccharide substrates due to significant genomic commitment to glycosyl hydrolases, 16 of which were cloned and expressed in our study. The GH5, GH10, and GH42 enzymes characterized in this study suggest that D. turgidum can utilize most plant-based polysaccharides except crystalline cellulose. The DNA polymerase I enzyme was also expressed and characterized. The pure enzyme showed improved amplification of long PCR targets compared to Taq polymerase. The genome contains a full complement of DNA modifying enzymes, and an unusually high copy number (4) of a new, ancestral family of polB type nucleotidyltransferases designated as MNT (minimal nucleotidyltransferases). Considering its optimal growth at 72°C, D. turgidum has an anomalously low G+C content of 39.9% that may account for the presence of reverse gyrase, usually associated with hyperthermophiles. PMID:28066333
GETPrime 2.0: gene- and transcript-specific qPCR primers for 13 species including polymorphisms
David, Fabrice P.A.; Rougemont, Jacques; Deplancke, Bart
2017-01-01
GETPrime (http://bbcftools.epfl.ch/getprime) is a database with a web frontend providing gene- and transcript-specific, pre-computed qPCR primer pairs. The primers have been optimized for genome-wide specificity and for allowing the selective amplification of one or several splice variants of most known genes. To ease selection, primers have also been ranked according to defined criteria such as genome-wide specificity (with BLAST), amplicon size, and isoform coverage. Here, we report a major upgrade (2.0) of the database: eight new species (yeast, chicken, macaque, chimpanzee, rat, platypus, pufferfish, and Anolis carolinensis) now complement the five already included in the previous version (human, mouse, zebrafish, fly, and worm). Furthermore, the genomic reference has been updated to Ensembl v81 (while keeping earlier versions for backward compatibility) as a result of re-designing the back-end database and automating the import of relevant sections of the Ensembl database in species-independent fashion. This also allowed us to map known polymorphisms to the primers (on average three per primer for human), with the aim of reducing experimental error when targeting specific strains or individuals. Another consequence is that the inclusion of future Ensembl releases and other species has now become a relatively straightforward task. PMID:28053161
Clark, Stephen J; Smallwood, Sébastien A; Lee, Heather J; Krueger, Felix; Reik, Wolf; Kelsey, Gavin
2017-03-01
DNA methylation (DNAme) is an important epigenetic mark in diverse species. Our current understanding of DNAme is based on measurements from bulk cell samples, which obscures intercellular differences and prevents analyses of rare cell types. Thus, the ability to measure DNAme in single cells has the potential to make important contributions to the understanding of several key biological processes, such as embryonic development, disease progression and aging. We have recently reported a method for generating genome-wide DNAme maps from single cells, using single-cell bisulfite sequencing (scBS-seq), allowing the quantitative measurement of DNAme at up to 50% of CpG dinucleotides throughout the mouse genome. Here we present a detailed protocol for scBS-seq that includes our most recent developments to optimize recovery of CpGs, mapping efficiency and success rate; reduce hands-on time; and increase sample throughput with the option of using an automated liquid handler. We provide step-by-step instructions for each stage of the method, comprising cell lysis and bisulfite (BS) conversion, preamplification and adaptor tagging, library amplification, sequencing and, lastly, alignment and methylation calling. An individual with relevant molecular biology expertise can complete library preparation within 3 d. Subsequent computational steps require 1-3 d for someone with bioinformatics expertise.
USDA-ARS's Scientific Manuscript database
SNP effects estimated in genomic selection programs allow for the prediction of direct genomic values (DGV) both at genome-wide and chromosomal level. As a consequence, genome-wide (G_GW) or chromosomal (G_CHR) correlation matrices between genomic predictions for different traits can be calculated. ...
Optimization of genome editing through CRISPR-Cas9 engineering.
Zhang, Jian-Hua; Adikaram, Poorni; Pandey, Mritunjay; Genis, Allison; Simonds, William F
2016-04-01
CRISPR (Clustered Regularly-Interspaced Short Palindromic Repeats)-Cas9 (CRISPR associated protein 9) has rapidly become the most promising genome editing tool with great potential to revolutionize medicine. Through guidance of a 20 nucleotide RNA (gRNA), CRISPR-Cas9 finds and cuts target protospacer DNA precisely 3 base pairs upstream of a PAM (Protospacer Adjacent Motif). The broken DNA ends are repaired by either NHEJ (Non-Homologous End Joining) resulting in small indels, or by HDR (Homology Directed Repair) for precise gene or nucleotide replacement. Theoretically, CRISPR-Cas9 could be used to modify any genomic sequences, thereby providing a simple, easy, and cost-effective means of genome-wide gene editing. However, the off-target activity of CRISPR-Cas9, which cuts DNA sites with imperfect matches to the gRNA, has been of significant concern because clinical applications require 100% accuracy. Additionally, CRISPR-Cas9 has unpredictable efficiency among different DNA target sites and the PAM requirements greatly restrict its genome editing frequency. A large number of efforts have been made to address these impeding issues, but much more is needed to fully realize the medical potential of CRISPR-Cas9. In this article, we summarize the existing problems and current advances of the CRISPR-Cas9 technology and provide perspectives for the ultimate perfection of Cas9-mediated genome editing.
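The targeting rule stated in this abstract (a 20-nucleotide guide, a cut 3 base pairs upstream of the PAM) can be illustrated with a minimal site scanner. Several things here are assumptions rather than content of the abstract: the PAM is taken to be NGG, as for the commonly used Streptococcus pyogenes Cas9; only the forward strand is scanned; and the function name and the toy sequence are hypothetical.

```python
def find_forward_targets(dna, guide_len=20):
    """List candidate sites on the forward strand: guide_len bases followed by an NGG PAM.

    cut_index is the 0-based index of the first base 3' of the blunt cut,
    so exactly three protospacer bases lie between the cut and the PAM.
    """
    dna = dna.upper()
    sites = []
    for pam_start in range(guide_len, len(dna) - 2):
        pam = dna[pam_start:pam_start + 3]
        if pam[0] in "ACGT" and pam[1:] == "GG":  # assumed NGG PAM
            sites.append({
                "protospacer": dna[pam_start - guide_len:pam_start],
                "pam": pam,
                "cut_index": pam_start - 3,
            })
    return sites


# Hypothetical toy sequence: 2 flanking bases, a 20-mer, an AGG PAM, 2 flanking bases
toy = "CC" + "ACGT" * 5 + "AGG" + "TT"
for site in find_forward_targets(toy):
    print(site)
```

A fuller scanner would also search the reverse complement and score candidate guides for near-matches elsewhere in the genome, which is where the off-target concern raised in the abstract comes in.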
Novel efficient genome-wide SNP panels for the conservation of the highly endangered Iberian lynx.
Kleinman-Ruiz, Daniel; Martínez-Cruz, Begoña; Soriano, Laura; Lucena-Perez, Maria; Cruz, Fernando; Villanueva, Beatriz; Fernández, Jesús; Godoy, José A
2017-07-21
The Iberian lynx (Lynx pardinus) has been acknowledged as the most endangered felid species in the world. An intense contraction and fragmentation during the twentieth century left less than 100 individuals split in two isolated and genetically eroded populations by 2002. Genetic monitoring and management so far have been based on 36 STRs, but their limited variability and the more complex situation of current populations demand more efficient molecular markers. The recent characterization of the Iberian lynx genome identified more than 1.6 million SNPs, of which 1536 were selected and genotyped in an extended Iberian lynx sample. We validated 1492 SNPs and analysed their heterozygosity, Hardy-Weinberg equilibrium, and linkage disequilibrium. We then selected a panel of 343 minimally linked autosomal SNPs from which we extracted subsets optimized for four different typical tasks in conservation applications: individual identification, parentage assignment, relatedness estimation, and admixture classification, and compared their power to currently used STR panels. We ascribed 21 SNPs to chromosome X based on their segregation patterns, and identified one additional marker that showed significant differentiation between sexes. For all applications considered, panels of autosomal SNPs showed higher power than the currently used STR set with only a very modest increase in the number of markers. These novel panels of highly informative genome-wide SNPs provide more powerful, efficient, and flexible tools for the genetic management and non-invasive monitoring of Iberian lynx populations. This example highlights an important outcome of whole-genome studies in genetically threatened species.
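As an illustration of the per-SNP checks mentioned above (heterozygosity and Hardy-Weinberg equilibrium), the following sketch computes observed and expected heterozygosity and a chi-square statistic from genotype counts at one biallelic SNP. It is a textbook calculation, not the authors' pipeline; the function name `hwe_summary` and the example counts are hypothetical.

```python
def hwe_summary(n_aa, n_ab, n_bb):
    """Summarise one biallelic SNP from its three genotype counts.

    Returns allele frequency p of allele A, observed and expected
    heterozygosity, and the chi-square statistic (1 degree of freedom)
    for departure from Hardy-Weinberg proportions.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    h_obs = n_ab / n
    h_exp = 2 * p * q
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNP).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return {"p": p, "h_obs": h_obs, "h_exp": h_exp, "chi2": chi2}


if __name__ == "__main__":
    print(hwe_summary(25, 50, 25))   # genotype counts in exact HW proportions
    print(hwe_summary(50, 0, 50))    # complete heterozygote deficit
```

With sample sizes as small as those typical of endangered populations, an exact test would usually be preferred over the chi-square approximation.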
Optimal design of low-density SNP arrays for genomic prediction: algorithm and applications
USDA-ARS's Scientific Manuscript database
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for their optimal design. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optim...
Genome-Wide Chromosomal Targets of Oncogenic Transcription Factors
2008-04-01
axis. (a) Comparison between STAGE and ChIP-chip when the same sample was analyzed by both methods. The gray line indicates all predicted STAGE targets...numbers of single-hit tags (Y-axis) were plotted against the frequencies of those tags in the random (gray bars) and experimental (black bars) tag...size of 500 bp gave an optimal separation between random and real data. Data shown is for a window size of 500 bp. The gray bars indicate log10 of the
Wei, Li; Xu, Jian
2018-06-01
Epigenetic factors such as histone modifications play integral roles in plant development and stress response, yet their implications in algae remain poorly understood. In the industrial oleaginous microalgae Nannochloropsis spp., the lack of an efficient methodology for chromatin immunoprecipitation (ChIP), which determines the specific genomic location of various histone modifications, has hindered probing the epigenetic basis of their photosynthetic carbon conversion and storage as oil. Here, a detailed ChIP protocol was developed for Nannochloropsis oceanica, which represents a reliable approach for the analysis of histone modifications, chromatin state, and transcription factor-binding sites at the epigenetic level. Using ChIP-qPCR, genes related to photosynthetic carbon fixation in this microalga were systematically assessed. Furthermore, a ChIP-Seq protocol was established and optimized, which generated a genome-wide profile of histone modification events, using histone mark H3K9Ac as an example. These results are the first step for appreciation of the chromatin landscape in industrial oleaginous microalgae and for epigenetics-based microalgal feedstock development. © 2018 Phycological Society of America.
A novel harmony search-K means hybrid algorithm for clustering gene expression data
Nazeer, KA Abdul; Sebastian, MP; Kumar, SD Madhu
2013-01-01
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. The DNA microarray technology makes it possible to simultaneously analyze a large number of genes across different samples. Clustering of microarray data can reveal the hidden gene expression patterns from large quantities of expression data that in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used for many practical applications. But the original k-means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering the gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms. PMID:23390351
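The dependence of k-means on its initial centroids, noted above, can be seen in a minimal sketch of the standard (Lloyd) iteration. This is plain k-means only; the harmony search component of HSKH is not described in enough detail in the abstract to reproduce, so it is not attempted. The function names and the toy points are hypothetical.

```python
import math


def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, init_centroids, max_iter=100):
    """Run Lloyd's k-means from the given starting centroids.

    Returns the final centroids and the within-cluster sum of squared
    errors (SSE), the objective that k-means locally minimises.
    """
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments no longer change the centroids
        centroids = updated
    clusters = assign(points, centroids)
    sse = sum(math.dist(p, centroids[i]) ** 2
              for i, c in enumerate(clusters) for p in c)
    return centroids, sse


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (4, 0), (4, 1)]
    print(kmeans(pts, [(0, 0.5), (4, 0.5)]))  # good start
    print(kmeans(pts, [(2, 0), (2, 1)]))      # poor start, stuck in a local optimum
```

Both runs converge immediately, but the second start ends with a much larger SSE, which is the kind of locally optimal result that a global search over starting centroids is meant to avoid.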
Esfahani, Mohammad Shahrokh; Dougherty, Edward R
2015-01-01
Phenotype classification via genomic data is hampered by small sample sizes that negatively impact classifier design. Utilization of prior biological knowledge in conjunction with training data can improve both classifier design and error estimation via the construction of the optimal Bayesian classifier. In the genomic setting, gene/protein signaling pathways provide a key source of biological knowledge. Although these pathways are neither complete, nor regulatory, with no timing associated with them, they are capable of constraining the set of possible models representing the underlying interaction between molecules. The aim of this paper is to provide a framework and the mathematical tools to transform signaling pathways to prior probabilities governing uncertainty classes of feature-label distributions used in classifier design. Structural motifs extracted from the signaling pathways are mapped to a set of constraints on a prior probability on a Multinomial distribution. Being the conjugate prior for the Multinomial distribution, we propose optimization paradigms to estimate the parameters of a Dirichlet distribution in the Bayesian setting. The performance of the proposed methods is tested on two widely studied pathways: mammalian cell cycle and a p53 pathway model.
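The conjugacy that the abstract relies on can be illustrated with a minimal sketch: a Dirichlet prior over Multinomial bin probabilities is updated simply by adding the observed counts to the prior parameters. The prior values below are hypothetical placeholders; the paper's actual contribution, deriving those parameters from pathway constraints by optimization, is not reproduced here.

```python
def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Expected bin probabilities under a Dirichlet(alpha) distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]


if __name__ == "__main__":
    prior = [1.0, 1.0, 1.0]   # hypothetical prior, not derived from any pathway
    counts = [3, 0, 1]        # hypothetical observed bin counts
    post = dirichlet_posterior(prior, counts)
    print(post, dirichlet_mean(post))
```

With few samples the prior dominates the posterior mean, which is why an informative, pathway-derived prior matters in the small-sample setting the abstract describes.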
Memory management in genome-wide association studies
2009-01-01
Genome-wide association is a powerful tool for the identification of genes that underlie common diseases. Genome-wide association studies generate billions of genotypes and pose significant computational challenges for most users including limited computer memory. We applied a recently developed memory management tool to two analyses of North American Rheumatoid Arthritis Consortium studies and measured the performance in terms of central processing unit and memory usage. We conclude that our memory management approach is simple, efficient, and effective for genome-wide association studies. PMID:20018047
Palaiokostas, Christos; Cariou, Sophie; Bestin, Anastasia; Bruant, Jean-Sebastien; Haffray, Pierrick; Morin, Thierry; Cabon, Joëlle; Allal, François; Vandeputte, Marc; Houston, Ross D
2018-06-08
European sea bass (Dicentrarchus labrax) is one of the most important species for European aquaculture. Viral nervous necrosis (VNN), commonly caused by the redspotted grouper nervous necrosis virus (RGNNV), can result in high levels of morbidity and mortality, mainly during the larval and juvenile stages of cultured sea bass. In the absence of efficient therapeutic treatments, selective breeding for host resistance offers a promising strategy to control this disease. Our study aimed at investigating genetic resistance to VNN and genomic-based approaches to improve disease resistance by selective breeding. A population of 1538 sea bass juveniles from a factorial cross between 48 sires and 17 dams was challenged with RGNNV with mortalities and survivors being recorded and sampled for genotyping by the RAD sequencing approach. We used genome-wide genotype data from 9195 single nucleotide polymorphisms (SNPs) for downstream analysis. Estimates of heritability of survival on the underlying scale for the pedigree and genomic relationship matrices were 0.27 (HPD interval 95%: 0.14-0.40) and 0.43 (0.29-0.57), respectively. Classical genome-wide association analysis detected genome-wide significant quantitative trait loci (QTL) for resistance to VNN on chromosomes 3, 20 and 25 (unassigned scaffolds in the case of 'chromosome' 25) (P < 1e-06). Weighted genomic best linear unbiased predictor provided additional support for the QTL on chromosome 3 and suggested that it explained 4% of the additive genetic variation. Genomic prediction approaches were tested to investigate the potential of using genome-wide SNP data to estimate breeding values for resistance to VNN and showed that genomic prediction resulted in a 13% increase in successful classification of resistant and susceptible animals compared to pedigree-based methods, with Bayes A and Bayes B giving the highest predictive ability.
Genome-wide significant QTL were identified but each with relatively small effects on the trait. Tests of genomic prediction suggested that incorporating genome-wide SNP data is likely to result in higher accuracy of estimated breeding values for resistance to VNN. RAD sequencing is an effective method for generating such genome-wide SNPs, and our findings highlight the potential of genomic selection to breed farmed European sea bass with improved resistance to VNN.
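For readers unfamiliar with genomic relationship matrices, the sketch below builds one from SNP genotypes coded as 0, 1 or 2 copies of an allele. The abstract does not say which formulation was used; the widely cited VanRaden (method 1) scaling is assumed here purely for illustration, and the function name and toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from 0/1/2 genotype codes.

    `genotypes` is a list of individuals, each a list of allele counts
    per SNP. Uses G = Z Z' / (2 * sum(p * (1 - p))) with Z = M - 2p,
    where p is the allele frequency estimated from the same sample
    (VanRaden method 1; assumed, not taken from the abstract).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each genotype by twice the allele frequency.
    z = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [[0, 2], [2, 0]]  # two individuals, two SNPs, opposite homozygotes
    for row in genomic_relationship_matrix(toy):
        print(row)
```

In practice this matrix replaces the pedigree relationship matrix in the mixed model, which is how the genomic and pedigree heritability estimates quoted above come to differ.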
The pig genome project has plenty to squeal about.
Fan, B; Gorbach, D M; Rothschild, M F
2011-01-01
Significant progress on pig genetics and genomics research has been witnessed in recent years due to the integration of advanced molecular biology techniques, bioinformatics and computational biology, and the collaborative efforts of researchers in the swine genomics community. Progress on expanding the linkage map has slowed down, but the efforts have created a higher-resolution physical map integrating the clone map and BAC end sequence. The number of QTL mapped is still growing and most of the updated QTL mapping results are available through PigQTLdb. Additionally, expression studies using high-throughput microarrays and other gene expression techniques have made significant advancements. The number of identified non-coding RNAs is rapidly increasing and their exact regulatory functions are being explored. A publishable draft (build 10) of the swine genome sequence was available for the pig genomics community by the end of December 2010. Build 9 of the porcine genome is currently available with Ensembl annotation; manual annotation is ongoing. These drafts provide useful tools for such endeavors as comparative genomics and SNP scans for fine QTL mapping. A recent community-wide effort to create a 60K porcine SNP chip has greatly facilitated whole-genome association analyses, haplotype block construction and linkage disequilibrium mapping, which can contribute to whole-genome selection. The future 'systems biology' that integrates and optimizes the information from all research levels can enhance the pig community's understanding of the full complexity of the porcine genome. These recent technological advances and where they may lead are reviewed. Copyright © 2011 S. Karger AG, Basel.
Performances of Different Fragment Sizes for Reduced Representation Bisulfite Sequencing in Pigs.
Yuan, Xiao-Long; Zhang, Zhe; Pan, Rong-Yang; Gao, Ning; Deng, Xi; Li, Bin; Zhang, Hao; Sangild, Per Torp; Li, Jia-Qi
2017-01-01
Reduced representation bisulfite sequencing (RRBS) has been widely used to profile genome-scale DNA methylation in mammalian genomes. However, the applications and technical performances of RRBS with different fragment sizes have not been systematically reported in pigs, which serve as one of the important biomedical models for humans. The aims of this study were to evaluate capacities of RRBS libraries with different fragment sizes to characterize the porcine genome. We found that the Msp I-digested segments between 40 and 220 bp harbored a high distribution peak at 74 bp, which overlapped heavily with repetitive elements and might reduce the unique mapping alignment. The RRBS library of 110-220 bp fragment size had the highest unique mapping alignment and the lowest multiple alignment. The cost-effectiveness of the 40-110 bp, 110-220 bp and 40-220 bp fragment sizes might decrease when the dataset size was more than 70, 50 and 110 million reads for these three fragment sizes, respectively. Given a 50-million dataset size, the average sequencing depth of the detected CpG sites in the 110-220 bp fragment size appeared to be deeper than in the 40-110 bp and 40-220 bp fragment sizes, and these detected CpG sites were located differently in gene- and CpG island-related regions. In this study, our results demonstrated that the selection of fragment sizes could affect the numbers and sequencing depth of detected CpG sites as well as the cost-efficiency. No single solution of RRBS is optimal in all circumstances for investigating genome-scale DNA methylation. This work provides useful knowledge on designing and executing RRBS for investigating the genome-wide DNA methylation in tissues from pigs.
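The fragment-size selection discussed above can be explored with a simple in-silico digest. The sketch below assumes the standard Msp I behaviour (recognition site CCGG, cut after the first C) and keeps only fragments flanked by a site at both ends; it ignores end repair and adapter length, and the function name and toy sequence are hypothetical.

```python
import re


def mspi_fragments(seq, min_len=40, max_len=220):
    """Return (start, end, length) for size-selected Msp I fragments.

    Msp I is assumed to cut C^CGG, so each cut position is one base
    into the recognition site. Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    # Lookahead keeps overlapping sites such as CCGGCCGG.
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        length = end - start
        if min_len <= length <= max_len:
            fragments.append((start, end, length))
    return fragments


if __name__ == "__main__":
    toy = "AACCGG" + "A" * 50 + "CCGG" + "A" * 10 + "CCGG" + "TT"
    print(mspi_fragments(toy))              # 40-220 bp window
    print(mspi_fragments(toy, 10, 20))      # a narrower, shorter window
```

Running such a digest over a reference genome with different `min_len` and `max_len` values gives a rough preview of how many CpG-containing fragments each size window would retain.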
Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar
2012-01-01
Equine guttural pouch tympany (GPT) is a hereditary condition affecting foals in their first months of life. Complex segregation analyses in Arabian and German warmblood horses showed the involvement of a major gene as very likely. Genome-wide linkage and association analyses including a high density marker set of single nucleotide polymorphisms (SNPs) were performed to map the genomic region harbouring the potential major gene for GPT. A total of 85 Arabian and 373 German warmblood horses were genotyped on the Illumina equine SNP50 beadchip. Non-parametric multipoint linkage analyses showed genome-wide significance on horse chromosomes (ECA) 3 for German warmblood at 16–26 Mb and 34–55 Mb and for Arabian on ECA15 at 64–65 Mb. Genome-wide association analyses confirmed the linked regions for both breeds. In Arabian, genome-wide association was detected at 64 Mb within the region with the highest linkage peak on ECA15. For German warmblood, signals for genome-wide association were close to the peak region of linkage at 52 Mb on ECA3. The odds ratio for the SNP with the highest genome-wide association was 0.12 for the Arabian. In conclusion, the refinement of the regions with the Illumina equine SNP50 beadchip is an important step to unravel the responsible mutations for GPT. PMID:22848553
The Systems Biology Markup Language (SBML) Level 3 Package: Flux Balance Constraints.
Olivier, Brett G; Bergmann, Frank T
2015-09-04
Constraint-based modeling is a well-established modelling methodology used to analyze and study biological networks on both a medium and genome scale. Due to their large size, genome scale models are typically analysed using constraint-based optimization techniques. One widely used method is Flux Balance Analysis (FBA) which, for example, requires a modelling description to include: the definition of a stoichiometric matrix, an objective function and bounds on the values that fluxes can obtain at steady state. The Flux Balance Constraints (FBC) Package extends SBML Level 3 and provides a standardized format for the encoding, exchange and annotation of constraint-based models. It includes support for modelling concepts such as objective functions, flux bounds and model component annotation that facilitates reaction balancing. The FBC package establishes a base level for the unambiguous exchange of genome-scale, constraint-based models that can be built upon by the community to meet future needs (e.g. by extending it to cover dynamic FBC models).
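The three ingredients of FBA listed above (stoichiometric matrix, objective function, flux bounds) are commonly written as a linear program. The notation below is an assumption introduced for illustration and is not taken from the abstract: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective coefficients, and l_j, u_j the lower and upper bounds on flux j.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

The constraint S v = 0 expresses the steady-state assumption, and the bounds and objective are the model elements that a standardized exchange format has to encode alongside the reaction network.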
The Systems Biology Markup Language (SBML) Level 3 Package: Flux Balance Constraints.
Olivier, Brett G; Bergmann, Frank T
2015-06-01
Constraint-based modeling is a well-established modelling methodology used to analyze and study biological networks on both a medium and genome scale. Due to their large size, genome scale models are typically analysed using constraint-based optimization techniques. One widely used method is Flux Balance Analysis (FBA) which, for example, requires a modelling description to include: the definition of a stoichiometric matrix, an objective function and bounds on the values that fluxes can obtain at steady state. The Flux Balance Constraints (FBC) Package extends SBML Level 3 and provides a standardized format for the encoding, exchange and annotation of constraint-based models. It includes support for modelling concepts such as objective functions, flux bounds and model component annotation that facilitates reaction balancing. The FBC package establishes a base level for the unambiguous exchange of genome-scale, constraint-based models that can be built upon by the community to meet future needs (e.g. by extending it to cover dynamic FBC models).
Genome-wide Fitness Profiles Reveal a Requirement for Autophagy During Yeast Fermentation
Piggott, Nina; Cook, Michael A.; Tyers, Mike; Measday, Vivien
2011-01-01
The ability of cells to respond to environmental changes and adapt their metabolism enables cell survival under stressful conditions. The budding yeast Saccharomyces cerevisiae (S. cerevisiae) is particularly well adapted to the harsh conditions of anaerobic wine fermentation. However, S. cerevisiae gene function has not been previously systematically interrogated under conditions of industrial fermentation. We performed a genome-wide study of essential and nonessential S. cerevisiae gene requirements during grape juice fermentation to identify deletion strains that are either depleted or enriched within the viable fermentative population. Genes that function in autophagy and ubiquitin-proteasome degradation are required for optimal survival during fermentation, whereas genes that function in ribosome assembly and peroxisome biogenesis impair fitness during fermentation. We also uncover fermentation phenotypes for 139 uncharacterized genes with no previously known cellular function. We demonstrate that autophagy is induced early in wine fermentation in a nitrogen-replete environment, suggesting that autophagy may be triggered by other forms of stress that arise during fermentation. These results provide insights into the complex fermentation process and suggest possible means for improvement of industrial fermentation strains. PMID:22384346
Chan, Robin F.; Shabalin, Andrey A.; Xie, Lin Y.; Adkins, Daniel E.; Zhao, Min; Turecki, Gustavo; Clark, Shaunna L.; Aberg, Karolina A.
2017-01-01
Methylome-wide association studies are typically performed using microarray technologies that only assay a very small fraction of the CG methylome and entirely miss two forms of methylation that are common in brain and likely of particular relevance for neuroscience and psychiatric disorders. The alternative is to use whole genome bisulfite (WGB) sequencing but this approach is not yet practically feasible with sample sizes required for adequate statistical power. We argue for revisiting methylation enrichment methods that, provided optimal protocols are used, enable comprehensive, adequately powered and cost-effective genome-wide investigations of the brain methylome. To support our claim we use data showing that enrichment methods approximate the sensitivity obtained with WGB methods and with slightly better specificity. However, this performance is achieved at <5% of the reagent costs. Furthermore, because many more samples can be sequenced simultaneously, projects can be completed about 15 times faster. Currently the only viable option available for comprehensive brain methylome studies, enrichment methods may be critical for moving the field forward. PMID:28334972
A novel sgRNA selection system for CRISPR-Cas9 in mammalian cells.
Zhang, Haiwei; Zhang, Xixi; Fan, Cunxian; Xie, Qun; Xu, Chengxian; Zhao, Qun; Liu, Yongbo; Wu, Xiaoxia; Zhang, Haibing
2016-03-18
CRISPR-Cas9 mediated genome editing system has been developed as a powerful tool for elucidating the function of genes through genetic engineering in multiple cells and organisms. This system takes advantage of a single guide RNA (sgRNA) to direct the Cas9 endonuclease to a specific DNA site to generate mutant alleles. Since the targeting efficiency of sgRNAs to distinct DNA loci can vary widely, there remains a need for a rapid, simple and efficient sgRNA selection method to overcome this limitation of the CRISPR-Cas9 system. Here we report a novel system to select sgRNA with high efficacy for DNA sequence modification by a luciferase assay. Using this sgRNAs selection system, we further demonstrated successful examples of one sgRNA for generating one gene knockout cell lines where the targeted genes are shown to be functionally defective. This system provides a potential application to optimize the sgRNAs in different species and to generate a powerful CRISPR-Cas9 genome-wide screening system with minimum amounts of sgRNAs. Copyright © 2016 Elsevier Inc. All rights reserved.
Dreger, Dayna L; Rimbault, Maud; Davis, Brian W; Bhatnagar, Adrienne; Parker, Heidi G; Ostrander, Elaine A
2016-12-01
In the decade following publication of the draft genome sequence of the domestic dog, extraordinary advances with application to several fields have been credited to the canine genetic system. Taking advantage of closed breeding populations and the subsequent selection for aesthetic and behavioral characteristics, researchers have leveraged the dog as an effective natural model for the study of complex traits, such as disease susceptibility, behavior and morphology, generating unique contributions to human health and biology. When designing genetic studies using purebred dogs, it is essential to consider the unique demography of each population, including estimation of effective population size and timing of population bottlenecks. The analytical design approach for genome-wide association studies (GWAS) and analysis of whole-genome sequence (WGS) experiments are inextricable from demographic data. We have performed a comprehensive study of genomic homozygosity, using high-depth WGS data for 90 individuals, and Illumina HD SNP data from 800 individuals representing 80 breeds. These data were coupled with extensive pedigree data analyses for 11 breeds that, together, allowed us to compute breed structure, demography, and molecular measures of genome diversity. Our comparative analyses characterize the extent, formation and implication of breed-specific diversity as it relates to population structure. These data demonstrate the relationship between breed-specific genome dynamics and population architecture, and provide important considerations influencing the technological and cohort design of association and other genomic studies. © 2016. Published by The Company of Biologists Ltd.
Personalized medicine in thrombosis: back to the future
Nagalla, Srikanth
2016-01-01
Most physicians believe they practiced personalized medicine prior to the genomics era that followed the sequencing of the human genome. The focus of personalized medicine has been primarily genomic medicine, wherein it is hoped that the nucleotide dissimilarities among different individuals would provide clinicians with more precise understanding of physiology, more refined diagnoses, better disease risk assessment, earlier detection and monitoring, and tailored treatments to the individual patient. However, to date, the “genomic bench” has not worked itself to the clinical thrombosis bedside. In fact, traditional plasma-based hemostasis-thrombosis laboratory testing, by assessing functional pathways of coagulation, may better help manage venous thrombotic disease than a single DNA variant with a small effect size. There are some new and exciting discoveries in the genetics of platelet reactivity pertaining to atherothrombotic disease. Despite a plethora of genetic/genomic data on platelet reactivity, there are relatively little actionable pharmacogenetic data with antiplatelet agents. Nevertheless, it is crucial for genome-wide DNA/RNA sequencing to continue in research settings for causal gene discovery, pharmacogenetic purposes, and gene-gene and gene-environment interactions. The potential of genomics to advance medicine will require integration of personal data that are obtained in the patient history: environmental exposures, diet, social data, etc. Furthermore, without the ritual of obtaining this information, we will have depersonalized medicine, which lacks the precision needed for the research required to eventually incorporate genomics into routine, optimal, and value-added clinical care. PMID:26847245
Dreger, Dayna L.; Rimbault, Maud; Davis, Brian W.; Bhatnagar, Adrienne; Parker, Heidi G.
2016-01-01
In the decade following publication of the draft genome sequence of the domestic dog, extraordinary advances with application to several fields have been credited to the canine genetic system. Taking advantage of closed breeding populations and the subsequent selection for aesthetic and behavioral characteristics, researchers have leveraged the dog as an effective natural model for the study of complex traits, such as disease susceptibility, behavior and morphology, generating unique contributions to human health and biology. When designing genetic studies using purebred dogs, it is essential to consider the unique demography of each population, including estimation of effective population size and timing of population bottlenecks. The analytical design approach for genome-wide association studies (GWAS) and analysis of whole-genome sequence (WGS) experiments are inextricable from demographic data. We have performed a comprehensive study of genomic homozygosity, using high-depth WGS data for 90 individuals, and Illumina HD SNP data from 800 individuals representing 80 breeds. These data were coupled with extensive pedigree data analyses for 11 breeds that, together, allowed us to compute breed structure, demography, and molecular measures of genome diversity. Our comparative analyses characterize the extent, formation and implication of breed-specific diversity as it relates to population structure. These data demonstrate the relationship between breed-specific genome dynamics and population architecture, and provide important considerations influencing the technological and cohort design of association and other genomic studies. PMID:27874836
Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder.
Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup
2011-09-01
The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.
A Discovery Genome-Wide Association Study of Entrepreneurship
ERIC Educational Resources Information Center
Quaye, Lydia; Nicolaou, Nicos; Shane, Scott; Mangino, Massimo
2012-01-01
To identify specific genetic variants influencing the phenotype of entrepreneurship, we conducted a genome-wide association study (GWAS) with 3,933 Caucasian females from the TwinsUK Adult Twin Registry. Following stringent genotype quality control, GWAF (genome-wide association analyses for family data) software was used to assess the association…
GWAMA: software for genome-wide association meta-analysis.
Mägi, Reedik; Morris, Andrew P
2010-05-28
Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improve power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files is freely available online at http://www.well.ox.ac.uk/GWAMA.
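To make the idea of combining summary statistics concrete, here is a minimal sketch of fixed-effects, inverse-variance-weighted meta-analysis for one SNP. The abstract does not state which estimators GWAMA implements, so this is a generic textbook calculation rather than a description of that software; the function name and the example effect sizes are hypothetical.

```python
import math


def fixed_effects_meta(estimates):
    """Combine per-study (beta, standard_error) pairs for one SNP.

    Each study is weighted by 1 / se^2. Returns the pooled effect,
    its standard error, the z-score and a two-sided normal p-value.
    For a dichotomous trait, beta would be the log odds ratio.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return beta, se, z, p_value


if __name__ == "__main__":
    studies = [(0.2, 0.1), (0.4, 0.1)]  # hypothetical per-study results
    print(fixed_effects_meta(studies))
```

The pooled standard error shrinks as studies are added, which is the gain in power that motivates meta-analysis over any single study.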
Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data
Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc
2016-01-01
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason.
We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition. PMID:27537694
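The masking-based accuracy estimate described in this abstract can be sketched as follows. This is a simplified illustration for a single site, assuming genotypes are coded as dosages 0, 1 or 2; the function names and the made-up "imputed" values are hypothetical and do not reproduce the authors' pipeline, Beagle, or glmnet.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation; returns NaN if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def mask_genotypes(dosages, fraction, rng):
    """Hide a random fraction of the observed dosages (set them to None)."""
    observed = [i for i, d in enumerate(dosages) if d is not None]
    hidden = sorted(rng.sample(observed, int(len(observed) * fraction)))
    masked = list(dosages)
    for i in hidden:
        masked[i] = None
    return masked, hidden

def site_accuracy(truth, imputed, hidden):
    """Correlate true and imputed dosages over the masked entries only."""
    return pearson([truth[i] for i in hidden], [imputed[i] for i in hidden])

# One site, eight individuals; 'imputed' stands in for the output of any imputer
# that was run on 'masked'.
truth = [0, 1, 2, 1, 0, 2, 1, 0]
masked, hidden = mask_genotypes(truth, fraction=0.5, rng=random.Random(0))
imputed = [0.1, 0.9, 1.8, 1.2, 0.2, 1.7, 1.1, 0.3]
print(hidden, site_accuracy(truth, imputed, hidden))
```

Scoring only the masked entries matters: including genotypes the imputer was allowed to see would inflate the correlation.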
Vive la résistance: genome-wide selection against introduced alleles in invasive hybrid zones
Kovach, Ryan P.; Hand, Brian K.; Hohenlohe, Paul A.; Cosart, Ted F.; Boyer, Matthew C.; Neville, Helen H.; Muhlfeld, Clint C.; Amish, Stephen J.; Carim, Kellie; Narum, Shawn R.; Lowe, Winsor H.; Allendorf, Fred W.; Luikart, Gordon
2016-01-01
Evolutionary and ecological consequences of hybridization between native and invasive species are notoriously complicated because patterns of selection acting on non-native alleles can vary throughout the genome and across environments. Rapid advances in genomics now make it feasible to assess locus-specific and genome-wide patterns of natural selection acting on invasive introgression within and among natural populations occupying diverse environments. We quantified genome-wide patterns of admixture across multiple independent hybrid zones of native westslope cutthroat trout and invasive rainbow trout, the world's most widely introduced fish, by genotyping 339 individuals from 21 populations using 9380 species-diagnostic loci. A significantly greater proportion of the genome appeared to be under selection favouring native cutthroat trout (rather than rainbow trout), and this pattern was pervasive across the genome (detected on most chromosomes). Furthermore, selection against invasive alleles was consistent across populations and environments, even in those where rainbow trout were predicted to have a selective advantage (warm environments). These data corroborate field studies showing that hybrids between these species have lower fitness than the native taxa, and show that these fitness differences are due to selection favouring many native genes distributed widely throughout the genome.
Family-Based Genome-Wide Association Scan of Attention-Deficit/Hyperactivity Disorder
ERIC Educational Resources Information Center
Mick, Eric; Todorov, Alexandre; Smalley, Susan; Hu, Xiaolan; Loo, Sandra; Todd, Richard D.; Biederman, Joseph; Byrne, Deirdre; Dechairo, Bryan; Guiney, Allan; McCracken, James; McGough, James; Nelson, Stanley F.; Reiersen, Angela M.; Wilens, Timothy E.; Wozniak, Janet; Neale, Benjamin M.; Faraone, Stephen V.
2010-01-01
Objective: Genes likely play a substantial role in the etiology of attention-deficit/hyperactivity disorder (ADHD). However, the genetic architecture of the disorder is unknown, and prior genome-wide association studies (GWAS) have not identified a genome-wide significant association. We have conducted a third, independent, multisite GWAS of…
Case-Control Genome-Wide Association Study of Attention-Deficit/Hyperactivity Disorder
ERIC Educational Resources Information Center
Neale, Benjamin M.; Medland, Sarah; Ripke, Stephan; Anney, Richard J. L.; Asherson, Philip; Buitelaar, Jan; Franke, Barbara; Gill, Michael; Kent, Lindsey; Holmans, Peter; Middleton, Frank; Thapar, Anita; Lesch, Klaus-Peter; Faraone, Stephen V.; Daly, Mark; Nguyen, Thuy Trang; Schafer, Helmut; Steinhausen, Hans-Christoph; Reif, Andreas; Renner, Tobias J.; Romanos, Marcel; Romanos, Jasmin; Warnke, Andreas; Walitza, Susanne; Freitag, Christine; Meyer, Jobst; Palmason, Haukur; Rothenberger, Aribert; Hawi, Ziarih; Sergeant, Joseph; Roeyers, Herbert; Mick, Eric; Biederman, Joseph
2010-01-01
Objective: Although twin and family studies have shown attention-deficit/hyperactivity disorder (ADHD) to be highly heritable, genetic variants influencing the trait at a genome-wide significant level have yet to be identified. Thus additional genome-wide association studies (GWAS) are needed. Method: We used case-control analyses of 896 cases…
Meta-Analysis of Genome-Wide Association Studies of Attention-Deficit/Hyperactivity Disorder
ERIC Educational Resources Information Center
Neale, Benjamin M.; Medland, Sarah E.; Ripke, Stephan; Asherson, Philip; Franke, Barbara; Lesch, Klaus-Peter; Faraone, Stephen V.; Nguyen, Thuy Trang; Schafer, Helmut; Holmans, Peter; Daly, Mark; Steinhausen, Hans-Christoph; Freitag, Christine; Reif, Andreas; Renner, Tobias J.; Romanos, Marcel; Romanos, Jasmin; Walitza, Susanne; Warnke, Andreas; Meyer, Jobst; Palmason, Haukur; Buitelaar, Jan; Vasquez, Alejandro Arias; Lambregts-Rommelse, Nanda; Gill, Michael; Anney, Richard J. L.; Langely, Kate; O'Donovan, Michael; Williams, Nigel; Owen, Michael; Thapar, Anita; Kent, Lindsey; Sergeant, Joseph; Roeyers, Herbert; Mick, Eric; Biederman, Joseph; Doyle, Alysa; Smalley, Susan; Loo, Sandra; Hakonarson, Hakon; Elia, Josephine; Todorov, Alexandre; Miranda, Ana; Mulas, Fernando; Ebstein, Richard P.; Rothenberger, Aribert; Banaschewski, Tobias; Oades, Robert D.; Sonuga-Barke, Edmund; McGough, James; Nisenbaum, Laura; Middleton, Frank; Hu, Xiaolan; Nelson, Stan
2010-01-01
Objective: Although twin and family studies have shown attention-deficit/hyperactivity disorder (ADHD) to be highly heritable, genetic variants influencing the trait at a genome-wide significant level have yet to be identified. As prior genome-wide association studies (GWAS) have not yielded significant results, we conducted a meta-analysis of…
GUIDE-Seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases
Nguyen, Nhu T.; Liebers, Matthew; Topkar, Ved V.; Thapar, Vishal; Wyvekens, Nicolas; Khayter, Cyd; Iafrate, A. John; Le, Long P.; Aryee, Martin J.; Joung, J. Keith
2014-01-01
CRISPR RNA-guided nucleases (RGNs) are widely used genome-editing reagents, but methods to delineate their genome-wide off-target cleavage activities have been lacking. Here we describe an approach for global detection of DNA double-stranded breaks (DSBs) introduced by RGNs and potentially other nucleases. This method, called Genome-wide Unbiased Identification of DSBs Enabled by Sequencing (GUIDE-Seq), relies on capture of double-stranded oligodeoxynucleotides into breaks. Application of GUIDE-Seq to thirteen RGNs in two human cell lines revealed wide variability in RGN off-target activities and unappreciated characteristics of off-target sequences. The majority of identified sites were not detected by existing computational methods or ChIP-Seq. GUIDE-Seq also identified RGN-independent genomic breakpoint ‘hotspots’. Finally, GUIDE-Seq revealed that truncated guide RNAs exhibit substantially reduced RGN-induced off-target DSBs. Our experiments define the most rigorous framework for genome-wide identification of RGN off-target effects to date and provide a method for evaluating the safety of these nucleases prior to clinical use. PMID:25513782
Deep learning in pharmacogenomics: from gene regulation to patient stratification.
Kalinin, Alexandr A; Higgins, Gerald A; Reamaroon, Narathip; Soroushmehr, Sayedmohammadreza; Allyn-Feuer, Ari; Dinov, Ivo D; Najarian, Kayvan; Athey, Brian D
2018-05-01
This Perspective provides examples of current and future applications of deep learning in pharmacogenomics, including: identification of novel regulatory variants located in noncoding domains of the genome and their function as applied to pharmacoepigenomics; patient stratification from medical records; and the mechanistic prediction of drug response, targets and their interactions. Deep learning encapsulates a family of machine learning algorithms that has transformed many important subfields of artificial intelligence over the last decade, and has demonstrated breakthrough performance improvements on a wide range of tasks in biomedicine. We anticipate that in the future, deep learning will be widely used to predict personalized drug response and optimize medication selection and dosing, using knowledge extracted from large and complex molecular, epidemiological, clinical and demographic datasets.
Fast and Accurate Approximation to Significance Tests in Genome-Wide Association Studies
Zhang, Yu; Liu, Jun S.
2011-01-01
Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNP) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate false discovery rate. When applied to genome-wide SNP datasets, we observed highly variable p-value adjustment results evaluated from different genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online. PMID:22140288
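The remark that Bonferroni correction is too conservative for correlated SNPs can be illustrated with a short calculation. The sketch below shows only the plain Bonferroni arithmetic; it is not the Poisson de-clumping method of the paper, and the "effective number of independent tests" used in the example is an invented figure for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

alpha = 0.05
n_snps = 1_000_000            # number of SNPs actually tested
n_effective = 400_000         # hypothetical count of independent tests under LD

print(bonferroni_threshold(alpha, n_snps))       # threshold treating all SNPs as independent
print(bonferroni_threshold(alpha, n_effective))  # less strict threshold if LD is accounted for
```

Because linked SNPs carry overlapping information, dividing by the raw SNP count demands a stricter threshold than the data require, which is the loss of power the abstract refers to.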
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brumm, Phillip J.; Gowda, Krishne; Robb, Frank T.; ...
2016-12-20
In this study we report the complete genome sequence of the chemoorganotrophic, extremely thermophilic bacterium, Dictyoglomus turgidum, which is a Gram negative, strictly anaerobic bacterium. D. turgidum and D. thermophilum together form the Dictyoglomi phylum. The two Dictyoglomus genomes are highly syntenic, and both are distantly related to Caldicellulosiruptor spp. D. turgidum is able to grow on a wide variety of polysaccharide substrates due to significant genomic commitment to glycosyl hydrolases, 16 of which were cloned and expressed in our study. The GH5, GH10, and GH42 enzymes characterized in this study suggest that D. turgidum can utilize most plant-based polysaccharides except crystalline cellulose. The DNA polymerase I enzyme was also expressed and characterized. The pure enzyme showed improved amplification of long PCR targets compared to Taq polymerase. The genome contains a full complement of DNA modifying enzymes, and an unusually high copy number (4) of a new, ancestral family of polB type nucleotidyltransferases designated as MNT (minimal nucleotidyltransferases). Considering its optimal growth at 72°C, D. turgidum has an anomalously low G+C content of 39.9% that may account for the presence of reverse gyrase, usually associated with hyperthermophiles.
Optimization of genome editing through CRISPR-Cas9 engineering
Zhang, Jian-Hua; Adikaram, Poorni; Pandey, Mritunjay; Genis, Allison; Simonds, William F.
2016-01-01
CRISPR (Clustered Regularly-Interspaced Short Palindromic Repeats)-Cas9 (CRISPR associated protein 9) has rapidly become the most promising genome editing tool with great potential to revolutionize medicine. Through guidance of a 20 nucleotide RNA (gRNA), CRISPR-Cas9 finds and cuts target protospacer DNA precisely 3 base pairs upstream of a PAM (Protospacer Adjacent Motif). The broken DNA ends are repaired by either NHEJ (Non-Homologous End Joining) resulting in small indels, or by HDR (Homology Directed Repair) for precise gene or nucleotide replacement. Theoretically, CRISPR-Cas9 could be used to modify any genomic sequences, thereby providing a simple, easy, and cost effective means of genome-wide gene editing. However, the off-target activity of CRISPR-Cas9 that cuts DNA sites with imperfect matches with gRNA has been of significant concern because clinical applications require 100% accuracy. Additionally, CRISPR-Cas9 has unpredictable efficiency among different DNA target sites and the PAM requirements greatly restrict its genome editing frequency. A large number of efforts have been made to address these impeding issues, but much more is needed to fully realize the medical potential of CRISPR-Cas9. In this article, we summarize the existing problems and current advances of the CRISPR-Cas9 technology and provide perspectives for the ultimate perfection of Cas9-mediated genome editing. PMID:27340770
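The targeting geometry described in this abstract (a 20-nucleotide protospacer immediately followed by a PAM, with the cut 3 base pairs upstream of the PAM) can be sketched in a few lines. The example assumes the NGG PAM of the commonly used Streptococcus pyogenes Cas9, which the abstract does not specify, and scans only the given strand; the function name and the test sequence are made up for illustration.

```python
import re

def find_targets(seq, protospacer_len=20):
    """List candidate Cas9 sites on the given strand.

    Returns tuples (protospacer, pam, cut_index), where the predicted cut lies
    between seq[cut_index - 1] and seq[cut_index], i.e. 3 bp upstream of the PAM.
    Assumes an NGG PAM (an assumption; other Cas9 variants use other PAMs).
    """
    seq = seq.upper()
    targets = []
    # A lookahead pattern reports overlapping PAMs as well.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough sequence upstream for a full protospacer
        protospacer = seq[pam_start - protospacer_len:pam_start]
        targets.append((protospacer, match.group(1), pam_start - 3))
    return targets

# Made-up sequence: a 20-nt protospacer, then the PAM "TGG", then filler.
example = "GATTACAGATTACAGATTAC" + "TGG" + "AAA"
for protospacer, pam, cut in find_targets(example):
    print(protospacer, pam, cut)
```

A real guide-design tool would also scan the reverse complement and score potential off-target matches, which is the concern the abstract raises.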
CRISPR Editing in Biological and Biomedical Investigation.
Ju, Xing-Da; Xu, Jing; Sun, Zhong Sheng
2018-01-01
The CRISPR (clustered regularly interspaced short palindromic repeat)-Cas (CRISPR-associated protein) system, a prokaryotic RNA-based adaptive immune system against viral infection, is emerging as a powerful genome editing tool in broad research areas. To further improve and expand its functionality, various CRISPR delivery strategies have been tested and optimized, and key CRISPR system components such as Cas protein have been engineered for different purposes. Benefiting from more in-depth understanding and further development of CRISPR, versatile CRISPR-based platforms for genome editing have been rapidly developed to advance investigations in biology and biomedicine. In the biological research area, CRISPR has been widely adopted in both fundamental and applied research fields, such as genomic and epigenomic modification, genome-wide screening, cell and animal research, agricultural transformation, livestock breeding, food manufacture, industrial biotechnology, and gene drives for the control of disease agents. In the biomedical research area, CRISPR has also shown its extensive applicability in the establishment of animal models for genetic disorders, generation of tissue donors, implementation of antimicrobial and antiviral studies, identification and assessment of new drugs, and even treatment for clinical diseases. However, there are still several problems to consider, and the biggest concerns are the off-target effects and ethical issues of this technology. In this prospect article, after highlighting recent developments of CRISPR systems, we outline different applications and current limitations of CRISPR in biological and biomedical investigation. Finally, we provide a perspective on future development and potential risks of this multifunctional technology. J. Cell. Biochem. 119: 52-61, 2018. © 2017 Wiley Periodicals, Inc.
Gao, Hui; Zhao, Chunyan
2018-01-01
Chromatin immunoprecipitation (ChIP) has become the most effective and widely used tool to study the interactions between specific proteins or modified forms of proteins and a genomic DNA region. Combined with genome-wide profiling technologies, such as microarray hybridization (ChIP-on-chip) or massively parallel sequencing (ChIP-seq), ChIP could provide a genome-wide mapping of in vivo protein-DNA interactions in various organisms. Here, we describe a protocol of ChIP-on-chip that uses tiling microarray to obtain a genome-wide profiling of ChIPed DNA.
GETPrime 2.0: gene- and transcript-specific qPCR primers for 13 species including polymorphisms.
David, Fabrice P A; Rougemont, Jacques; Deplancke, Bart
2017-01-04
GETPrime (http://bbcftools.epfl.ch/getprime) is a database with a web frontend providing gene- and transcript-specific, pre-computed qPCR primer pairs. The primers have been optimized for genome-wide specificity and for allowing the selective amplification of one or several splice variants of most known genes. To ease selection, primers have also been ranked according to defined criteria such as genome-wide specificity (with BLAST), amplicon size, and isoform coverage. Here, we report a major upgrade (2.0) of the database: eight new species (yeast, chicken, macaque, chimpanzee, rat, platypus, pufferfish, and Anolis carolinensis) now complement the five already included in the previous version (human, mouse, zebrafish, fly, and worm). Furthermore, the genomic reference has been updated to Ensembl v81 (while keeping earlier versions for backward compatibility) as a result of re-designing the back-end database and automating the import of relevant sections of the Ensembl database in species-independent fashion. This also allowed us to map known polymorphisms to the primers (on average three per primer for human), with the aim of reducing experimental error when targeting specific strains or individuals. Another consequence is that the inclusion of future Ensembl releases and other species has now become a relatively straightforward task. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ahmad, Meraj; Sinha, Anubhav; Ghosh, Sreya; Kumar, Vikrant; Davila, Sonia; Yajnik, Chittaranjan S; Chandak, Giriraj R
2017-07-27
Imputation is a computational method based on the principle of haplotype sharing allowing enrichment of genome-wide association study datasets. It depends on the haplotype structure of the population and density of the genotype data. The 1000 Genomes Project led to the generation of imputation reference panels which have been used globally. However, recent studies have shown that population-specific panels provide better enrichment of genome-wide variants. We compared the imputation accuracy using the 1000 Genomes phase 3 reference panel and a panel generated from genome-wide data on 407 individuals from Western India (WIP). The concordance of imputed variants was cross-checked with next-generation re-sequencing data on a subset of genomic regions. Further, using the genome-wide data from 1880 individuals, we demonstrate that WIP works better than the 1000 Genomes phase 3 panel and, when merged with it, significantly improves the imputation accuracy throughout the minor allele frequency range. We also show that imputation using only the South Asian component of the 1000 Genomes phase 3 panel works as well as the merged panel, making it a computationally less intensive job. Thus, our study stresses that imputation accuracy using the 1000 Genomes phase 3 panel can be further improved by including population-specific reference panels from South Asia.
Cericola, Fabio; Jahoor, Ahmed; Orabi, Jihad; Andersen, Jeppe R; Janss, Luc L; Jensen, Just
2017-01-01
Wheat breeding programs generate a large amount of variation which cannot be completely explored because of limited phenotyping throughput. Genomic prediction (GP) has been proposed as a new tool which provides breeding value estimates without the need to phenotype all the material produced, only a subset of it called the training population (TP). However, genotyping of all the accessions under analysis is needed and, therefore, optimizing TP size and genotyping strategy is pivotal to implementing GP in commercial breeding schemes. Here, we explored the optimum TP size and we integrated pedigree records and genome-wide association study (GWAS) results to optimize the genotyping strategy. A total of 988 advanced wheat breeding lines were genotyped with the Illumina 15K SNP wheat chip and phenotyped across several years and locations for yield, lodging, and starch content. Cross-validation using the largest possible TP size and all the SNPs available after editing (~11k) yielded predictive abilities (rGP) ranging between 0.5 and 0.6. To explore the training population size, rGP was computed using progressively smaller TPs. These exercises showed that a TP of around 700 lines was enough to yield the highest observed rGP. Moreover, rGP was calculated after randomly reducing the number of SNPs. This showed that around 1K markers were enough to reach the highest observed rGP. GWAS was used to identify markers associated with the traits analyzed. A GWAS-based selection of SNPs resulted in increased rGP when compared with random selection, and a few hundred SNPs were sufficient to obtain the highest observed rGP. For each of these scenarios, advantages of adding the pedigree information were shown. Our results indicate that moderate TP sizes were enough to yield high rGP and that pedigree information and GWAS results can be used to greatly optimize the genotyping strategy.
[Progress in omics research of Aspergillus niger].
Sui, Yufei; Ouyang, Liming; Lu, Hongzhong; Zhuang, Yingping; Zhang, Siliang
2016-08-25
Aspergillus niger, an important industrial fermentation strain, is widely applied in the production of organic acids and industrial enzymes. With the development of diverse omics technologies, genome, transcriptome, proteome and metabolome data for A. niger are increasing continuously, heralding an era of big data for research on the A. niger fermentation process. Data analysis, ranging from single omics and the comparison of multi-omics to the integration of multi-omics based on the genome-scale metabolic network model, has greatly extended the in-depth and systematic understanding of the efficient production mechanism of A. niger. It also provides possibilities for the rational global optimization of strain performance by genetic modification and process regulation. We reviewed and summarized progress in omics research of A. niger, and proposed directions for the development of omics research on this cell factory.
The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data
Wilks, Christopher; Cline, Melissa S.; Weiler, Erich; Diehkans, Mark; Craft, Brian; Martin, Christy; Murphy, Daniel; Pierce, Howdy; Black, John; Nelson, Donavan; Litzinger, Brian; Hatton, Thomas; Maltbie, Lori; Ainsworth, Michael; Allen, Patrick; Rosewood, Linda; Mitchell, Elizabeth; Smith, Bradley; Warner, Jim; Groboske, John; Telc, Haifang; Wilson, Daniel; Sanford, Brian; Schmidt, Hannes; Haussler, David; Maltbie, Daniel
2014-01-01
The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. The CGHub currently contains >1.4 PB of data, has grown at an average rate of 50 TB a month and serves >100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, enforce patient genome confidentiality in data storage and transmission and optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs. Database URL: https://cghub.ucsc.edu PMID:25267794
Simulation-based comprehensive benchmarking of RNA-seq aligners
Baruzzo, Giacomo; Hayer, Katharina E; Kim, Eun Ji; Di Camillo, Barbara; FitzGerald, Garret A; Grant, Gregory R
2018-01-01
Alignment is the first step in most RNA-seq analysis pipelines, and the accuracy of downstream analyses depends heavily on it. Unlike most steps in the pipeline, alignment is particularly amenable to benchmarking with simulated data. We performed a comprehensive benchmarking of 14 common splice-aware aligners for base, read, and exon junction-level accuracy and compared default with optimized parameters. We found that performance varied by genome complexity, and accuracy and popularity were poorly correlated. The most widely cited tool underperforms for most metrics, particularly when using default settings. PMID:27941783
StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.
Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A
2017-10-15
Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding data binarization and the subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Wain, Louise V.; Pedroso, Inti; Landers, John E.; Breen, Gerome; Shaw, Christopher E.; Leigh, P. Nigel; Brown, Robert H.
2009-01-01
Background: The genetic contribution to sporadic amyotrophic lateral sclerosis (ALS) has not been fully elucidated. There are increasing efforts to characterise the role of copy number variants (CNVs) in human diseases; two previous studies concluded that CNVs may influence risk of sporadic ALS, with multiple rare CNVs more important than common CNVs. A little-explored issue surrounding genome-wide CNV association studies is that of post-calling filtering and merging of raw CNV calls. We undertook simulations to define filter thresholds and considered optimal ways of merging overlapping CNV calls for association testing, taking into consideration possibly overlapping or nested, but distinct, CNVs and boundary estimation uncertainty. Methodology and Principal Findings: In this study we screened Illumina 300K SNP genotyping data from 730 ALS cases and 789 controls for copy number variation. Following quality control filters using thresholds defined by simulation, a total of 11321 CNV calls were made across 575 cases and 621 controls. Using region-based and gene-based association analyses, we identified several loci showing nominally significant association. However, the choice of criteria for combining calls for association testing has an impact on the ranking of the results by their significance. Several loci which were previously reported as being associated with ALS were identified here. However, of another 15 genes previously reported as exhibiting ALS-specific copy number variation, only four exhibited copy number variation in this study. Potentially interesting novel loci, including EEF1D, a translation elongation factor involved in the delivery of aminoacyl tRNAs to the ribosome (a process which has previously been implicated in genetic studies of spinal muscular atrophy) were identified but must be treated with caution due to concerns surrounding genomic location and platform suitability.
Conclusions and Significance: Interpretation of CNV association findings must take into account the effects of filtering and combining CNV calls when based on early genome-wide genotyping platforms and modest study sizes. PMID:19997636
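The merging issue raised in this abstract can be made concrete with a small sketch. The study's actual merging criteria are not given here, so the following is only an assumed illustration: `merge_calls` takes the plain union of overlapping calls, and `reciprocal_overlap` is a common (hypothetical in this context) measure that helps flag nested calls which a plain union would silently fuse. Coordinates are treated as half-open intervals.

```python
from collections import defaultdict

def merge_calls(calls):
    """Merge overlapping or abutting (chrom, start, end) calls into union regions."""
    by_chrom = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Extends the current region.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged

def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for intervals a=(start, end), b=(start, end)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))

# Hypothetical calls for illustration only.
calls = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 400, 500), ("chr2", 10, 20)]
regions = merge_calls(calls)
nested = reciprocal_overlap((100, 1000), (400, 500))
```

A small call nested inside a large one has a low reciprocal overlap, which is one way to see why the choice of merging rule can change which calls are tested together and hence the ranking of results.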
GWATCH: a web platform for automated gene association discovery analysis.
Svitin, Anton; Malov, Sergey; Cherkasov, Nikolay; Geerts, Paul; Rotkevich, Mikhail; Dobrynin, Pavel; Shevchenko, Andrey; Guan, Li; Troyer, Jennifer; Hendrickson, Sher; Dilks, Holli Hutcheson; Oleksyk, Taras K; Donfield, Sharyne; Gomperts, Edward; Jabs, Douglas A; Sezgin, Efe; Van Natta, Mark; Harrigan, P Richard; Brumme, Zabrina L; O'Brien, Stephen J
2014-01-01
As genome-wide sequence analyses for complex human disease determinants are expanding, it is increasingly necessary to develop strategies to promote discovery and validation of potential disease-gene associations. Here we present a dynamic web-based platform - GWATCH - that automates and facilitates four steps in genetic epidemiological discovery: 1) Rapid gene association search and discovery analysis of large genome-wide datasets; 2) Expanded visual display of gene associations for genome-wide variants (SNPs, indels, CNVs), including Manhattan plots, 2D and 3D snapshots of any gene region, and a dynamic genome browser illustrating gene association chromosomal regions; 3) Real-time validation/replication of candidate or putative genes suggested from other sources, limiting Bonferroni genome-wide association study (GWAS) penalties; 4) Open data release and sharing by eliminating privacy constraints (The National Human Genome Research Institute (NHGRI) Institutional Review Board (IRB), informed consent, The Health Insurance Portability and Accountability Act (HIPAA) of 1996 etc.) on unabridged results, which allows for open access comparative and meta-analysis. GWATCH is suitable for both GWAS and whole genome sequence association datasets. We illustrate the utility of GWATCH with three large genome-wide association studies for HIV-AIDS resistance genes screened in large multicenter cohorts; however, association datasets from any study can be uploaded and analyzed by GWATCH.
Pharmacogenomics in neurology: current state and future steps.
Chan, Andrew; Pirmohamed, Munir; Comabella, Manuel
2011-11-01
In neurology, as in any other clinical specialty, there is a need to develop treatment strategies that allow stratification of therapies to optimize efficacy and minimize toxicity. Pharmacogenomics is one such method for therapy optimization: it aims to elucidate the relationship between human genome sequence variation and differential drug responses. Studies have focused on candidate approaches investigating absorption-, distribution-, metabolism-, and elimination (ADME)-related genes (pharmacokinetic pathways), and potential drug targets (pharmacodynamic pathways). To date, however, only a few genetic variants have been incorporated into clinical algorithms. Unfortunately, a large number of studies have produced contradictory results due to a number of deficiencies, including small sample sizes, inadequate phenotyping, and inadequate genotyping strategies. Thus, there still exists an urgent need to establish biomarkers that could help to select patients with an optimal benefit to risk relationship. Here we review recent advances, and limitations, in pharmacogenomics for agents used in neuroimmunology, neurodegenerative diseases, ischemic stroke, epilepsy, and primary headaches. Further work is still required in all of these areas and needs to progress on several fronts, including better standardized phenotyping, appropriate sample sizes through multicenter collaborations and judicious use of new technological advances such as genome-wide approaches, next generation sequencing and systems biology. In time, this is likely to lead to improvements in the benefit-harm balance of neurological therapies, cost efficiency, and identification of new drugs. Copyright © 2011 American Neurological Association.
The solution space of sorting by DCJ.
Braga, Marília D V; Stoye, Jens
2010-09-01
In genome rearrangements, the double cut and join (DCJ) operation, introduced by Yancopoulos et al. in 2005, allows one to represent most rearrangement events that could happen in multichromosomal genomes, such as inversions, translocations, fusions, and fissions. No restriction is imposed on the genome structure with respect to linear and circular chromosomes. An advantage of this general model is that it leads to considerable algorithmic simplifications compared to other genome rearrangement models. Recently, several works concerning the DCJ operation have been published, and in particular, an algorithm was proposed to find an optimal DCJ sequence for sorting one genome into another one. Here we study the solution space of this problem and give an easy-to-compute formula that corresponds to the exact number of optimal DCJ sorting sequences for a particular subset of instances of the problem. We also give an algorithm to count the number of optimal sorting sequences for any instance of the problem. Another interesting result is the demonstration of the possibility of obtaining one optimal sorting sequence by properly replacing any pair of consecutive operations in another optimal sequence. As a consequence, any optimal sorting sequence can be obtained from any other by applying such replacements successively, but the problem of finding the minimum number of replacements between two sorting sequences is still open.
USDA-ARS's Scientific Manuscript database
We generated 13,789 single nucleotide polymorphism (SNP) markers from 97 melon accessions using genotyping by sequencing and anchored them to chromosomes to understand genome-wide fixation index between various melon morphotypes and linkage disequilibrium (LD) decay for inodorus and cantalupensis, th...
Mutational Dynamics of Aroid Chloroplast Genomes
Ahmed, Ibrar; Biggs, Patrick J.; Matthews, Peter J.; Collins, Lesley J.; Hendy, Michael D.; Lockhart, Peter J.
2012-01-01
A characteristic feature of eukaryote and prokaryote genomes is the co-occurrence of nucleotide substitution and insertion/deletion (indel) mutations. Although similar observations have also been made for chloroplast DNA, genome-wide associations have not been reported. We determined the chloroplast genome sequences for two morphotypes of taro (Colocasia esculenta; family Araceae) and compared these with four publicly available aroid chloroplast genomes. Here, we report the extent of genome-wide association between direct and inverted repeats, indels, and substitutions in these aroid chloroplast genomes. We suggest that alternative but not mutually exclusive hypotheses explain the mutational dynamics of chloroplast genome evolution. PMID:23204304
Genome-wide analysis of tandem repeats in plants and green algae
Zhixin Zhao; Cheng Guo; Sreeskandarajan Sutharzan; Pei Li; Craig Echt; Jie Zhang; Chun Liang
2014-01-01
Tandem repeats (TRs) are widespread in the genomes of prokaryotes and eukaryotes. Based on the sequenced genomes and gene annotations of 31 plant and algal species in Phytozome version 8.0 (http://www.phytozome.net/), we examined TRs on a genome-wide scale, characterized their distributions and motif features, and explored their putative biological functions. Among...
Investigation of common, low-frequency and rare genome-wide variation in anorexia nervosa
Huckins, L M; Hatzikotoulas, K; Southam, L; Thornton, L M; Steinberg, J; Aguilera-McKay, F; Treasure, J; Schmidt, U; Gunasinghe, C; Romero, A; Curtis, C; Rhodes, D; Moens, J; Kalsi, G; Dempster, D; Leung, R; Keohane, A; Burghardt, R; Ehrlich, S; Hebebrand, J; Hinney, A; Ludolph, A; Walton, E; Deloukas, P; Hofman, A; Palotie, A; Palta, P; van Rooij, F J A; Stirrups, K; Adan, R; Boni, C; Cone, R; Dedoussis, G; van Furth, E; Gonidakis, F; Gorwood, P; Hudson, J; Kaprio, J; Kas, M; Keski-Rahonen, A; Kiezebrink, K; Knudsen, G-P; Slof-Op 't Landt, M C T; Maj, M; Monteleone, A M; Monteleone, P; Raevuori, A H; Reichborn-Kjennerud, T; Tozzi, F; Tsitsika, A; van Elburg, A; Adan, R A H; Alfredsson, L; Ando, T; Andreassen, O A; Aschauer, H; Baker, J H; Barrett, J C; Bencko, V; Bergen, A W; Berrettini, W H; Birgegard, A; Boni, C; Boraska Perica, V; Brandt, H; Breen, G; Bulik, C M; Carlberg, L; Cassina, M; Cichon, S; Clementi, M; Cohen-Woods, S; Coleman, J; Cone, R D; Courtet, P; Crawford, S; Crow, S; Crowley, J; Danner, U N; Davis, O S P; de Zwaan, M; Dedoussis, G; Degortes, D; DeSocio, J E; Dick, D M; Dikeos, D; Dina, C; Ding, B; Dmitrzak-Weglarz, M; Docampo, E; Duncan, L; Egberts, K; Ehrlich, S; Escaramís, G; Esko, T; Espeseth, T; Estivill, X; Favaro, A; Fernández-Aranda, F; Fichter, M M; Finan, C; Fischer, K; Floyd, J A B; Foretova, L; Forzan, M; Franklin, C S; Gallinger, S; Gambaro, G; Gaspar, H A; Giegling, I; Gonidakis, F; Gorwood, P; Gratacos, M; Guillaume, S; Guo, Y; Hakonarson, H; Halmi, K A; Hatzikotoulas, K; Hauser, J; Hebebrand, J; Helder, S; Herms, S; Herpertz-Dahlmann, B; Herzog, W; Hilliard, C E; Hinney, A; Hübel, C; Huckins, L M; Hudson, J I; Huemer, J; Inoko, H; Janout, V; Jiménez-Murcia, S; Johnson, C; Julià, A; Juréus, A; Kalsi, G; Kaminska, D; Kaplan, A S; Kaprio, J; Karhunen, L; Karwautz, A; Kas, M J H; Kaye, W; Kennedy, J L; Keski-Rahkonen, A; Kiezebrink, K; Klareskog, L; Klump, K L; Knudsen, G P S; Koeleman, B P C; Koubek, D; La Via, M C; Landén, M; Le 
Hellard, S; Levitan, R D; Li, D; Lichtenstein, P; Lilenfeld, L; Lissowska, J; Lundervold, A; Magistretti, P; Maj, M; Mannik, K; Marsal, S; Martin, N; Mattingsdal, M; McDevitt, S; McGuffin, P; Merl, E; Metspalu, A; Meulenbelt, I; Micali, N; Mitchell, J; Mitchell, K; Monteleone, P; Monteleone, A M; Mortensen, P; Munn-Chernoff, M A; Navratilova, M; Nilsson, I; Norring, C; Ntalla, I; Ophoff, R A; O'Toole, J K; Palotie, A; Pante, J; Papezova, H; Pinto, D; Rabionet, R; Raevuori, A; Rajewski, A; Ramoz, N; Rayner, N W; Reichborn-Kjennerud, T; Ripatti, S; Roberts, M; Rotondo, A; Rujescu, D; Rybakowski, F; Santonastaso, P; Scherag, A; Scherer, S W; Schmidt, U; Schork, N J; Schosser, A; Slachtova, L; Sladek, R; Slagboom, P E; Slof-Op 't Landt, M C T; Slopien, A; Soranzo, N; Southam, L; Steen, V M; Strengman, E; Strober, M; Sullivan, P F; Szatkiewicz, J P; Szeszenia-Dabrowska, N; Tachmazidou, I; Tenconi, E; Thornton, L M; Tortorella, A; Tozzi, F; Treasure, J; Tsitsika, A; Tziouvas, K; van Elburg, A A; van Furth, E F; Wagner, G; Walton, E; Watson, H; Wichmann, H-E; Widen, E; Woodside, D B; Yanovski, J; Yao, S; Yilmaz, Z; Zeggini, E; Zerwas, S; Zipfel, S; Collier, D A; Sullivan, P F; Breen, G; Bulik, C M; Zeggini, E
2018-01-01
Anorexia nervosa (AN) is a complex neuropsychiatric disorder presenting with dangerously low body weight, and a deep and persistent fear of gaining weight. To date, only one genome-wide significant locus associated with AN has been identified. We performed an exome-chip based genome-wide association study (GWAS) in 2158 cases from nine populations of European origin and 15 485 ancestrally matched controls. Unlike previous studies, this GWAS also probed association in low-frequency and rare variants. Sixteen independent variants were taken forward for in silico and de novo replication (11 common and 5 rare). No findings reached genome-wide significance. Two notable common variants were identified: rs10791286, an intronic variant in OPCML (P=9.89 × 10−6), and rs7700147, an intergenic variant (P=2.93 × 10−5). No low-frequency variant associations were identified at genome-wide significance, although the study was well-powered to detect low-frequency variants with large effect sizes, suggesting that there may be no AN loci in this genomic search space with large effect sizes. PMID:29155802
Investigation of common, low-frequency and rare genome-wide variation in anorexia nervosa.
Huckins, L M; Hatzikotoulas, K; Southam, L; Thornton, L M; Steinberg, J; Aguilera-McKay, F; Treasure, J; Schmidt, U; Gunasinghe, C; Romero, A; Curtis, C; Rhodes, D; Moens, J; Kalsi, G; Dempster, D; Leung, R; Keohane, A; Burghardt, R; Ehrlich, S; Hebebrand, J; Hinney, A; Ludolph, A; Walton, E; Deloukas, P; Hofman, A; Palotie, A; Palta, P; van Rooij, F J A; Stirrups, K; Adan, R; Boni, C; Cone, R; Dedoussis, G; van Furth, E; Gonidakis, F; Gorwood, P; Hudson, J; Kaprio, J; Kas, M; Keski-Rahonen, A; Kiezebrink, K; Knudsen, G-P; Slof-Op 't Landt, M C T; Maj, M; Monteleone, A M; Monteleone, P; Raevuori, A H; Reichborn-Kjennerud, T; Tozzi, F; Tsitsika, A; van Elburg, A; Collier, D A; Sullivan, P F; Breen, G; Bulik, C M; Zeggini, E
2018-05-01
Anorexia nervosa (AN) is a complex neuropsychiatric disorder presenting with dangerously low body weight, and a deep and persistent fear of gaining weight. To date, only one genome-wide significant locus associated with AN has been identified. We performed an exome-chip based genome-wide association study (GWAS) in 2158 cases from nine populations of European origin and 15 485 ancestrally matched controls. Unlike previous studies, this GWAS also probed association in low-frequency and rare variants. Sixteen independent variants were taken forward for in silico and de novo replication (11 common and 5 rare). No findings reached genome-wide significance. Two notable common variants were identified: rs10791286, an intronic variant in OPCML (P=9.89 × 10−6), and rs7700147, an intergenic variant (P=2.93 × 10−5). No low-frequency variant associations were identified at genome-wide significance, although the study was well-powered to detect low-frequency variants with large effect sizes, suggesting that there may be no AN loci in this genomic search space with large effect sizes.
Genetic Control of Plant Root Colonization by the Biocontrol agent, Pseudomonas fluorescens
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cole, Benjamin J.; Fletcher, Meghan; Waters, Jordan
Plant growth promoting rhizobacteria (PGPR) are a critical component of plant root ecosystems. PGPR promote plant growth by solubilizing inaccessible minerals, suppressing pathogenic microorganisms in the soil, and directly stimulating growth through hormone synthesis. Pseudomonas fluorescens is a well-established PGPR isolated from wheat roots that can also colonize the root system of the model plant, Arabidopsis thaliana. We have created barcoded transposon insertion mutant libraries suitable for genome-wide transposon-mediated mutagenesis followed by sequencing (TnSeq). These libraries consist of over 10^5 independent insertions, collectively providing loss-of-function mutants for nearly all genes in the P. fluorescens genome. Each insertion mutant can be unambiguously identified by a randomized 20 nucleotide sequence (barcode) engineered into the transposon sequence. We used these libraries in a gnotobiotic assay to examine the colonization ability of P. fluorescens on A. thaliana roots. Taking advantage of the ability to distinguish individual colonization events using barcode sequences, we assessed the timing and microbial concentration dependence of colonization of the rhizoplane niche. These data provide direct insight into the dynamics of plant root colonization in an in vivo system and define baseline parameters for the systematic identification of the bacterial genes and molecular pathways using TnSeq assays. Having determined parameters that facilitate potential colonization of roots by thousands of independent insertion mutants in a single assay, we are currently establishing a genome-wide functional map of genes required for root colonization in P. fluorescens.
Importantly, the approach developed and optimized here for P. fluorescens-A. thaliana colonization will be applicable to a wide range of plant-microbe interactions, including biofuel feedstock plants and microbes known or hypothesized to impact on biofuel-relevant traits including biomass productivity and pathogen resistance.
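Because each mutant carries a 20-nucleotide barcode, colonization can in principle be followed by counting barcodes in sequencing reads. The sketch below shows one simple way this might be done; the flanking sequences `LEFT` and `RIGHT`, the reads, and the function names are all hypothetical and are not taken from the study.

```python
from collections import Counter

BARCODE_LEN = 20  # barcode length stated in the abstract

def extract_barcode(read, left_flank, right_flank):
    """Return the 20-nt barcode between the two flanks, or None if not found."""
    i = read.find(left_flank)
    if i < 0:
        return None
    start = i + len(left_flank)
    end = start + BARCODE_LEN
    # The right flank must follow immediately after the barcode.
    if read[end:end + len(right_flank)] != right_flank:
        return None
    barcode = read[start:end]
    if len(barcode) != BARCODE_LEN or set(barcode) - set("ACGT"):
        return None
    return barcode

def count_barcodes(reads, left_flank, right_flank):
    """Count how often each valid barcode occurs in a collection of reads."""
    counts = Counter()
    for read in reads:
        barcode = extract_barcode(read, left_flank, right_flank)
        if barcode is not None:
            counts[barcode] += 1
    return counts

# Hypothetical flanks and reads for illustration only.
LEFT, RIGHT = "ACGTAC", "TTGGCC"
bc1 = "AAAAACCCCCGGGGGTTTTT"
bc2 = "TTTTTGGGGGCCCCCAAAAA"
reads = [
    "GG" + LEFT + bc1 + RIGHT + "AA",
    LEFT + bc1 + RIGHT,
    LEFT + bc2 + RIGHT,
    LEFT + bc2[:10] + RIGHT,  # truncated barcode, rejected
    "GGGG",                   # no flank, rejected
]
counts = count_barcodes(reads, LEFT, RIGHT)
```

Comparing such counts between the inoculum and the root-associated population would give a per-mutant abundance change; exact-match extraction is a deliberate simplification that ignores sequencing errors.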
Chaudhary, Neha; Tøndel, Kristin; Bhatnagar, Rakesh; dos Santos, Vítor A P Martins; Puchałka, Jacek
2016-03-01
Genome-Scale Metabolic Reconstructions (GSMRs), along with optimization-based methods, predominantly Flux Balance Analysis (FBA) and its derivatives, are widely applied for assessing and predicting the behavior of metabolic networks upon perturbation, thereby enabling identification of potential novel drug targets and biotechnologically relevant pathways. The abundance of alternate flux profiles has led to the evolution of methods to explore the complete solution space, aiming to increase the accuracy of predictions. Herein we present a novel, generic algorithm to characterize the entire flux space of a GSMR upon application of FBA, leading to the optimal value of the objective (the optimal flux space). Our method employs Modified Latin-Hypercube Sampling (LHS) to effectively border the optimal space, followed by Principal Component Analysis (PCA) to identify and explain the major sources of variability within it. The approach was validated with the elementary mode analysis of a smaller network of Saccharomyces cerevisiae and applied to the GSMR of Pseudomonas aeruginosa PAO1 (iMO1086). It is shown to surpass the commonly used Monte Carlo Sampling (MCS) in providing a more uniform coverage for a much larger network with fewer samples. Results show that although many fluxes are identified as variable upon fixing the objective value, the majority of the variability can be reduced to several main patterns arising from a few alternative pathways. In iMO1086, initial variability of 211 reactions could almost entirely be explained by 7 alternative pathway groups. These findings imply that the possibilities to reroute greater portions of flux may be limited within metabolic networks of bacteria. Furthermore, the optimal flux space is subject to change with environmental conditions. Our method may be a useful device to validate the predictions made by FBA-based tools, by describing the optimal flux space associated with these predictions, and thus to improve them.
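The two building blocks named in this abstract, Latin hypercube sampling and PCA, can be sketched in a few lines. This is plain LHS inside a box of lower and upper bounds, not the authors' modified variant, and it ignores the stoichiometric constraints of a real flux space; the function names and bounds are assumptions for illustration.

```python
import numpy as np

def latin_hypercube(n_samples, lower, upper, rng):
    """Plain Latin hypercube sample: one point per equal-width stratum in each dimension."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    d = lower.size
    # Row i starts in stratum i of every dimension, at a random offset inside it.
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, d))) / n_samples
    # Shuffle each column independently so strata are paired at random.
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    return lower + u * (upper - lower)

def pca_explained_variance(samples):
    """Fraction of total variance carried by each principal component."""
    centered = samples - samples.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
X = latin_hypercube(8, [0.0, 0.0], [1.0, 8.0], rng)

# Two perfectly coupled hypothetical fluxes collapse onto one component.
Y = np.column_stack([X[:, 0], 2.0 * X[:, 0]])
ratios = pca_explained_variance(Y)
```

The stratification is what gives LHS more even coverage than independent random draws for the same number of samples, and the PCA step shows how many coupled directions are needed to describe the sampled variability.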
Microeconomic principles explain an optimal genome size in bacteria.
Ranea, Juan A G; Grant, Alastair; Thornton, Janet M; Orengo, Christine A
2005-01-01
Bacteria can clearly enhance their survival by expanding their genetic repertoire. However, the tight packing of the bacterial genome and the fact that the most evolved species do not necessarily have the biggest genomes suggest there are other evolutionary factors limiting their genome expansion. To clarify these restrictions on size, we studied those protein families contributing most significantly to bacterial-genome complexity. We found that all bacteria apply the same basic and ancestral 'molecular technology' to optimize their reproductive efficiency. The same microeconomics principles that define the optimum size in a factory can also explain the existence of a statistical optimum in bacterial genome size. This optimum is reached when the bacterial genome obtains the maximum metabolic complexity (revenue) for minimal regulatory genes (logistic cost).
Performance of polygenic scores for predicting phobic anxiety.
Walter, Stefan; Glymour, M Maria; Koenen, Karestan; Liang, Liming; Tchetgen Tchetgen, Eric J; Cornelis, Marilyn; Chang, Shun-Chiao; Rimm, Eric; Kawachi, Ichiro; Kubzansky, Laura D
2013-01-01
Anxiety disorders are common, with a lifetime prevalence of 20% in the U.S., and are responsible for substantial burdens of disability, missed work days and health care utilization. To date, no causal genetic variants have been identified for anxiety, anxiety disorders, or related traits. The objective was to investigate whether a phobic anxiety symptom score was associated with 3 alternative polygenic risk scores, derived from external genome-wide association studies of anxiety, an internally estimated agnostic polygenic score, or previously identified candidate genes. The design was a longitudinal follow-up study. Using linear and logistic regression we investigated whether phobic anxiety was associated with polygenic risk scores derived from internal, leave-one-out genome-wide association studies, from 31 candidate genes, and from out-of-sample genome-wide association weights previously shown to predict depression and anxiety in another cohort. Study participants (n = 11,127) were individuals from the Nurses' Health Study and Health Professionals Follow-up Study. Anxiety symptoms were assessed via the 8-item phobic anxiety scale of the Crown Crisp Index at two time points, from which a continuous phenotype score was derived. We found no genome-wide significant associations with phobic anxiety. Phobic anxiety was also not associated with a polygenic risk score derived from the genome-wide association study beta weights using liberal p-value thresholds; with a previously published genome-wide polygenic score; or with a candidate gene risk score based on 31 genes previously hypothesized to predict anxiety. There is a substantial gap between twin-study heritability estimates of anxiety disorders, ranging from 20% to 40%, and heritability explained by genome-wide association results.
New approaches such as improved genome imputations, application of gene expression and biological pathways information, and incorporating social or environmental modifiers of genetic risks may be necessary to identify significant genetic predictors of anxiety.
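A polygenic risk score of the kind described above is, in its simplest form, a weighted sum of allele dosages, with weights taken from association beta estimates and SNPs optionally filtered by a p-value threshold. The sketch below uses made-up numbers and hypothetical function names; it is not the study's pipeline.

```python
import numpy as np

def polygenic_score(dosages, betas):
    """Weighted sum of allele dosages (individuals x SNPs) by per-SNP beta weights."""
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    return dosages @ betas

def score_with_threshold(dosages, betas, pvalues, p_max):
    """Score using only SNPs whose association p-value is at most p_max."""
    keep = np.asarray(pvalues, dtype=float) <= p_max
    dosages = np.asarray(dosages, dtype=float)[:, keep]
    betas = np.asarray(betas, dtype=float)[keep]
    return polygenic_score(dosages, betas)

# Made-up data: 2 individuals, 3 SNPs, dosages coded 0/1/2.
dosages = [[0, 1, 2],
           [2, 0, 1]]
betas = [0.1, -0.2, 0.3]
pvalues = [0.001, 0.2, 0.04]

all_snps = polygenic_score(dosages, betas)
liberal = score_with_threshold(dosages, betas, pvalues, p_max=0.05)
```

Loosening `p_max` admits more SNPs with weaker evidence, which is the trade-off behind the "liberal p-value thresholds" mentioned in the abstract.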
CGEMS identifies common inherited genetic variations associated with a number of cancers, including breast and prostate. Data from these genome-wide association studies (GWAS) are available through the Division of Cancer Epidemiology & Genetics website.
Rincent, R; Laloë, D; Nicolas, S; Altmann, T; Brunel, D; Revilla, P; Rodríguez, V M; Moreno-Gonzalez, J; Melchinger, A; Bauer, E; Schoen, C-C; Meyer, N; Giauffret, C; Bauland, C; Jamin, P; Laborde, J; Monod, H; Flament, P; Charcosset, A; Moreau, L
2012-10-01
Genomic selection refers to the use of genotypic information for predicting breeding values of selection candidates. A prediction formula is calibrated with the genotypes and phenotypes of reference individuals constituting the calibration set. The size and the composition of this set are essential parameters affecting the prediction reliabilities. The objective of this study was to maximize reliabilities by optimizing the calibration set. Different criteria based on the diversity or on the prediction error variance (PEV) derived from the realized additive relationship matrix-best linear unbiased predictions model (RA-BLUP) were used to select the reference individuals. For the latter, we considered the mean of the PEV of the contrasts between each selection candidate and the mean of the population (PEVmean) and the mean of the expected reliabilities of the same contrasts (CDmean). These criteria were tested with phenotypic data collected on two diversity panels of maize (Zea mays L.) genotyped with a 50k SNPs array. In the two panels, samples chosen based on CDmean gave higher reliabilities than random samples for various calibration set sizes. CDmean also appeared superior to PEVmean, which can be explained by the fact that it takes into account the reduction of variance due to the relatedness between individuals. Selected samples were close to optimality for a wide range of trait heritabilities, which suggests that the strategy presented here can efficiently sample subsets in panels of inbred lines. A script to optimize reference samples based on CDmean is available on request.
USDA-ARS's Scientific Manuscript database
Sorghum is the second cereal crop to have a full genome completely sequenced (Nature (2009), 457:551). This achievement is widely recognized as a scientific milestone for grass genetics and genomics in general. However, the true worth of genetic information lies in translating the sequence informa...
Hockenberry, Adam J; Pah, Adam R; Jewett, Michael C; Amaral, Luís A N
2017-01-01
Studies dating back to the 1970s established that sequence complementarity between the anti-Shine-Dalgarno (aSD) sequence on prokaryotic ribosomes and the 5' untranslated region of mRNAs helps to facilitate translation initiation. The optimal location of aSD sequence binding relative to the start codon, the full extent of the aSD sequence and the functional form of the relationship between aSD sequence complementarity and translation efficiency have not been fully resolved. Here, we investigate these relationships by leveraging the sequence diversity of endogenous genes and recently available genome-wide estimates of translation efficiency. We show that, after accounting for predicted mRNA structure, aSD sequence complementarity increases the translation of endogenous mRNAs by roughly 50%. Further, we observe that this relationship is nonlinear, with translation efficiency maximized for mRNAs with intermediate levels of aSD sequence complementarity. The mechanistic insights that we observe are highly robust: we find nearly identical results in multiple datasets spanning three distantly related bacteria. Further, we verify our main conclusions by re-analysing a controlled experimental dataset. © 2017 The Authors.
Engineered CRISPR/Cas9 system for multiplex genome engineering of polyploid industrial yeast strains
Lian, Jiazhang; Bao, Zehua; Hu, Sumeng; ...
2018-02-20
The CRISPR/Cas9 system has been widely used for multiplex genome engineering of Saccharomyces cerevisiae. However, its application in manipulating industrial yeast strains is less successful, probably due to the genome complexity and low copy numbers of gRNA expression plasmids. Here we developed an efficient CRISPR/Cas9 system for industrial yeast strain engineering by using our previously engineered plasmids with increased copy numbers. Four genes in both a diploid strain (Ethanol Red, 8 alleles in total) and a triploid strain (ATCC 4124, 12 alleles in total) were knocked out in a single step with 100% efficiency. This system was used to construct xylose-fermenting, lactate-producing industrial yeast strains, in which ALD6, PHO13, LEU2, and URA3 were disrupted in a single step followed by the introduction of a xylose utilization pathway and a lactate biosynthetic pathway on auxotrophic marker plasmids. The optimized CRISPR/Cas9 system provides a powerful tool for the development of industrial yeast based microbial cell factories.
Engineered CRISPR/Cas9 system for multiplex genome engineering of polyploid industrial yeast strains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lian, Jiazhang; Bao, Zehua; Hu, Sumeng
The CRISPR/Cas9 system has been widely used for multiplex genome engineering of Saccharomyces cerevisiae. However, its application in manipulating industrial yeast strains is less successful, probably due to the genome complexity and low copy numbers of gRNA expression plasmids. Here we developed an efficient CRISPR/Cas9 system for industrial yeast strain engineering by using our previously engineered plasmids with increased copy numbers. Four genes in both a diploid strain (Ethanol Red, 8 alleles in total) and a triploid strain (ATCC 4124, 12 alleles in total) were knocked out in a single step with 100% efficiency. This system was used to construct xylose-fermenting, lactate-producing industrial yeast strains, in which ALD6, PHO13, LEU2, and URA3 were disrupted in a single step followed by the introduction of a xylose utilization pathway and a lactate biosynthetic pathway on auxotrophic marker plasmids. The optimized CRISPR/Cas9 system provides a powerful tool for the development of industrial yeast based microbial cell factories.
Lian, Jiazhang; Bao, Zehua; Hu, Sumeng; Zhao, Huimin
2018-06-01
The CRISPR/Cas9 system has been widely used for multiplex genome engineering of Saccharomyces cerevisiae. However, its application in manipulating industrial yeast strains is less successful, probably due to the genome complexity and low copy numbers of gRNA expression plasmids. Here we developed an efficient CRISPR/Cas9 system for industrial yeast strain engineering by using our previously engineered plasmids with increased copy numbers. Four genes in both a diploid strain (Ethanol Red, 8 alleles in total) and a triploid strain (ATCC 4124, 12 alleles in total) were knocked out in a single step with 100% efficiency. This system was used to construct xylose-fermenting, lactate-producing industrial yeast strains, in which ALD6, PHO13, LEU2, and URA3 were disrupted in a single step followed by the introduction of a xylose utilization pathway and a lactate biosynthetic pathway on auxotrophic marker plasmids. The optimized CRISPR/Cas9 system provides a powerful tool for the development of industrial yeast based microbial cell factories. © 2018 Wiley Periodicals, Inc.
Spontaneous Mutation Rate in the Smallest Photosynthetic Eukaryotes
Krasovec, Marc; Eyre-Walker, Adam; Sanchez-Ferandin, Sophie
2017-01-01
Mutation is the ultimate source of genetic variation, and knowledge of mutation rates is fundamental for our understanding of all evolutionary processes. High throughput sequencing of mutation accumulation lines has provided genome-wide spontaneous mutation rates in a dozen model species, but estimates from nonmodel organisms from much of the diversity of life are very limited. Here, we report mutation rates in four haploid marine bacterial-sized photosynthetic eukaryotic algae: Bathycoccus prasinos, Ostreococcus tauri, Ostreococcus mediterraneus, and Micromonas pusilla. The spontaneous mutation rate between species varies from μ = 4.4 × 10−10 to 9.8 × 10−10 mutations per nucleotide per generation. Within genomes, there is a two-fold increase of the mutation rate in intergenic regions, consistent with an optimization of mismatch and transcription-coupled DNA repair in coding sequences. Additionally, we show that deviation from the equilibrium GC content increases the mutation rate by ∼2% to ∼12% because of a GC bias in coding sequences. More generally, the difference between the observed and equilibrium GC content of genomes explains some of the inter-specific variation in mutation rates. PMID:28379581
Targeting Super-Enhancers for Disease Treatment and Diagnosis.
Shin, Ha Youn
2018-05-10
The transcriptional regulation of genes determines the fate of animal cell differentiation and subsequent organ development. With the recent progress in genome-wide technologies, the genomic landscapes of enhancers have been broadly explored in mammalian genomes, which led to the discovery of novel specific subsets of enhancers, termed super-enhancers. Super-enhancers are large clusters of enhancers covering the long region of regulatory DNA and are densely occupied by transcription factors, active histone marks, and co-activators. Accumulating evidence points to the critical role that super-enhancers play in cell type-specific development and differentiation, as well as in the development of various diseases. Here, I provide a comprehensive description of the optimal approach for identifying functional units of super-enhancers and their unique chromatin features in normal development and in diseases, including cancers. I also review the recent updated knowledge on novel approaches of targeting super-enhancers for the treatment of specific diseases, such as small-molecule inhibitors and potential gene therapy. This review will provide perspectives on using super-enhancers as biomarkers to develop novel disease diagnostic tools and establish new directions in clinical therapeutic strategies.
A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds
USDA-ARS's Scientific Manuscript database
The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identi...
Genome-wide association as a means to understanding the mammary gland
USDA-ARS's Scientific Manuscript database
Next-generation sequencing and related technologies have facilitated the creation of enormous public databases that catalogue genomic variation. These databases have facilitated a variety of approaches to discover new genes that regulate normal biology as well as disease. Genome wide association (...
Significance of genome-wide association studies in molecular anthropology.
Gupta, Vipin; Khadgawat, Rajesh; Sachdeva, Mohinder Pal
2009-12-01
The successful advent of a genome-wide approach in association studies raises the hopes of human geneticists for solving a genetic maze of complex traits especially the disorders. This approach, which is replete with the application of cutting-edge technology and supported by big science projects (like Human Genome Project; and even more importantly the International HapMap Project) and various important databases (SNP database, CNV database, etc.), has had unprecedented success in rapidly uncovering many of the genetic determinants of complex disorders. The magnitude of this approach in the genetics of classical anthropological variables like height, skin color, eye color, and other genome diversity projects has certainly expanded the horizons of molecular anthropology. Therefore, in this article we have proposed a genome-wide association approach in molecular anthropological studies by providing lessons from the exemplary study of the Wellcome Trust Case Control Consortium. We have also highlighted the importance and uniqueness of Indian population groups in facilitating the design and finding optimum solutions for other genome-wide association-related challenges.
Application of genomic selection in farm animal breeding.
Tan, Cheng; Bian, Cheng; Yang, Da; Li, Ning; Wu, Zhen-Fang; Hu, Xiao-Xiang
2017-11-20
Genomic selection (GS) has become a widely accepted method in animal breeding to genetically improve economic traits. With the declining costs of high-density SNP chips and next-generation sequencing, GS has been applied in dairy cattle, swine, poultry and other animals and gained varying degrees of success. Currently, major challenges in GS studies include further reducing the cost of genome-wide SNP genotyping and improving the predictive accuracy of genomic estimated breeding value (GEBV). In this review, we summarize various methods for genome-wide SNP genotyping and GEBV prediction, and give a brief introduction of GS in livestock and poultry breeding. This review will provide a reference for further implementation of GS in farm animal breeding.
Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.
Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P
2010-12-22
Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource, Roundup, using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
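The effect of submission order on total runtime can be illustrated with a small sketch. It is not the authors' model: the runtime estimate below (proportional to the product of genome sizes, with a made-up constant `k`) and the greedy longest-first scheduler are assumptions chosen only to show why ordering jobs by predicted runtime can shorten the time a cluster must stay up.

```python
import heapq

def estimate_runtime(size_a, size_b, k=1.0e-6):
    """Hypothetical runtime model: cost grows with the product of genome sizes.

    The abstract says only that runtime was estimated from genome size and
    complexity; this formula and the constant k are illustrative assumptions.
    """
    return k * size_a * size_b

def makespan(runtimes, n_nodes):
    """Assign jobs in the given order, each to the node that frees up first.

    Returns the finishing time of the busiest node, i.e. how long the whole
    cluster has to be kept running (and paid for).
    """
    loads = [0.0] * n_nodes
    heapq.heapify(loads)
    for rt in runtimes:
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + rt)
    return max(loads)

def longest_first(runtimes):
    """Order jobs by predicted runtime, longest first (the classic LPT heuristic)."""
    return sorted(runtimes, reverse=True)

if __name__ == "__main__":
    jobs = [1.0, 1.0, 1.0, 1.0, 4.0]           # predicted hours per comparison
    print(makespan(jobs, n_nodes=2))             # arbitrary order
    print(makespan(longest_first(jobs), n_nodes=2))  # longest-first order
```

For scale, the figures reported in the abstract ($8,000 for 245,323 comparisons) work out to roughly 3 cents per genome-to-genome comparison.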
Gusev, A.; Shah, M. J.; Kenny, E. E.; Ramachandran, A.; Lowe, J. K.; Salit, J.; Lee, C. C.; Levandowsky, E. C.; Weaver, T. N.; Doan, Q. C.; Peckham, H. E.; McLaughlin, S. F.; Lyons, M. R.; Sheth, V. N.; Stoffel, M.; De La Vega, F. M.; Friedman, J. M.; Breslow, J. L.
2012-01-01
Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to complex statistical methods as in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, with average 5× coverage. This assay identified 5,735,306 unique sites of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogeneous isolate population with emphasis on optimal rare variant inference. PMID:22135348
Erdoğan, Onur; Aydin Son, Yeşim
2014-01-01
Single nucleotide polymorphisms (SNPs) are the most common genomic variations, in which only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers, but there is a need to determine which SNP subsets and patient clinical data are informative for diagnosis. Data mining approaches have the highest potential for extracting knowledge from genomic datasets and for selecting representative SNPs as well as the most effective and informative clinical features for the clinical diagnosis of diseases. In this study, we applied one of the widely used data mining classification methodologies, the decision tree, to associate SNP biomarkers and significant clinical data with Alzheimer's disease (AD), the most common form of dementia. Different tree construction parameters were compared for optimization, and the most accurate tree for predicting AD is presented.
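As a rough illustration of how a decision tree picks a SNP to split on, the sketch below scores candidate splits by weighted Gini impurity. The abstract does not state which splitting criterion or software was used, so the Gini criterion, the genotype coding (0, 1, or 2 minor alleles), and the SNP names `snp_a` and `snp_b` are all assumptions for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_snp_split(genotypes, labels):
    """Find the (SNP, threshold) pair with the lowest weighted Gini impurity.

    genotypes: list of dicts mapping a SNP id to its minor-allele count (0, 1, 2).
    labels:    list of class labels, one per sample.
    Returns (weighted_impurity, snp_id, threshold), or None if no split is possible.
    """
    best = None
    n = len(labels)
    for snp in genotypes[0]:
        for threshold in (0, 1):  # split "count <= threshold" vs "count > threshold"
            left = [y for g, y in zip(genotypes, labels) if g[snp] <= threshold]
            right = [y for g, y in zip(genotypes, labels) if g[snp] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, snp, threshold)
    return best

if __name__ == "__main__":
    # Toy data with hypothetical SNP ids; not taken from the study.
    samples = [
        {"snp_a": 0, "snp_b": 1},
        {"snp_a": 0, "snp_b": 2},
        {"snp_a": 1, "snp_b": 1},
        {"snp_a": 2, "snp_b": 0},
    ]
    status = ["control", "control", "AD", "AD"]
    print(best_snp_split(samples, status))
```

A full tree would apply the same search recursively to each child node; tuning choices such as maximum depth or minimum node size correspond to the "tree construction parameters" the abstract mentions.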
Generation of influenza A viruses as live but replication-incompetent virus vaccines.
Si, Longlong; Xu, Huan; Zhou, Xueying; Zhang, Ziwei; Tian, Zhenyu; Wang, Yan; Wu, Yiming; Zhang, Bo; Niu, Zhenlan; Zhang, Chuanling; Fu, Ge; Xiao, Sulong; Xia, Qing; Zhang, Lihe; Zhou, Demin
2016-12-02
The conversion of life-threatening viruses into live but avirulent vaccines represents a revolution in vaccinology. In a proof-of-principle study, we expanded the genetic code of the genome of influenza A virus via a transgenic cell line containing orthogonal translation machinery. This generated premature termination codon (PTC)-harboring viruses that exerted full infectivity but were replication-incompetent in conventional cells. Genome-wide optimization of the sites for incorporation of multiple PTCs resulted in highly reproductive and genetically stable progeny viruses in transgenic cells. In mouse, ferret, and guinea pig models, vaccination with PTC viruses elicited robust humoral, mucosal, and T cell-mediated immunity against antigenically distinct influenza viruses and even neutralized existing infecting strains. The methods presented here may become a general approach for generating live virus vaccines that can be adapted to almost any virus. Copyright © 2016, American Association for the Advancement of Science.
Pathways for virus assembly around nucleic acids
Perlmutter, Jason D; Perkett, Matthew R
2014-01-01
Understanding the pathways by which viral capsid proteins assemble around their genomes could identify key intermediates as potential drug targets. In this work we use computer simulations to characterize assembly over a wide range of capsid protein-protein interaction strengths and solution ionic strengths. We find that assembly pathways can be categorized into two classes, in which intermediates are either predominantly ordered or disordered. Our results suggest that estimating the protein-protein and the protein-genome binding affinities may be sufficient to predict which pathway occurs. Furthermore, the calculated phase diagrams suggest that knowledge of the dominant assembly pathway and its relationship to control parameters could identify optimal strategies to thwart or redirect assembly to block infection. Finally, analysis of simulation trajectories suggests that the two classes of assembly pathways can be distinguished in single molecule fluorescence correlation spectroscopy or bulk time resolved small angle x-ray scattering experiments. PMID:25036288
Belaghzal, Houda; Dekker, Job; Gibcus, Johan H
2017-07-01
Chromosome conformation capture-based methods such as Hi-C have become mainstream techniques for the study of the 3D organization of genomes. These methods convert chromatin interactions reflecting topological chromatin structures into digital information (counts of pair-wise interactions). Here, we describe an updated protocol for Hi-C (Hi-C 2.0) that integrates recent improvements into a single protocol for efficient and high-resolution capture of chromatin interactions. This protocol combines chromatin digestion and frequently cutting enzymes to obtain kilobase (kb) resolution. It also includes steps to reduce random ligation and the generation of uninformative molecules, such as unligated ends, to improve the amount of valid intra-chromosomal read pairs. This protocol allows for obtaining information on conformational structures such as compartment and topologically associating domains, as well as high-resolution conformational features such as DNA loops. Copyright © 2017 Elsevier Inc. All rights reserved.
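The phrase "counts of pair-wise interactions" can be made concrete with a minimal sketch that bins mapped read pairs into a sparse contact map. The input tuple layout `(chrom1, pos1, chrom2, pos2)` and the function name `bin_contacts` are assumptions for illustration; real Hi-C pipelines also filter, deduplicate, and normalize, which is omitted here.

```python
from collections import Counter

def bin_contacts(read_pairs, bin_size):
    """Count read pairs per pair of genomic bins.

    read_pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples, one per
                valid ligation product (hypothetical input format).
    bin_size:   bin width in base pairs, e.g. 1000 for kilobase resolution.
    Returns a Counter keyed by ((chrom, bin_index), (chrom, bin_index)),
    with the two ends sorted so that A-B and B-A land in the same cell.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in read_pairs:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        key = (a, b) if a <= b else (b, a)
        counts[key] += 1
    return counts

if __name__ == "__main__":
    pairs = [
        ("chr1", 1200, "chr1", 5300),
        ("chr1", 5900, "chr1", 1999),   # same two bins, ends reversed
        ("chr1", 100, "chr2", 100),     # inter-chromosomal pair
    ]
    for cell, n in sorted(bin_contacts(pairs, bin_size=1000).items()):
        print(cell, n)
```

Smaller bins give higher resolution but need more valid read pairs per cell, which is one reason reducing uninformative molecules matters.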
Dugué, Pierre-Antoine; Brinkman, Maree T; Milne, Roger L; Wong, Ee Ming; FitzGerald, Liesel M; Bassett, Julie K; Joo, Jihoon E; Jung, Chol-Hee; Makalic, Enes; Schmidt, Daniel F; Park, Daniel J; Chung, Jessica; Ta, Anthony D; Bolton, Damien M; Lonie, Andrew; Longano, Anthony; Hopper, John L; Severi, Gianluca; Saffery, Richard; English, Dallas R; Southey, Melissa C; Giles, Graham G
2016-01-01
Background: Global DNA methylation has been reported to be associated with urothelial cell carcinoma (UCC) by studies using blood samples collected at diagnosis. Using the Illumina HumanMethylation450 assay, we derived genome-wide measures of blood DNA methylation and assessed them for their prospective association with UCC risk. Methods: We used 439 case–control pairs from the Melbourne Collaborative Cohort Study matched on age, sex, country of birth, DNA sample type, and collection period. Conditional logistic regression was used to compute odds ratios (OR) of UCC risk per s.d. of each genome-wide measure of DNA methylation and 95% confidence intervals (CIs), adjusted for potential confounders. We also investigated associations by disease subtype, sex, smoking, and time since blood collection. Results: The risk of superficial UCC was decreased for individuals with higher levels of our genome-wide DNA methylation measure (OR=0.71, 95% CI: 0.54–0.94; P=0.02). This association was particularly strong for current smokers at sample collection (OR=0.47, 95% CI: 0.27–0.83). Intermediate levels of our genome-wide measure were associated with decreased risk of invasive UCC. Some variation was observed between UCC subtypes and the location and regulatory function of the CpGs included in the genome-wide measures of methylation. Conclusions: Higher levels of our genome-wide DNA methylation measure were associated with decreased risk of superficial UCC and intermediate levels were associated with reduced risk of invasive disease. These findings require replication by other prospective studies. PMID:27490804
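As a quick consistency check on the reported estimate, a two-sided p-value can be approximately recovered from an odds ratio and its 95% confidence interval. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which is an assumption about how the interval was constructed rather than something stated in the abstract.

```python
import math

def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided p-value from an OR and its 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE), i.e. a Wald interval.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

if __name__ == "__main__":
    # Superficial UCC estimate reported in the abstract.
    z, p = p_from_or_ci(0.71, 0.54, 0.94)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

For the superficial UCC estimate (OR=0.71, 95% CI: 0.54–0.94) this gives a p-value that rounds to 0.02, in line with the value reported.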
Dugué, Pierre-Antoine; Brinkman, Maree T; Milne, Roger L; Wong, Ee Ming; FitzGerald, Liesel M; Bassett, Julie K; Joo, Jihoon E; Jung, Chol-Hee; Makalic, Enes; Schmidt, Daniel F; Park, Daniel J; Chung, Jessica; Ta, Anthony D; Bolton, Damien M; Lonie, Andrew; Longano, Anthony; Hopper, John L; Severi, Gianluca; Saffery, Richard; English, Dallas R; Southey, Melissa C; Giles, Graham G
2016-09-06
Global DNA methylation has been reported to be associated with urothelial cell carcinoma (UCC) by studies using blood samples collected at diagnosis. Using the Illumina HumanMethylation450 assay, we derived genome-wide measures of blood DNA methylation and assessed them for their prospective association with UCC risk. We used 439 case-control pairs from the Melbourne Collaborative Cohort Study matched on age, sex, country of birth, DNA sample type, and collection period. Conditional logistic regression was used to compute odds ratios (OR) of UCC risk per s.d. of each genome-wide measure of DNA methylation and 95% confidence intervals (CIs), adjusted for potential confounders. We also investigated associations by disease subtype, sex, smoking, and time since blood collection. The risk of superficial UCC was decreased for individuals with higher levels of our genome-wide DNA methylation measure (OR=0.71, 95% CI: 0.54-0.94; P=0.02). This association was particularly strong for current smokers at sample collection (OR=0.47, 95% CI: 0.27-0.83). Intermediate levels of our genome-wide measure were associated with decreased risk of invasive UCC. Some variation was observed between UCC subtypes and the location and regulatory function of the CpGs included in the genome-wide measures of methylation. Higher levels of our genome-wide DNA methylation measure were associated with decreased risk of superficial UCC and intermediate levels were associated with reduced risk of invasive disease. These findings require replication by other prospective studies.
Wang, Yi-Ting; Sung, Pei-Yuan; Lin, Peng-Lin; Yu, Ya-Wen; Chung, Ren-Hua
2015-05-15
Genome-wide association studies (GWAS) have become a common approach to identifying single nucleotide polymorphisms (SNPs) associated with complex diseases. As complex diseases are caused by the joint effects of multiple genes, while the effect of an individual gene or SNP is modest, a method considering the joint effects of multiple SNPs can be more powerful than testing individual SNPs. The multi-SNP analysis aims to test association based on a SNP set, usually defined based on biological knowledge such as gene or pathway, which may contain only a portion of SNPs with effects on the disease. Therefore, a challenge for the multi-SNP analysis is how to effectively select a subset of SNPs with promising association signals from the SNP set. We developed the Optimal P-value Threshold Pedigree Disequilibrium Test (OPTPDT). The OPTPDT uses general nuclear families. A variable p-value threshold algorithm is used to determine an optimal p-value threshold for selecting a subset of SNPs. A permutation procedure is used to assess the significance of the test. We used simulations to verify that the OPTPDT has correct type I error rates. Our power studies showed that the OPTPDT can be more powerful than the set-based test in PLINK, the multi-SNP FBAT test, and the p-value based test GATES. We applied the OPTPDT to a family-based autism GWAS dataset for gene-based association analysis and identified MACROD2-AS1 with genome-wide significance (p-value = 2.5 × 10^−6). Our simulation results suggested that the OPTPDT is a valid and powerful test. The OPTPDT will be helpful for gene-based or pathway association analysis. The method is ideal for the secondary analysis of existing GWAS datasets, which may identify a set of SNPs with joint effects on the disease.
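The general idea of a variable p-value threshold combined with a permutation test can be sketched as follows. This is not the OPTPDT itself: the per-SNP statistic (a simple TDT-like z-score from per-family transmitted-minus-untransmitted counts), the combined statistic (sum of squared z-scores divided by the square root of the number of selected SNPs), and the per-family sign-flip permutation are all simplifying assumptions made for illustration.

```python
import math
import random

def snp_z_scores(diffs):
    """diffs[i][j]: family i's transmitted-minus-untransmitted count at SNP j."""
    z = []
    for j in range(len(diffs[0])):
        col = [fam[j] for fam in diffs]
        denom = math.sqrt(sum(d * d for d in col))
        z.append(sum(col) / denom if denom > 0 else 0.0)
    return z

def optimal_threshold_stat(z_scores, thresholds):
    """Best set statistic over candidate p-value thresholds (assumed form)."""
    pvals = [math.erfc(abs(z) / math.sqrt(2)) for z in z_scores]
    best = 0.0
    for t in thresholds:
        chosen = [z for z, p in zip(z_scores, pvals) if p <= t]
        if chosen:
            stat = sum(z * z for z in chosen) / math.sqrt(len(chosen))
            best = max(best, stat)
    return best

def permutation_pvalue(diffs, thresholds, n_perm=999, seed=0):
    """Significance of the optimized statistic via per-family sign flips.

    The threshold search is repeated inside every permutation, so the
    selection step is accounted for in the null distribution.
    """
    rng = random.Random(seed)
    observed = optimal_threshold_stat(snp_z_scores(diffs), thresholds)
    hits = 0
    for _ in range(n_perm):
        flipped = []
        for fam in diffs:
            sign = rng.choice((-1, 1))
            flipped.append([sign * d for d in fam])
        if optimal_threshold_stat(snp_z_scores(flipped), thresholds) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    # Toy data: SNP 0 is consistently over-transmitted, SNP 1 is noise.
    toy = [[1, (-1) ** i] for i in range(20)]
    print(permutation_pvalue(toy, thresholds=(0.01, 0.05, 0.5), n_perm=199))
```

Because the maximization over thresholds is redone for each permuted data set, the resulting p-value does not need a separate correction for having tried several thresholds.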
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuniga, Cristal; Li, Chien -Ting; Huelsman, Tyler
The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Moreover, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine.
Teleosts Genomics: Progress and Prospects in Disease Prevention and Control.
Munang'andu, Hetron Mweemba; Galindo-Villegas, Jorge; David, Lior
2018-04-04
Genome-wide studies based on conventional molecular tools and upcoming omics technologies are beginning to gain functional applications in the control and prevention of diseases in teleost fish. Herein, we provide insights into current progress and prospects in the use of genomics studies for the control and prevention of fish diseases. Metagenomics has emerged as an important tool for identifying emerging infectious diseases for the timely design of rational disease control strategies, determining microbial compositions in different aquatic environments used for fish farming, and using host microbiota to monitor the health status of fish. Expounding the use of antimicrobial peptides (AMPs) as therapeutic agents against different pathogens, as well as elucidating their role in tissue regeneration, is another vital aspect of genomics studies that has taken precedence in recent years. In vaccine development, progress includes the identification of highly immunogenic proteins for use in recombinant vaccine designs as well as the identification of gene signatures that correlate with protective immunity for use as benchmarks in optimizing vaccine efficacy. Progress in quantitative trait loci (QTL) mapping is beginning to yield considerable success in identifying resistance traits against some of the highly infectious diseases that have previously ravaged the aquaculture industry. Altogether, the synopsis put forth shows that genomics studies are beginning to make a positive contribution to the prevention and control of fish diseases in aquaculture.
Hayeems, R Z; Babul-Hirji, R; Hoang, N; Weksberg, R; Shuman, C
2016-04-01
Advances in genome-based microarray and sequencing technologies hold tremendous promise for understanding, better-managing and/or preventing disease and disease-related risk. Chromosome microarray technology (array based comparative genomic hybridization [aCGH]) is widely utilized in pediatric care to inform diagnostic etiology and medical management. Less clear is how parents experience and perceive the value of this technology. This study explored parents' experiences with aCGH in the pediatric setting, focusing on how they make meaning of various types of test results. We conducted in-person or telephone-based semi-structured interviews with parents of 21 children who underwent aCGH testing in 2010. Transcripts were coded and analyzed thematically according to the principles of interpretive description. We learned that parents expect genomic tests to be of personal use; their experiences with aCGH results characterize this use as intrinsic in the test's ability to provide a much sought-after answer for their child's condition, and instrumental in its ability to guide care, access to services, and family planning. In addition, parents experience uncertainty regardless of whether aCGH results are of pathogenic, uncertain, or benign significance; this triggers frustration, fear, and hope. Findings reported herein better characterize the notion of personal utility and highlight the pervasive nature of uncertainty in the context of genomic testing. Empiric research that links pre-test counseling content and psychosocial outcomes is warranted to optimize patient care.
Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants
Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe
2008-01-01
Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods. PMID:18426584
Zuniga, Cristal; Li, Chien -Ting; Huelsman, Tyler; ...
2016-07-02
The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Moreover, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine.
Zuñiga, Cristal; Li, Chien-Ting; Huelsman, Tyler; Levering, Jennifer; Zielinski, Daniel C; McConnell, Brian O; Long, Christopher P; Knoshaug, Eric P; Guarnieri, Michael T; Antoniewicz, Maciek R; Betenbaugh, Michael J; Zengler, Karsten
2016-09-01
The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Furthermore, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine. © 2016 American Society of Plant Biologists. All rights reserved.
Zuñiga, Cristal; Li, Chien-Ting; Zielinski, Daniel C.; Guarnieri, Michael T.; Antoniewicz, Maciek R.; Zengler, Karsten
2016-01-01
The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Furthermore, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine. PMID:27372244
Genomics of adaptation to host-plants in herbivorous insects.
Simon, Jean-Christophe; d'Alençon, Emmanuelle; Guy, Endrick; Jacquin-Joly, Emmanuelle; Jaquiéry, Julie; Nouhaud, Pierre; Peccoud, Jean; Sugio, Akiko; Streiff, Réjane
2015-11-01
Herbivorous insects represent the most species-rich lineages of metazoans. The high rate of diversification in herbivorous insects is thought to result from their specialization to distinct host-plants, which creates conditions favorable for the build-up of reproductive isolation and speciation. These conditions rely on constraints against the optimal use of a wide range of plant species, as each must constitute a viable food resource, oviposition site and mating site for an insect. Utilization of plants involves many essential traits of herbivorous insects, as they locate and select their hosts, overcome their defenses and acquire nutrients while avoiding intoxication. Although advances in understanding insect-plant molecular interactions have been limited by the complexity of insect traits involved in host use and the lack of genomic resources and functional tools, recent studies at the molecular level, combined with large-scale genomics studies at population and species levels, are revealing the genetic underpinning of plant specialization and adaptive divergence in non-model insect herbivores. Here, we review the recent advances in the genomics of plant adaptation in hemipterans and lepidopterans, two major insect orders, each of which includes a large number of crop pests. We focus on how genomics and post-genomics have improved our understanding of the mechanisms involved in insect-plant interactions by reviewing recent molecular discoveries in sensing, feeding, digesting and detoxifying strategies. We also present the outcomes of large-scale genomics approaches aimed at identifying loci potentially involved in plant adaptation in these insects. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
A community effort to protect genomic data sharing, collaboration and outsourcing.
Wang, Shuang; Jiang, Xiaoqian; Tang, Haixu; Wang, Xiaofeng; Bu, Diyue; Carey, Knox; Dyke, Stephanie Om; Fox, Dov; Jiang, Chao; Lauter, Kristin; Malin, Bradley; Sofia, Heidi; Telenti, Amalio; Wang, Lei; Wang, Wenhao; Ohno-Machado, Lucila
2017-01-01
The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In 2016, we organized the third Critical Assessment of Data Privacy and Protection competition as a community effort to bring together biomedical informaticists, computer privacy and security researchers, and scholars in ethical, legal, and social implications (ELSI) to assess the latest advances on privacy-preserving techniques for protecting human genomic data. Teams were asked to develop novel protection methods for emerging genome privacy challenges in three scenarios: Track (1) data sharing through the Beacon service of the Global Alliance for Genomics and Health; Track (2) collaborative discovery of similar genomes between two institutions; and Track (3) data outsourcing to public cloud services. The latter two tracks represent continuing themes from our 2015 competition, while the former was new and a response to a recently established vulnerability. The winning strategy for Track 1 mitigated the privacy risk by hiding approximately 11% of the variation in the database while permitting around 160,000 queries, a significant improvement over the baseline. The winning strategies in Tracks 2 and 3 showed significant progress over the previous competition by achieving multiple orders of magnitude performance improvement in terms of computational runtime and memory requirements. The outcomes suggest that applying highly optimized privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis is useful. However, the results also indicate that further efforts are needed to refine these techniques into practical solutions.
A Genomic Resource for the Development, Improvement, and Exploitation of Sorghum for Bioenergy
Brenton, Zachary W.; Cooper, Elizabeth A.; Myers, Mathew T.; Boyles, Richard E.; Shakoor, Nadia; Zielinski, Kelsey J.; Rauh, Bradley L.; Bridges, William C.; Morris, Geoffrey P.; Kresovich, Stephen
2016-01-01
With high productivity and stress tolerance, numerous grass genera of the Andropogoneae have emerged as candidates for bioenergy production. To optimize these candidates, research examining the genetic architecture of yield, carbon partitioning, and composition is required to advance breeding objectives. Significant progress has been made developing genetic and genomic resources for Andropogoneae, and advances in comparative and computational genomics have enabled research examining the genetic basis of photosynthesis, carbon partitioning, composition, and sink strength. To provide a pivotal resource aimed at developing a comparative understanding of key bioenergy traits in the Andropogoneae, we have established and characterized an association panel of 390 racially, geographically, and phenotypically diverse Sorghum bicolor accessions with 232,303 genetic markers. Sorghum bicolor was selected because of its genomic simplicity, phenotypic diversity, significant genomic tools, and its agricultural productivity and resilience. We have demonstrated the value of sorghum as a functional model for candidate gene discovery for bioenergy Andropogoneae by performing genome-wide association analysis for two contrasting phenotypes representing key components of structural and non-structural carbohydrates. We identified potential genes, including a cellulase enzyme and a vacuolar transporter, associated with increased non-structural carbohydrates that could lead to bioenergy sorghum improvement. Although our analysis identified genes with potentially clear functions, other candidates did not have assigned functions, suggesting novel molecular mechanisms for carbon partitioning traits. These results, combined with our characterization of phenotypic and genetic diversity and the public accessibility of each accession and genomic data, demonstrate the value of this resource and provide a foundation for future improvement of sorghum and related grasses for bioenergy production. 
PMID:27356613
Gürsoy, Gamze; Xu, Yun; Liang, Jie
2017-07-01
Nuclear landmarks and biochemical factors play important roles in the organization of the yeast genome. The interaction patterns of the budding yeast genome, as measured in genome-wide 3C studies, are largely recapitulated by model polymer genomes subject to landmark constraints. However, the origin of inter-chromosomal interactions, specific roles of individual landmarks, and the roles of biochemical factors in yeast genome organization remain unclear. Here we describe a multi-chromosome constrained self-avoiding chromatin model (mC-SAC) to gain understanding of the budding yeast genome organization. With significantly improved sampling of genome structures, both intra- and inter-chromosomal interaction patterns from genome-wide 3C studies are accurately captured in our model at higher resolution than in previous studies. We show that nuclear confinement is a key determinant of the intra-chromosomal interactions, and centromere tethering is responsible for the inter-chromosomal interactions. In addition, important genomic elements such as fragile sites and tRNA genes are found to be clustered spatially, largely due to centromere tethering. We uncovered previously unknown interactions that were not captured by genome-wide 3C studies, which are found to be enriched with tRNA genes, RNAPIII and TFIIS binding. Moreover, we identified specific high-frequency genome-wide 3C interactions that are unaccounted for by polymer effects under landmark constraints. These interactions are enriched with important genes and likely play biological roles.
Enhancing genomic prediction with genome-wide association studies in multiparental maize populations
USDA-ARS's Scientific Manuscript database
Genome-wide association mapping using dense marker sets has identified some nucleotide variants affecting complex traits which have been validated with fine-mapping and functional analysis. Many sequence variants associated with complex traits in maize have small effects and low repeatability, howev...
Codon optimization underpins generalist parasitism in fungi
Badet, Thomas; Peyraud, Remi; Mbengue, Malick; Navaud, Olivier; Derbyshire, Mark; Oliver, Richard P; Barbacci, Adelin; Raffaele, Sylvain
2017-01-01
The range of hosts that parasites can infect is a key determinant of the emergence and spread of disease. Yet, the impact of host range variation on the evolution of parasite genomes remains unknown. Here, we show that codon optimization underlies genome adaptation in broad host range parasites. We found that the longer proteins encoded by broad host range fungi likely increase natural selection on codon optimization in these species. Accordingly, codon optimization correlates with host range across the fungal kingdom. At the species level, biased patterns of synonymous substitutions underpin increased codon optimization in a generalist but not a specialist fungal pathogen. Virulence genes were consistently enriched in highly codon-optimized genes of generalist but not specialist species. We conclude that codon optimization is related to the capacity of parasites to colonize multiple hosts. Our results link genome evolution and translational regulation to the long-term persistence of generalist parasitism. DOI: http://dx.doi.org/10.7554/eLife.22472.001 PMID:28157073
Leulliot, Nicolas; Trésaugues, Lionel; Bremang, Michael; Sorel, Isabelle; Ulryck, Nathalie; Graille, Marc; Aboulfath, Ilham; Poupon, Anne; Liger, Dominique; Quevillon-Cheruel, Sophie; Janin, Joël; van Tilbeurgh, Herman
2005-06-01
Crystallization has long been regarded as one of the major bottlenecks in high-throughput structural determination by X-ray crystallography. Structural genomics projects have addressed this issue by using robots to set up automated crystal screens using nanodrop technology. This has moved the bottleneck from obtaining the first crystal hit to obtaining diffraction-quality crystals, as crystal optimization is a notoriously slow process that is difficult to automate. This article describes the high-throughput optimization strategies used in the Yeast Structural Genomics project, with selected successful examples.
A review of genome-wide approaches to study the genetic basis for spermatogenic defects.
Aston, Kenneth I; Conrad, Donald F
2013-01-01
Rapidly advancing tools for genetic analysis on a genome-wide scale have been instrumental in identifying the genetic bases for many complex diseases. About half of male infertility cases are of unknown etiology in spite of tremendous efforts to characterize the genetic basis for the disorder. Advancing our understanding of the genetic basis for male infertility will require the application of established and emerging genomic tools. This chapter introduces many of the tools available for genetic studies on a genome-wide scale along with principles of study design and data analysis.
Optimizing complex phenotypes through model-guided multiplex genome engineering
Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.; ...
2017-05-25
Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.
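As an illustration of the regularized multivariate linear regression mentioned in the abstract above, the following is a minimal sketch of ridge regression in plain Python. It is not the authors' implementation: the genotype matrix, the effect sizes, the penalty value and the function name `ridge_fit` are all hypothetical, chosen only to show how per-allele effects can be separated from a neutral "hitchhiking" allele when clones carry different combinations of mutations.

```python
from itertools import product


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam*I) beta = X^T y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented normal-equation matrix [X^T X + lam*I | X^T y].
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
         for j in range(p)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical data: 8 clones carrying every combination of 3 candidate alleles.
# Each row is a genotype (1 = allele present, 0 = absent).
X = [list(map(float, g)) for g in product([0, 1], repeat=3)]

# Hypothetical true effects on doubling time (minutes, relative to the parent):
# allele 0 and allele 2 are beneficial, allele 1 is a neutral hitchhiker.
true_effects = [-4.0, 0.0, -2.0]
y = [sum(x_i * b_i for x_i, b_i in zip(row, true_effects)) for row in X]

# lam is the ridge penalty; 0.1 is an arbitrary illustrative value.
beta = ridge_fit(X, y, lam=0.1)
for name, b in zip(["allele_0", "allele_1", "allele_2"], beta):
    print(f"{name}: estimated effect {b:+.2f} min")
```

Because the penalty shrinks all coefficients toward zero, the estimates are slightly smaller in magnitude than the simulated effects. The neutral allele nonetheless receives an estimate close to zero even though it co-occurs with beneficial alleles in several clones.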
USDA-ARS's Scientific Manuscript database
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the compu...
USDA-ARS's Scientific Manuscript database
Fast neutron radiation has been used as a mutagen to develop extensive mutant collections. However, the genome-wide structural consequences of fast neutron radiation are not well understood. Here, we examine the genome-wide structural variants observed among 264 soybean (Glycine max (L.) Merrill) pl...
Genome-wide association mapping of qualitatively inherited traits in a germplasm collection
USDA-ARS's Scientific Manuscript database
Genome-wide association (GWA) has been used as a tool for dissecting the genetic architecture of quantitatively inherited traits. We demonstrate here that GWA can also be highly useful for detecting the genomic locations of major genes governing categorically defined phenotype variants that exist fo...
Genome wide association analyses based on a multiple trait approach for modeling feed efficiency
USDA-ARS's Scientific Manuscript database
Genome wide association (GWA) of feed efficiency (FE) could help target important genomic regions influencing FE. Data provided by an international dairy FE research consortium consisted of phenotypic records on dry matter intakes (DMI), milk energy (MILKE), and metabolic body weight (MBW) on 6,937 ...
Genome-wide Association Analysis of Kernel Weight in Hard Winter Wheat
USDA-ARS's Scientific Manuscript database
Wheat kernel weight is an important and heritable component of wheat grain yield and a key predictor of flour extraction. Genome-wide association analysis was conducted to identify genomic regions associated with kernel weight and kernel weight environmental response in 8 trials of 299 hard winter ...
USDA-ARS's Scientific Manuscript database
A recent genome-wide association study associated 62 single nucleotide polymorphisms (SNPs) from 43 genomic loci, with fasting lipoprotein subfractions in European–Americans (EAs) at genome-wide levels of significance across three independent samples. Whether these associations are consistent across...
Kelleher, Erin S; Barbash, Daniel A
2013-08-01
The Piwi-interacting RNA (piRNA) pathway defends animal genomes against the harmful consequences of transposable element (TE) infection by imposing small-RNA-mediated silencing. Because silencing is targeted by TE-derived piRNAs, piRNA production is posited to be central to the evolution of genome defense. We harnessed genomic data sets from Drosophila melanogaster, including genome-wide measures of piRNA, mRNA, and genomic abundance, along with estimates of age structure and risk of ectopic recombination, to address fundamental questions about the functional and evolutionary relationships between TE families and their regulatory piRNAs. We demonstrate that mRNA transcript abundance, robustness of "ping-pong" amplification, and representation in piRNA clusters together explain the majority of variation in piRNA abundance between TE families, providing the first robust statistical support for the prevailing model of piRNA biogenesis. Intriguingly, we also discover that the most transpositionally active TE families, with the greatest capacity to induce harmful mutations or disrupt gametogenesis, are not necessarily the most abundant among piRNAs. Rather, the level of piRNA targeting is largely independent of recent transposition rate for active TE families, but is rapidly lost for inactive TEs. These observations are consistent with population genetic theory that suggests a limited selective advantage for host repression of transposition. Additionally, we find no evidence that piRNA targeting responds to selection against a second major cost of TE infection: ectopic recombination between TE insertions. Our observations confirm the pivotal role of piRNA-mediated silencing in defending the genome against selfish transposition, yet also suggest limits to the optimization of host genome defense.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela
Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Genome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genome and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar transcripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3'-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellar genomes, which is 1/10th as much genetic diversity as is found in the nucleus.
Stomatal vs. genome size in angiosperms: the somatic tail wagging the genomic dog?
Hodgson, J. G.; Sharafi, M.; Jalili, A.; Díaz, S.; Montserrat-Martí, G.; Palmer, C.; Cerabolini, B.; Pierce, S.; Hamzehee, B.; Asri, Y.; Jamzad, Z.; Wilson, P.; Raven, J. A.; Band, S. R.; Basconcelo, S.; Bogard, A.; Carter, G.; Charles, M.; Castro-Díez, P.; Cornelissen, J. H. C.; Funes, G.; Jones, G.; Khoshnevis, M.; Pérez-Harguindeguy, N.; Pérez-Rontomé, M. C.; Shirvany, F. A.; Vendramini, F.; Yazdani, S.; Abbas-Azimi, R.; Boustani, S.; Dehghan, M.; Guerrero-Campo, J.; Hynd, A.; Kowsary, E.; Kazemi-Saeed, F.; Siavash, B.; Villar-Salvador, P.; Craigie, R.; Naqinezhad, A.; Romo-Díez, A.; de Torres Espuny, L.; Simmons, E.
2010-01-01
Background and Aims Genome size is a function, and the product, of cell volume. As such it is contingent on ecological circumstance. The nature of ‘this ecological circumstance’ is, however, hotly debated. Here, we investigate for angiosperms whether stomatal size may be this ‘missing link’: the primary determinant of genome size. Stomata are crucial for photosynthesis and their size affects functional efficiency. Methods Stomatal and leaf characteristics were measured for 1442 species from Argentina, Iran, Spain and the UK and, using PCA, some emergent ecological and taxonomic patterns were identified. Subsequently, an assessment of the relationship between genome-size values obtained from the Plant DNA C-values database and measurements of stomatal size was carried out. Key Results Stomatal size is an ecologically important attribute. It varies with life-history (woody species < herbaceous species < vernal geophytes) and contributes to ecologically and physiologically important axes of leaf specialization. Moreover, it is positively correlated with genome size across a wide range of major taxa. Conclusions Stomatal size predicts genome size within angiosperms. Correlation is not, however, proof of causality and here our interpretation is hampered by unexpected deficiencies in the scientific literature. Firstly, there are discrepancies between our own observations and established ideas about the ecological significance of stomatal size; very large stomata, theoretically facilitating photosynthesis in deep shade, were, in this study (and in other studies), primarily associated with vernal geophytes of unshaded habitats. Secondly, the lower size limit at which stomata can function efficiently, and the ecological circumstances under which these minute stomata might occur, have not been satisfactorily resolved. Thus, our hypothesis, that the optimization of stomatal size for functional efficiency is a major ecological determinant of genome size, remains unproven.
PMID:20375204
Wóycicki, Rafał; Witkowicz, Justyna; Gawroński, Piotr; Dąbrowska, Joanna; Lomsadze, Alexandre; Pawełkowicz, Magdalena; Siedlecka, Ewa; Yagi, Kohei; Pląder, Wojciech; Seroczyńska, Anna; Śmiech, Mieczysław; Gutman, Wojciech; Niemirowicz-Szczytt, Katarzyna; Bartoszewski, Grzegorz; Tagashira, Norikazu; Hoshi, Yoshikazu; Borodovsky, Mark; Karpiński, Stanisław; Malepszy, Stefan; Przybecki, Zbigniew
2011-01-01
Cucumber (Cucumis sativus L.), a widely cultivated crop, originated in the Eastern Himalayas, and its secondary domestication regions include highly divergent climate conditions, e.g. temperate and subtropical. We wanted to uncover adaptive genome differences between cucumber cultivars and the evolutionary molecular mechanisms that regulate genetic adaptation of plants to different ecosystems and organism biodiversity. Here we present the draft genome sequence of the North-European Borszczagowski cultivar (line B10) of Cucumis sativus and comparative genomics studies with the known genomes of C. sativus (Chinese cultivar – Chinese Long (line 9930)), Arabidopsis thaliana, Populus trichocarpa and Oryza sativa. Cucumber genomes show extensive chromosomal rearrangements, distinct differences in the number of particular genes (e.g. those involved in photosynthesis, respiration, sugar metabolism, chlorophyll degradation, regulation of gene expression, photooxidative stress tolerance, tolerance of higher non-optimal temperatures and ammonium ion assimilation) as well as in the distributions of abscisic acid-, dehydration- and ethylene-responsive cis-regulatory elements (CREs) in promoters of orthologous groups of genes, which lead to the specific adaptation features. Abscisic acid treatment of non-acclimated Arabidopsis and C. sativus seedlings induced moderate freezing tolerance in Arabidopsis but not in C. sativus. This experiment, together with analysis of abscisic acid-specific CRE distributions, gives a clue as to why C. sativus is much more susceptible to moderate freezing stresses than A. thaliana. Comparative analysis of all five genomes showed that each species and/or cultivar has a specific profile of CRE content in promoters of orthologous genes. Our results constitute a substantial and original resource for basic and applied research on environmental adaptations of plants, which could facilitate creation of new crops with improved growth and yield in divergent conditions. PMID:21829493
A newly isolated and identified vitamin B12 producing strain: Sinorhizobium meliloti 320.
Dong, Huina; Li, Sha; Fang, Huan; Xia, Miaomiao; Zheng, Ping; Zhang, Dawei; Sun, Jibin
2016-10-01
Vitamin B12 (Cobalamin, VB12) has several physiological functions and is widely used in pharmaceutical and food industries. A new unicellular strain was isolated from farmland in China, and it produced VB12, as identified by HPLC and HPLC-MS/MS. 16S rDNA analysis revealed that this strain belongs to the species Sinorhizobium meliloti, and we named it S. meliloti 320. Its whole-genome information indicates that this strain has a complete VB12 synthetic pathway, which paves the way for further metabolic engineering studies. The optimal carbon and nitrogen sources are sucrose and corn steep liquor (CSL) plus peptone. Because sucrose and CSL were the most suitable carbon and nitrogen sources, respectively, their optimal combination was determined by response surface methodology. This strain could produce 140 ± 4.2 mg/L vitamin B12 after incubating for 7 days in the optimal medium.
Learning about human population history from ancient and modern genomes.
Stoneking, Mark; Krause, Johannes
2011-08-18
Genome-wide data, both from SNP arrays and from complete genome sequencing, are becoming increasingly abundant and are now even available from extinct hominins. These data are providing new insights into population history; in particular, when combined with model-based analytical approaches, genome-wide data allow direct testing of hypotheses about population history. For example, genome-wide data from both contemporary populations and extinct hominins strongly support a single dispersal of modern humans from Africa, followed by two archaic admixture events: one with Neanderthals somewhere outside Africa and a second with Denisovans that (so far) has only been detected in New Guinea. These new developments promise to reveal new stories about human population history, without having to resort to storytelling.
The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data.
Wilks, Christopher; Cline, Melissa S; Weiler, Erich; Diehkans, Mark; Craft, Brian; Martin, Christy; Murphy, Daniel; Pierce, Howdy; Black, John; Nelson, Donavan; Litzinger, Brian; Hatton, Thomas; Maltbie, Lori; Ainsworth, Michael; Allen, Patrick; Rosewood, Linda; Mitchell, Elizabeth; Smith, Bradley; Warner, Jim; Groboske, John; Telc, Haifang; Wilson, Daniel; Sanford, Brian; Schmidt, Hannes; Haussler, David; Maltbie, Daniel
2014-01-01
The Cancer Genomics Hub (CGHub) is the online repository of the sequencing programs of the National Cancer Institute (NCI), including The Cancer Genome Atlas (TCGA), the Cancer Cell Line Encyclopedia (CCLE) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects, with data from 25 different types of cancer. The CGHub currently contains >1.4 PB of data, has grown at an average rate of 50 TB a month and serves >100 TB per week. The architecture of CGHub is designed to support bulk searching and downloading through a Web-accessible application programming interface, enforce patient genome confidentiality in data storage and transmission and optimize for efficiency in access and transfer. In this article, we describe the design of these three components, present performance results for our transfer protocol, GeneTorrent, and finally report on the growth of the system in terms of data stored and transferred, including estimated limits on the current architecture. Our experience-based estimates suggest that centralizing storage and computational resources is more efficient than wide distribution across many satellite labs. Database URL: https://cghub.ucsc.edu. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
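To put the storage and transfer figures quoted in the CGHub abstract on a common scale, the short calculation below may help. It is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 1000 TB), treats the ">" figures as their stated lower bounds, and uses 52/12 weeks per month. The variable names are illustrative and do not come from the article.

```python
# Figures quoted in the abstract, taken at their stated lower bounds.
TB_PER_PB = 1000                 # assumption: decimal units
stored_tb = 1400                 # ">1.4 PB" expressed in TB
growth_tb_per_month = 50         # average growth rate
served_tb_per_week = 100         # ">100 TB per week"

# How many months of growth at the average rate the stored volume represents.
months_equivalent = stored_tb / growth_tb_per_month

# Outbound transfer expressed per month, and relative to monthly growth.
weeks_per_month = 52 / 12
served_tb_per_month = served_tb_per_week * weeks_per_month
served_to_growth_ratio = served_tb_per_month / growth_tb_per_month

print(f"Stored volume: {stored_tb / TB_PER_PB:.1f} PB")
print(f"Equivalent months of growth: {months_equivalent:.0f}")
print(f"Served per month: {served_tb_per_month:.0f} TB")
print(f"Served / ingested per month: {served_to_growth_ratio:.1f}x")
```

Under these assumptions the repository serves several times more data each month than it ingests, which is consistent with the abstract's emphasis on efficient access and transfer.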
2011-01-01
Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome.
To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
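The three per-category SNP counts reported in this abstract can be checked against the stated total of 497,118 with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; the dictionary keys and variable names are illustrative. No overall validation rate is computed, because the abstract does not say how the 302 resequenced SNPs were split among the three categories.

```python
# Putative SNP counts by sequence category, as quoted in the abstract.
snp_counts = {
    "gene sequences": 195_631,
    "uncharacterized single-copy regions": 155_580,
    "repeat junctions": 145_907,
}

# Sum of the three categories; the abstract reports 497,118 SNPs in total.
total = sum(snp_counts.values())

# Share of each category as a percentage of the total, rounded to 0.1%.
shares = {name: round(100 * count / total, 1)
          for name, count in snp_counts.items()}

print(f"Total putative SNPs: {total:,}")
for name, pct in shares.items():
    print(f"  {name}: {pct}%")
```

The sum matches the total given in the conclusion, so the three categories account for all reported SNPs.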
Haraksingh, Rajini R.; Abyzov, Alexej; Gerstein, Mark; Urban, Alexander E.; Snyder, Michael
2011-01-01
Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing-based technologies. Numerous array-based platforms for CNV detection exist, utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost-effective CNV detection and validation for both basic and clinical applications. PMID:22140474
Wang, Jing; Street, Nathaniel R.; Scofield, Douglas G.; Ingvarsson, Pär K.
2016-01-01
A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. PMID:26721855
Xu, Andrew Wei
2010-09-01
In genome rearrangement, given a set of genomes G and a distance measure d, the median problem asks for another genome q that minimizes the total distance $\sum_{g \in G} d(q, g)$. This is a key problem in genome rearrangement based phylogenetic analysis. Although this problem is known to be NP-hard, we have shown in a previous article, on circular genomes and under the DCJ distance measure, that a family of patterns in the given genomes--represented by adequate subgraphs--allow us to rapidly find exact solutions to the median problem in a decomposition approach. In this article, we extend this result to the case of linear multichromosomal genomes, in order to solve more interesting problems on eukaryotic nuclear genomes. A multi-way capping problem in the linear multichromosomal case imposes an extra computational challenge on top of the difficulty in the circular case, and this difficulty has been underestimated in our previous study and is addressed in this article. We represent the median problem by the capped multiple breakpoint graph, extend the adequate subgraphs into the capped adequate subgraphs, and prove optimality-preserving decomposition theorems, which give us the tools to solve the median problem and the multi-way capping optimization problem together. We also develop an exact algorithm ASMedian-linear, which iteratively detects instances of (capped) adequate subgraphs and decomposes problems into subproblems. Tested on simulated data, ASMedian-linear can rapidly solve most problems with up to several thousand genes, and it also can provide optimal or near-optimal solutions to the median problem under the reversal/HP distance measures. ASMedian-linear is available at http://sites.google.com/site/andrewweixu.
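To make the median objective concrete, the sketch below enumerates every candidate genome q and keeps the one with the smallest total distance to the input genomes. It is not ASMedian-linear: it swaps in a simple breakpoint distance on unsigned, single-chromosome linear gene orders instead of the DCJ distance, and exhaustive search is only feasible for a handful of genes. All function names are hypothetical.

```python
from itertools import permutations

def adjacencies(genome):
    """Unordered pairs of neighbouring genes in a linear gene order."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}

def breakpoint_distance(a, b):
    """Number of adjacencies of a that are absent from b (assumed distance)."""
    return len(adjacencies(a) - adjacencies(b))

def total_distance(q, genomes):
    """The median objective: sum of d(q, g) over all input genomes g."""
    return sum(breakpoint_distance(q, g) for g in genomes)

def brute_force_median(genomes):
    """Try every gene order; exponential cost, for illustration only."""
    genes = sorted(genomes[0])
    return min(permutations(genes), key=lambda q: total_distance(q, genomes))

# Toy input, invented for illustration
genomes = [(1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (2, 1, 3, 4, 5)]
median = brute_force_median(genomes)
print(median, total_distance(median, genomes))
```

The factorial growth of the search space is the reason decomposition approaches such as the one described in the abstract matter for realistic gene counts.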
Hu, Yao; Li, Huaixing; Lu, Ling; Manichaikul, Ani; Zhu, Jingwen; Chen, Yii-Der I; Sun, Liang; Liang, Shuang; Siscovick, David S; Steffen, Lyn M; Tsai, Michael Y; Rich, Stephen S; Lemaitre, Rozenn N; Lin, Xu
2016-03-15
Epidemiological studies suggest that levels of n-3 and n-6 long-chain polyunsaturated fatty acids are associated with risk of cardio-metabolic outcomes across different ethnic groups. Recent genome-wide association studies in populations of European ancestry have identified several loci associated with plasma and/or erythrocyte polyunsaturated fatty acids. To identify additional novel loci, we carried out a genome-wide association study in two population-based cohorts consisting of 3521 Chinese participants, followed by a trans-ethnic meta-analysis with meta-analysis results from 8962 participants of European ancestry. Four novel loci (MYB, AGPAT4, DGAT2 and PPT2) reached genome-wide significance in the trans-ethnic meta-analysis (log10(Bayes Factor) ≥ 6). Of them, associations of MYB and AGPAT4 with docosatetraenoic acid (log10(Bayes Factor) = 11.5 and 8.69, respectively) also reached genome-wide significance in the Chinese-specific genome-wide association analyses (P = 4.15 × 10(-14) and 4.30 × 10(-12), respectively), while associations of DGAT2 with gamma-linolenic acid (log10(Bayes Factor) = 6.16) and of PPT2 with docosapentaenoic acid (log10(Bayes Factor) = 6.24) were nominally significant in both Chinese- and European-specific genome-wide association analyses (P ≤ 0.003). We also confirmed previously reported loci including FADS1, NTAN1, NRBF2, ELOVL2 and GCKR. Different effect sizes in FADS1 and independent association signals in ELOVL2 were observed. These results provide novel insight into the genetic background of polyunsaturated fatty acids and their differences between Chinese and European populations. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.
Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A
2012-01-03
Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) has rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences. We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of both extremes of base composition.
This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.
snpGeneSets: An R Package for Genome-Wide Study Annotation
Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian
2016-01-01
Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048
Jordan, Rebecca; Dillon, Shannon K; Prober, Suzanne M; Hoffmann, Ary A
2016-12-01
In order to contribute to evolutionary resilience and adaptive potential in highly modified landscapes, revegetated areas should ideally reflect levels of genetic diversity within and across natural stands. Landscape genomic analyses enable such diversity patterns to be characterized at genome and chromosomal levels. Landscape-wide patterns of genomic diversity were assessed in Eucalyptus microcarpa, a dominant tree species widely used in revegetation in Southeastern Australia. Trees from small and large patches within large remnants, small isolated remnants and revegetation sites were assessed across the now highly fragmented distribution of this species using the DArTseq genomic approach. Genomic diversity was similar within all three types of remnant patches analysed, although often significantly but only slightly lower in revegetation sites compared with natural remnants. Differences in diversity between stand types varied across chromosomes. Genomic differentiation was higher between small, isolated remnants, and among revegetated sites compared with natural stands. We conclude that small remnants and revegetated sites of our E. microcarpa samples largely but not completely capture patterns in genomic diversity across the landscape. Genomic approaches provide a powerful tool for assessing restoration efforts across the landscape. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
A Genome-Wide Breast Cancer Scan in African Americans
2010-06-01
SNPs from the African American breast cancer scan to COGs, a European collaborative study which has designed a SNP array that will be genotyped...Award Number: W81XWH-08-1-0383 TITLE: A Genome-wide Breast Cancer Scan in African Americans PRINCIPAL INVESTIGATOR: Christopher A...
A genome-wide association study in soybean
USDA-ARS's Scientific Manuscript database
A genome-wide association study (GWAS) was performed to estimate the feasibility of identifying genes controlling the quantitative traits, seed protein and oil concentration, in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. A total of 55,159 single nucleo...
Meta-analysis of genome-wide association from genomic prediction models
USDA-ARS's Scientific Manuscript database
A limitation of many genome-wide association studies (GWA) in animal breeding is that there are many loci with small effect sizes; thus, larger sample sizes (N) are required to guarantee suitable power of detection. To increase sample size, results from different GWA can be combined in a meta-analys...
USDA-ARS's Scientific Manuscript database
The comprehensive identification of genes underlying phenotypic variation of complex traits such as disease resistance remains one of the greatest challenges in biology despite having genome sequences and more powerful tools. Most genome-wide screens lack sufficient resolving power as they typically...
USDA-ARS's Scientific Manuscript database
Bovine Viral Diarrhea Virus (BVDV) is a diverse group of viruses causing disease in ruminants. The objective was to determine genomic regions harboring single nucleotide polymorphisms (SNP) associated with presence or absence of persistent BVDV infections. A genome wide association approach based on...
Signatures of positive selection in East African Shorthorn Zebu: a genome-wide SNP analysis
USDA-ARS's Scientific Manuscript database
The small East African Shorthorn Zebu is the main indigenous cattle across East Africa. A recent genome wide SNPs analysis has revealed their ancient stable African taurine x Asian zebu admixture. Here, we assess the presence of candidate signature of positive selection in their genome, with the aim...
USDA-ARS's Scientific Manuscript database
Non-high-density lipoprotein cholesterol (NHDL) is an independent and superior predictor of CVD risk as compared to low-density lipoprotein alone. It represents a spectrum of atherogenic lipid fractions with possibly a distinct genomic signature. We performed genome-wide association studies (GWAS) t...
USDA-ARS's Scientific Manuscript database
Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that i...
Performance of Polygenic Scores for Predicting Phobic Anxiety
Walter, Stefan; Glymour, M. Maria; Koenen, Karestan; Liang, Liming; Tchetgen Tchetgen, Eric J.; Cornelis, Marilyn; Chang, Shun-Chiao; Rimm, Eric; Kawachi, Ichiro; Kubzansky, Laura D.
2013-01-01
Context Anxiety disorders are common, with a lifetime prevalence of 20% in the U.S., and are responsible for substantial burdens of disability, missed work days and health care utilization. To date, no causal genetic variants have been identified for anxiety, anxiety disorders, or related traits. Objective To investigate whether a phobic anxiety symptom score was associated with 3 alternative polygenic risk scores, derived from external genome-wide association studies of anxiety, an internally estimated agnostic polygenic score, or previously identified candidate genes. Design Longitudinal follow-up study. Using linear and logistic regression we investigated whether phobic anxiety was associated with polygenic risk scores derived from internal, leave-one out genome-wide association studies, from 31 candidate genes, and from out-of-sample genome-wide association weights previously shown to predict depression and anxiety in another cohort. Setting and Participants Study participants (n = 11,127) were individuals from the Nurses' Health Study and Health Professionals Follow-up Study. Main Outcome Measure Anxiety symptoms were assessed via the 8-item phobic anxiety scale of the Crown Crisp Index at two time points, from which a continuous phenotype score was derived. Results We found no genome-wide significant associations with phobic anxiety. Phobic anxiety was also not associated with a polygenic risk score derived from the genome-wide association study beta weights using liberal p-value thresholds; with a previously published genome-wide polygenic score; or with a candidate gene risk score based on 31 genes previously hypothesized to predict anxiety. Conclusion There is a substantial gap between twin-study heritability estimates of anxiety disorders ranging between 20–40% and heritability explained by genome-wide association results. 
New approaches such as improved genome imputations, application of gene expression and biological pathways information, and incorporating social or environmental modifiers of genetic risks may be necessary to identify significant genetic predictors of anxiety. PMID:24278274
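The polygenic risk scores described above are, in their simplest form, weighted sums of allele dosages, with weights taken from association effect sizes and SNPs selected by a p-value threshold. The sketch below shows that arithmetic under stated assumptions: the SNP identifiers, effect sizes, threshold and function names are invented for illustration and are not taken from the study.

```python
def select_weights(assoc, p_threshold):
    """Keep the beta weight of every SNP whose p-value passes the threshold.

    assoc maps a SNP id to a (beta, p_value) pair from an external GWAS.
    """
    return {snp: beta for snp, (beta, p) in assoc.items() if p <= p_threshold}

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    SNPs missing from either input are skipped, which is a simplification.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)

# Hypothetical association results and one participant's genotypes
assoc = {"rsA": (0.10, 0.001), "rsB": (-0.05, 0.2), "rsC": (0.02, 0.8)}
dosages = {"rsA": 2, "rsB": 1, "rsC": 0}

weights = select_weights(assoc, p_threshold=0.5)  # a liberal threshold
print(polygenic_score(dosages, weights))
```

Varying `p_threshold` is one way to explore the liberal thresholds the abstract mentions; the score would then be entered as a predictor in the linear or logistic regression.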
Smith, Nicholas L; Felix, Janine F; Morrison, Alanna C; Demissie, Serkalem; Glazer, Nicole L; Loehr, Laura R; Cupples, L Adrienne; Dehghan, Abbas; Lumley, Thomas; Rosamond, Wayne D; Lieb, Wolfgang; Rivadeneira, Fernando; Bis, Joshua C; Folsom, Aaron R; Benjamin, Emelia; Aulchenko, Yurii S; Haritunians, Talin; Couper, David; Murabito, Joanne; Wang, Ying A; Stricker, Bruno H; Gottdiener, John S; Chang, Patricia P; Wang, Thomas J; Rice, Kenneth M; Hofman, Albert; Heckbert, Susan R; Fox, Ervin R; O'Donnell, Christopher J; Uitterlinden, Andre G; Rotter, Jerome I; Willerson, James T; Levy, Daniel; van Duijn, Cornelia M; Psaty, Bruce M; Witteman, Jacqueline C M; Boerwinkle, Eric; Vasan, Ramachandran S
2010-06-01
Although genetic factors contribute to the onset of heart failure (HF), no large-scale genome-wide investigation of HF risk has been published to date. We have investigated the association of 2,478,304 single-nucleotide polymorphisms with incident HF by meta-analyzing data from 4 community-based prospective cohorts: the Atherosclerosis Risk in Communities Study, the Cardiovascular Health Study, the Framingham Heart Study, and the Rotterdam Study. Eligible participants for these analyses were of European or African ancestry and free of clinical HF at baseline. Each study independently conducted genome-wide scans and imputed data to the approximately 2.5 million single-nucleotide polymorphisms in HapMap. Within each study, Cox proportional hazards regression models provided age- and sex-adjusted estimates of the association between each variant and time to incident HF. Fixed-effect meta-analyses combined results for each single-nucleotide polymorphism from the 4 cohorts to produce an overall association estimate and P value. A genome-wide significance P value threshold was set a priori at 5.0x10(-7). During a mean follow-up of 11.5 years, 2526 incident HF events (12%) occurred in 20 926 European-ancestry participants. The meta-analysis identified a genome-wide significant locus at chromosomal position 15q22 (1.4x10(-8)), which was 58.8 kb from USP3. Among 2895 African-ancestry participants, 466 incident HF events (16%) occurred during a mean follow-up of 13.7 years. One genome-wide significant locus was identified at 12q14 (6.7x10(-8)), which was 6.3 kb from LRIG3. We identified 2 loci that were associated with incident HF and exceeded genome-wide significance. The findings merit replication in other community-based settings of incident HF.
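A fixed-effect meta-analysis of per-cohort estimates is commonly computed by inverse-variance weighting, and the sketch below shows that calculation for a single SNP. The abstract does not say which weighting scheme was used, so inverse-variance weighting is an assumption here, and the input numbers are invented.

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooling of per-cohort effect estimates.

    estimates are per-cohort log hazard ratios (or any effect on an additive
    scale); std_errors are their standard errors.
    Returns (pooled estimate, pooled SE, z statistic, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value

# Hypothetical log hazard ratios from four cohorts for one SNP
betas = [0.21, 0.18, 0.25, 0.15]
ses = [0.08, 0.10, 0.12, 0.09]
pooled, pooled_se, z, p = fixed_effect_meta(betas, ses)
print(pooled, pooled_se, z, p)
```

The resulting p-value would then be compared with the a priori genome-wide threshold stated in the abstract.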
Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary.
Brynildsrud, Ola; Bohlin, Jon; Scheffer, Lonneke; Eldholm, Vegard
2016-11-25
Genome-wide association studies (GWAS) have become indispensable in human medicine and genomics, but very few have been carried out on bacteria. Here we introduce Scoary, an ultra-fast, easy-to-use, and widely applicable software tool that scores the components of the pan-genome for associations to observed phenotypic traits while accounting for population stratification, with minimal assumptions about evolutionary processes. We call our approach pan-GWAS to distinguish it from traditional, single nucleotide polymorphism (SNP)-based GWAS. Scoary is implemented in Python and is available under an open source GPLv3 license at https://github.com/AdmiralenOla/Scoary .
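Scoring a pan-genome component against a binary trait can start from a 2×2 table of gene presence versus trait status across isolates. The sketch below builds that table and applies a two-sided Fisher's exact test. It is a simplified illustration, not Scoary's implementation: it omits the population-stratification step the abstract describes, and all names and data are hypothetical.

```python
from math import comb

def contingency(presence, trait):
    """Count isolates by (gene present?, trait positive?)."""
    a = b = c = d = 0
    for isolate, has_gene in presence.items():
        has_trait = trait[isolate]
        if has_gene and has_trait:
            a += 1
        elif has_gene:
            b += 1
        elif has_trait:
            c += 1
        else:
            d += 1
    return a, b, c, d

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as unlikely as the observed one
    total = sum(prob(x) for x in range(low, high + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)

# Toy data: one gene across eight isolates
presence = {"i1": 1, "i2": 1, "i3": 1, "i4": 1, "i5": 0, "i6": 0, "i7": 0, "i8": 0}
trait = {"i1": 1, "i2": 1, "i3": 1, "i4": 0, "i5": 1, "i6": 0, "i7": 0, "i8": 0}
table = contingency(presence, trait)
print(table, fisher_exact_two_sided(*table))
```

Because related isolates are not independent observations, a raw test like this one overstates significance in clonal populations, which is the problem a stratification-aware step is meant to address.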
Genetic study of multimodal imaging Alzheimer's disease progression score implicates novel loci.
Scelsi, Marzia A; Khan, Raiyan R; Lorenzi, Marco; Christopher, Leigh; Greicius, Michael D; Schott, Jonathan M; Ourselin, Sebastien; Altmann, Andre
2018-05-30
Identifying genetic risk factors underpinning different aspects of Alzheimer's disease has the potential to provide important insights into pathogenesis. Moving away from simple case-control definitions, there is considerable interest in using quantitative endophenotypes, such as those derived from imaging as outcome measures. Previous genome-wide association studies of imaging-derived biomarkers in sporadic late-onset Alzheimer's disease focused only on phenotypes derived from single imaging modalities. In contrast, we computed a novel multi-modal neuroimaging phenotype comprising cortical amyloid burden and bilateral hippocampal volume. Both imaging biomarkers were used as input to a disease progression modelling algorithm, which estimates the biomarkers' long-term evolution curves from population-based longitudinal data. Among other parameters, the algorithm computes the shift in time required to optimally align a subject's biomarker trajectories with these population curves. This time shift serves as a disease progression score and it was used as a quantitative trait in a discovery genome-wide association study with n = 944 subjects from the Alzheimer's Disease Neuroimaging Initiative database diagnosed as Alzheimer's disease, mild cognitive impairment or healthy at the time of imaging. We identified a genome-wide significant locus implicating LCORL (rs6850306, chromosome 4; P = 1.03 × 10(-8)). The top variant rs6850306 was found to act as an expression quantitative trait locus for LCORL in brain tissue. The clinical role of rs6850306 in conversion from healthy ageing to mild cognitive impairment or Alzheimer's disease was further validated in an independent cohort comprising healthy, older subjects from the National Alzheimer's Coordinating Center database.
Specifically, possession of a minor allele at rs6850306 was protective against conversion from mild cognitive impairment to Alzheimer's disease in the National Alzheimer's Coordinating Center cohort (hazard ratio = 0.593, 95% confidence interval = 0.387-0.907, n = 911, P(Bonf) = 0.032), in keeping with the negative direction of effect reported in the genome-wide association study (β(disease progression score) = -0.07 ± 0.01). The implicated locus is linked to genes with known connections to Alzheimer's disease pathophysiology and other neurodegenerative diseases. Using multimodal imaging phenotypes in association studies may assist in unveiling the genetic drivers of the onset and progression of complex diseases.
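A reported hazard ratio and its 95% confidence interval can be converted back into a log hazard ratio, a standard error and an unadjusted Wald p-value. The sketch below does this for the figures quoted above, assuming the interval is a Wald interval that is symmetric on the log scale; the abstract does not confirm that, and the unadjusted p-value computed here is not expected to equal the Bonferroni-corrected value the authors report.

```python
import math

def wald_from_ci(hazard_ratio, lower, upper, z_crit=1.959964):
    """Recover log(HR), its SE, the Wald z and a two-sided p from a 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hazard_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_hr / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return log_hr, se, z, p_value

log_hr, se, z, p = wald_from_ci(0.593, 0.387, 0.907)
print(f"log(HR)={log_hr:.3f}  SE={se:.3f}  z={z:.2f}  p={p:.4f}")
```

This kind of back-calculation is useful when combining published estimates, since meta-analysis needs effects and standard errors on the log scale.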
Efficient Breeding by Genomic Mating.
Akdemir, Deniz; Sánchez, Julio I
2016-01-01
Selection in breeding programs can be done by using phenotypes (phenotypic selection), pedigree relationship (breeding value selection) or molecular markers (marker assisted selection or genomic selection). All these methods are based on truncation selection, focusing on the best performance of parents before mating. In this article we proposed an approach to breeding, named genomic mating, which focuses on mating instead of truncation selection. Genomic mating uses information in a similar fashion to genomic selection but includes information on complementation of parents to be mated. Following the efficiency frontier surface, genomic mating uses concepts of estimated breeding values, risk (usefulness) and coefficient of ancestry to optimize mating between parents. We used a genetic algorithm to find solutions to this optimization problem and the results from our simulations comparing genomic selection, phenotypic selection and the mating approach indicate that current approach for breeding complex traits is more favorable than phenotypic and genomic selection. Genomic mating is similar to genomic selection in terms of estimating marker effects, but in genomic mating the genetic information and the estimated marker effects are used to decide which genotypes should be crossed to obtain the next breeding population.
A Genome-Wide Association Study for Regulators of Micronucleus Formation in Mice.
McIntyre, Rebecca E; Nicod, Jérôme; Robles-Espinoza, Carla Daniela; Maciejowski, John; Cai, Na; Hill, Jennifer; Verstraten, Ruth; Iyer, Vivek; Rust, Alistair G; Balmus, Gabriel; Mott, Richard; Flint, Jonathan; Adams, David J
2016-08-09
In mammals the regulation of genomic instability plays a key role in tumor suppression and also controls genome plasticity, which is important for recombination during the processes of immunity and meiosis. Most studies to identify regulators of genomic instability have been performed in cells in culture or in systems that report on gross rearrangements of the genome, yet subtle differences in the level of genomic instability can contribute to whole organism phenotypes such as tumor predisposition. Here we performed a genome-wide association study in a population of 1379 outbred Crl:CFW(SW)-US_P08 mice to dissect the genetic landscape of micronucleus formation, a biomarker of chromosomal breaks, whole chromosome loss, and extranuclear DNA. Variation in micronucleus levels is a complex trait with a genome-wide heritability of 53.1%. We identify seven loci influencing micronucleus formation (false discovery rate <5%), and define candidate genes at each locus. Intriguingly at several loci we find evidence for sexual dimorphism in micronucleus formation, with a locus on chromosome 11 being specific to males. Copyright © 2016 McIntyre et al.
Gene networks are rapidly growing in size and number, raising the question of which networks are most appropriate for particular applications. Here, we evaluate 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies. While all networks have some ability to recover disease genes, we observe a wide range of performance with STRING, ConsensusPathDB, and GIANT networks having the best performance overall.
Optimality models in the age of experimental evolution and genomics.
Bull, J J; Wang, I-N
2010-09-01
Optimality models have been used to predict evolution of many properties of organisms. They typically neglect genetic details, whether by necessity or design. This omission is a common source of criticism, and although this limitation of optimality is widely acknowledged, it has mostly been defended rather than evaluated for its impact. Experimental adaptation of model organisms provides a new arena for testing optimality models and for simultaneously integrating genetics. First, an experimental context with a well-researched organism allows dissection of the evolutionary process to identify causes of model failure--whether the model is wrong about genetics or selection. Second, optimality models provide a meaningful context for the process and mechanics of evolution, and thus may be used to elicit realistic genetic bases of adaptation--an especially useful augmentation to well-researched genetic systems. A few studies of microbes have begun to pioneer this new direction. Incompatibility between the assumed and actual genetics has been demonstrated to be the cause of model failure in some cases. More interestingly, evolution at the phenotypic level has sometimes matched prediction even though the adaptive mutations defy mechanisms established by decades of classic genetic studies. Integration of experimental evolutionary tests with genetics heralds a new wave for optimality models and their extensions that does not merely emphasize the forces driving evolution.
Evaluation of methods and marker Systems in Genomic Selection of oil palm (Elaeis guineensis Jacq.).
Kwong, Qi Bin; Teh, Chee Keng; Ong, Ai Ling; Chew, Fook Tim; Mayes, Sean; Kulaveerasingam, Harikrishna; Tammi, Martti; Yeoh, Suat Hui; Appleton, David Ross; Harikrishna, Jennifer Ann
2017-12-11
Genomic selection (GS) uses genome-wide markers as an attempt to accelerate genetic gain in breeding programs of both animals and plants. This approach is particularly useful for perennial crops such as oil palm, which have long breeding cycles, and for which the optimal method for GS is still under debate. In this study, we evaluated the effect of different marker systems and modeling methods for implementing GS in an introgressed dura family derived from a Deli dura x Nigerian dura (Deli x Nigerian) with 112 individuals. This family is an important breeding source for developing new mother palms for superior oil yield and bunch characters. The traits of interest selected for this study were fruit-to-bunch (F/B), shell-to-fruit (S/F), kernel-to-fruit (K/F), mesocarp-to-fruit (M/F), oil per palm (O/P) and oil-to-dry mesocarp (O/DM). The marker systems evaluated were simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). RR-BLUP, Bayesian A, B, Cπ, LASSO, Ridge Regression and two machine learning methods (SVM and Random Forest) were used to evaluate GS accuracy of the traits. The kinship coefficient between individuals in this family ranged from 0.35 to 0.62. S/F and O/DM had the highest genomic heritability, whereas F/B and O/P had the lowest. The accuracies using 135 SSRs were low, with accuracies of the traits around 0.20. The average accuracy of machine learning methods was 0.24, as compared to 0.20 achieved by other methods. The trait with the highest mean accuracy was F/B (0.28), while the lowest were both M/F and O/P (0.18). By using whole genomic SNPs, the accuracies for all traits, especially for O/DM (0.43), S/F (0.39) and M/F (0.30) were improved. The average accuracy of machine learning methods was 0.32, compared to 0.31 achieved by other methods. Due to high genomic resolution, the use of whole-genome SNPs improved the efficiency of GS dramatically for oil palm and is recommended for dura breeding programs. 
Machine learning slightly outperformed other methods, but required parameters optimization for GS implementation.
Willing, Eva-Maria; Bentzen, Paul; van Oosterhout, Cock; Hoffmann, Margarete; Cable, Joanne; Breden, Felix; Weigel, Detlef; Dreyer, Christine
2010-03-01
Adaptation of guppies (Poecilia reticulata) to contrasting upland and lowland habitats has been extensively studied with respect to behaviour, morphology and life history traits. Yet population history has not been studied at the whole-genome level. Although single nucleotide polymorphisms (SNPs) are the most abundant form of variation in many genomes and consequently very informative for a genome-wide picture of standing natural variation in populations, genome-wide SNP data are rarely available for wild vertebrates. Here we use genetically mapped SNP markers to comprehensively survey genetic variation within and among naturally occurring guppy populations from a wide geographic range in Trinidad and Venezuela. Results from three different clustering methods, Neighbor-net, principal component analysis (PCA) and Bayesian analysis show that the population substructure agrees with geographic separation and largely with previously hypothesized patterns of historical colonization. Within major drainages (Caroni, Oropouche and Northern), populations are genetically similar, but those in different geographic regions are highly divergent from one another, with some indications of ancient shared polymorphisms. Clear genomic signatures of a previous introduction experiment were seen, and we detected additional potential admixture events. Headwater populations were significantly less heterozygous than downstream populations. Pairwise F(ST) values revealed marked differences in allele frequencies among populations from different regions, and also among populations within the same region. F(ST) outlier methods indicated some regions of the genome as being under directional selection. Overall, this study demonstrates the power of a genome-wide SNP data set to inform for studies on natural variation, adaptation and evolution of wild populations.
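Pairwise F(ST) compares the heterozygosity expected within populations with that expected in the pooled population. The sketch below computes Wright's F(ST) for one biallelic SNP in two populations, assuming equal population sizes and known allele frequencies. The abstract does not state which F(ST) estimator the authors used, so this is an illustration of the quantity and not of their method.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)

def pairwise_fst(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    h_t = expected_heterozygosity((p1 + p2) / 2.0)
    if h_t == 0.0:
        return 0.0  # locus is fixed for the same allele in both populations
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in two populations
print(pairwise_fst(0.2, 0.8))   # strongly differentiated
print(pairwise_fst(0.5, 0.5))   # identical frequencies
```

Genome-wide values are usually obtained by combining many loci, and outlier scans then look for SNPs whose F(ST) is unusually high compared with that background.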
Brant, Steven R.; Okou, David T.; Simpson, Claire L.; Cutler, David J.; Haritunians, Talin; Bradfield, Jonathan P.; Chopra, Pankaj; Prince, Jarod; Begum, Ferdouse; Kumar, Archana; Huang, Chengrui; Venkateswaran, Suresh; Datta, Lisa W.; Wei, Zhi; Thomas, Kelly; Herrinton, Lisa J.; Klapproth, Jan-Micheal A.; Quiros, Antonio J.; Seminerio, Jenifer; Liu, Zhenqiu; Alexander, Jonathan S.; Baldassano, Robert N.; Dudley-Brown, Sharon; Cross, Raymond K.; Dassopoulos, Themistocles; Denson, Lee A.; Dhere, Tanvi A.; Dryden, Gerald W.; Hanson, John S.; Hou, Jason K.; Hussain, Sunny Z.; Hyams, Jeffrey S.; Isaacs, Kim L.; Kader, Howard; Kappelman, Michael D.; Katz, Jeffry; Kellermayer, Richard; Kirschner, Barbara S.; Kuemmerle, John F.; Kwon, John H.; Lazarev, Mark; Li, Ellen; Mack, David; Mannon, Peter; Moulton, Dedrick E.; Newberry, Rodney D.; Osuntokun, Bankole O.; Patel, Ashish S.; Saeed, Shehzad A.; Targan, Stephan R.; Valentine, John F.; Wang, Ming-Hsi; Zonca, Martin; Rioux, John D.; Duerr, Richard H.; Silverberg, Mark S.; Cho, Judy H.; Hakonarson, Hakon; Zwick, Michael E.; McGovern, Dermot P.B.; Kugathasan, Subra
2016-01-01
Background & Aims The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn’s disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities. Methods We performed 2 high-density, genome-wide scans comprising 2345 cases of African Americans with IBD (1646 with CD, 583 with UC, and 116 inflammatory bowel disease unclassified [IBD-U]) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P<5.0×10^-8 in meta-analysis with nominal evidence (P<.05) in each scan were considered to have genome-wide significance. Results We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P<1.6×10^-6): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles. Conclusions We performed a genome-wide association study of African Americans with IBD and identified loci associated with CD and UC in only this population; we also replicated loci identified in European populations.
The detection of variants associated with IBD risk in only people of African descent demonstrates the importance of studying the genetics of IBD and other complex diseases in populations beyond those of European ancestry. PMID:27693347
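The significance rule stated in the Methods above (meta-analysis P below 5.0×10^-8 together with nominal evidence, P below .05, in each scan) can be written as a small predicate. This is only a sketch of that stated rule; the function and constant names are hypothetical and do not come from the study's software.

```python
GENOME_WIDE_P = 5.0e-8   # meta-analysis threshold quoted in the abstract
NOMINAL_P = 0.05         # per-scan nominal threshold quoted in the abstract


def is_genome_wide_significant(meta_p: float, scan_ps: list[float]) -> bool:
    """Return True if a SNP meets both criteria described in the abstract.

    meta_p  -- P value from the meta-analysis of the scans
    scan_ps -- P values for the same SNP in each individual scan
    """
    return meta_p < GENOME_WIDE_P and all(p < NOMINAL_P for p in scan_ps)
```

For example, a SNP with a meta-analysis P of 1e-9 but a P of 0.2 in one of the two scans would not qualify under this rule.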
Mapping 3D genome architecture through in situ DNase Hi-C.
Ramani, Vijay; Cusanovich, Darren A; Hause, Ronald J; Ma, Wenxiu; Qiu, Ruolan; Deng, Xinxian; Blau, C Anthony; Disteche, Christine M; Noble, William S; Shendure, Jay; Duan, Zhijun
2016-11-01
With the advent of massively parallel sequencing, considerable work has gone into adapting chromosome conformation capture (3C) techniques to study chromosomal architecture at a genome-wide scale. We recently demonstrated that the inactive murine X chromosome adopts a bipartite structure using a novel 3C protocol, termed in situ DNase Hi-C. Like traditional Hi-C protocols, in situ DNase Hi-C requires that chromatin be chemically cross-linked, digested, end-repaired, and proximity-ligated with a biotinylated bridge adaptor. The resulting ligation products are optionally sheared, affinity-purified via streptavidin bead immobilization, and subjected to traditional next-generation library preparation for Illumina paired-end sequencing. Importantly, in situ DNase Hi-C obviates the dependence on a restriction enzyme to digest chromatin, instead relying on the endonuclease DNase I. Libraries generated by in situ DNase Hi-C have a higher effective resolution than traditional Hi-C libraries, which makes them valuable in cases in which high sequencing depth is allowed for, or when hybrid capture technologies are expected to be used. The protocol described here, which involves ∼4 d of bench work, is optimized for the study of mammalian cells, but it can be broadly applicable to any cell or tissue of interest, given experimental parameter optimization.
Yoon, Hyejin; Leitner, Thomas
2014-12-17
Analyses of entire viral genomes or mtDNA require comprehensive design of many primers across their genomes. In addition, simultaneous optimization of several DNA primer design criteria may improve overall experimental efficiency and downstream bioinformatic processing. To achieve these goals, we developed PrimerDesign-M. It includes several options for multiple-primer design, allowing researchers to efficiently design walking primers that cover long DNA targets, such as entire HIV-1 genomes, and it optimizes primers simultaneously, informed by genetic diversity in multiple alignments and experimental design constraints given by the user. PrimerDesign-M can also design primers that include DNA barcodes and minimize primer dimerization. PrimerDesign-M finds optimal primers for highly variable DNA targets and facilitates design flexibility by suggesting alternative designs to adapt to experimental conditions.
Implementing meta-analysis from genome-wide association studies for pork quality traits
USDA-ARS's Scientific Manuscript database
Pork quality plays an important role in the meat processing industry, thus different methodologies have been implemented to elucidate the genetic architecture of traits affecting meat quality. One of the most common and widely used approaches is to perform genome-wide association (GWA) studies. Howe...
Inferring genome-wide interplay landscape between DNA methylation and transcriptional regulation.
Tang, Binhua; Wang, Xin
2015-01-01
DNA methylation and transcriptional regulation play important roles in cancer cell development and differentiation processes. Based on the currently available cell line profiling information from the ENCODE Consortium, we propose a Bayesian inference model to infer and construct a genome-wide interaction landscape between DNA methylation and transcriptional regulation, which sheds light on the underlying complex functional mechanisms important within the human cancer and disease context. For the first time, we select all the currently available cell line (>=20) and transcription factor (>=80) profiling information from the ENCODE Consortium portal. Through the integration of those genome-wide profiling sources, our genome-wide analysis detects multiple functional loci of interest, and indicates that DNA methylation is cell- and region-specific, due to the interplay mechanisms with transcription regulatory activities. We validate our analysis results with the corresponding RNA-sequencing technique for those detected genomic loci. Our results provide novel and meaningful insights into the interplay mechanisms of transcriptional regulation and gene expression for human cancer and disease studies.
Liu, Xin; Wang, Li Gang; Luo, Wei Zhen; Li, Yong; Liang, Jing; Yan, Hua; Zhao, Ke Bin; Wang, Li Xian; Zhang, Long Chao
2014-12-01
A high-density single nucleotide polymorphism (SNP) array containing 62 163 markers was employed for a genome-wide association study (GWAS) to identify variants associated with lean meat in ham (LMH, %) and lean meat percentage (LMP, %) within a porcine Large White×Minzhu intercross population. For each individual, LMH and LMP were measured after slaughter at the age of 240±7 days. A total of 557 F2 animals were genotyped. The GWAS revealed that 21 SNPs showed significant genome-wide or chromosome-wide associations with LMH and LMP by the Genome-wide Rapid Association using Mixed Model and Regression-Genomic Control approach. Nineteen significant genome-wide SNPs were mapped to the distal end of Sus scrofa chromosome (SSC) 2, where a major known gene responsible for muscle mass, IGF2, is located. A conditioned analysis, in which the genotype of the strongest associated SNP is included as a fixed effect in the model, showed that those significant SNPs on SSC2 were derived from a single quantitative trait locus. The two chromosome-wide association SNPs on SSC1 disappeared after conditioned analysis, suggesting that the association signal is a false association derived from using an F2 population. The present result is expected to lead to novel insights into muscle mass in different pig breeds and lays a preliminary foundation for follow-up studies for identification of causal mutations for subsequent application in marker-assisted selection programs for improving muscle mass in pigs. © 2014 Japanese Society of Animal Science.
Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi
2014-01-01
With remarkable increase of genomic sequence data of a wide range of species, novel tools are needed for comprehensive analyses of the big sequence data. Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, solely depending on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes in order to investigate an efficient method for detecting differences between the closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," and the specific regions specifically enriched in transcription-factor-binding sequences. Because the classification and visualization power is very high, BLSOM is an efficient powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
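The oligonucleotide composition that serves as SOM input above can be illustrated with a short sketch that turns one sequence fragment into a vector of k-mer frequencies (k = 5 for pentanucleotides). This is an assumption-laden illustration, not the BLSOM implementation: the function name `kmer_frequencies` is hypothetical, windows containing non-ACGT characters are simply skipped, and no strand-symmetry handling is applied.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 5) -> dict[str, float]:
    """Relative frequency of every ACGT k-mer in one sequence fragment.

    Overlapping windows are counted; windows containing characters other
    than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in all_kmers)
    if total == 0:
        return {km: 0.0 for km in all_kmers}
    return {km: counts[km] / total for km in all_kmers}
```

With k = 5 each fragment becomes a 1024-dimensional vector (4^5 possible pentanucleotides), which is the kind of high-dimensional input a self-organizing map can cluster.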
USDA-ARS's Scientific Manuscript database
Genotyping-by-sequencing allows for large-scale genetic analyses in plant species with no reference genome, creating the challenge of sound inference in the presence of uncertain genotypes. Here we report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundina...
USDA-ARS's Scientific Manuscript database
Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., P...
USDA-ARS's Scientific Manuscript database
The identification of specific genes underlying phenotypic variation of complex traits remains one of the greatest challenges in biology despite having genome sequences and more powerful tools. Most genome-wide screens lack sufficient resolving power as they typically depend on linkage. One altern...
USDA-ARS's Scientific Manuscript database
Copy number variations (CNVs) are large insertions, deletions or duplications in the genome that vary between members of a species and are known to affect a wide variety of phenotypic traits. In this study, we identified CNVs in a population of bulls using low coverage next-generation sequence data....
Cost-Effective Cloud Computing: A Case Study Using the Comparative Genomics Tool, Roundup
Kudtarkar, Parul; DeLuca, Todd F.; Fusaro, Vincent A.; Tonellato, Peter J.; Wall, Dennis P.
2010-01-01
Background Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource—Roundup—using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Methods Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon’s Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. Results We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon’s computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. 
Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure. PMID:21258651
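The abstract describes ordering jobs in advance from estimated runtimes instead of submitting them randomly. One common heuristic for this kind of problem is longest-job-first assignment to the least-loaded node, sketched below. This is an assumed stand-in for illustration only: the authors' actual runtime model and ordering strategy are not given in the abstract, and the function name and inputs are hypothetical.

```python
import heapq


def schedule_longest_first(estimated_hours: dict[str, float], n_nodes: int):
    """Greedy longest-processing-time-first scheduling.

    estimated_hours -- job name -> estimated runtime in hours
    n_nodes         -- number of identical cluster nodes (must be >= 1)
    Returns (makespan_hours, {node_index: [job names in run order]}).
    """
    # Min-heap of (current load, node index): pop gives the least-loaded node.
    heap = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}
    jobs = sorted(estimated_hours.items(), key=lambda kv: kv[1], reverse=True)
    for job, hours in jobs:
        load, node = heapq.heappop(heap)
        assignment[node].append(job)
        heapq.heappush(heap, (load + hours, node))
    makespan = max(load for load, _ in heap)
    return makespan, assignment
```

Placing long jobs first tends to leave short jobs to fill the gaps at the end, which reduces the time nodes sit idle while billing continues.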
Chopra, Pankaj; Papale, Ligia A; White, Andrew T J; Hatch, Andrea; Brown, Ryan M; Garthwaite, Mark A; Roseboom, Patrick H; Golos, Thaddeus G; Warren, Stephen T; Alisch, Reid S
2014-02-13
Methylation on the fifth position of cytosine (5-mC) is an essential epigenetic mark that is linked to both normal neurodevelopment and neurological diseases. The recent identification of another modified form of cytosine, 5-hydroxymethylcytosine (5-hmC), in both stem cells and post-mitotic neurons, raises new questions as to the role of this base in mediating epigenetic effects. Genomic studies of these marks using model systems are limited, particularly with array-based tools, because the standard method of detecting DNA methylation cannot distinguish between 5-mC and 5-hmC and most methods have been developed to only survey the human genome. We show that non-human data generated using the optimization of a widely used human DNA methylation array, designed only to detect 5-mC, reproducibly distinguishes tissue types within and between chimpanzee, rhesus, and mouse, with correlations near the human DNA level (R(2) > 0.99). Genome-wide methylation analysis, using this approach, reveals 6,102 differentially methylated loci between rhesus placental and fetal tissues with pathways analysis significantly overrepresented for developmental processes. Restricting the analysis to oncogenes and tumor suppressor genes finds 76 differentially methylated loci, suggesting that rhesus placental tissue carries a cancer epigenetic signature. Similarly, adapting the assay to detect 5-hmC finds highly reproducible 5-hmC levels within human, rhesus, and mouse brain tissue that is species-specific with a hierarchical abundance among the three species (human > rhesus > mouse). Annotation of 5-hmC with respect to gene structure reveals a significant prevalence in the 3'UTR and an association with chromatin-related ontological terms, suggesting an epigenetic feedback loop mechanism for 5-hmC. 
Together, these data show that this array-based methylation assay is generalizable to all mammals for the detection of both 5-mC and 5-hmC, greatly improving the utility of mammalian model systems to study the role of epigenetics in human health, disease, and evolution.
Training set optimization under population structure in genomic selection
USDA-ARS's Scientific Manuscript database
The optimization of the training set (TRS) in genomic selection (GS) has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the Coefficient of D...
Atlas2 Cloud: a framework for personal genome analysis in the cloud
2012-01-01
Background Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. Results We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. Conclusions We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms. PMID:23134663
Atlas2 Cloud: a framework for personal genome analysis in the cloud.
Evani, Uday S; Challis, Danny; Yu, Jin; Jackson, Andrew R; Paithankar, Sameer; Bainbridge, Matthew N; Jakkamsetti, Adinarayana; Pham, Peter; Coarfa, Cristian; Milosavljevic, Aleksandar; Yu, Fuli
2012-01-01
Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Iteratively improving Hi-C experiments one step at a time.
Golloshi, Rosela; Sanders, Jacob T; McCord, Rachel Patton
2018-06-01
The 3D organization of eukaryotic chromosomes affects key processes such as gene expression, DNA replication, cell division, and response to DNA damage. The genome-wide chromosome conformation capture (Hi-C) approach can characterize the landscape of 3D genome organization by measuring interaction frequencies between all genomic regions. Hi-C protocol improvements and rapid advances in DNA sequencing power have made Hi-C useful to study diverse biological systems, not only to elucidate the role of 3D genome structure in proper cellular function, but also to characterize genomic rearrangements, assemble new genomes, and consider chromatin interactions as potential biomarkers for diseases. Yet, the Hi-C protocol is still complex and subject to variations at numerous steps that can affect the resulting data. Thus, there is still a need for better understanding and control of factors that contribute to Hi-C experiment success and data quality. Here, we evaluate recently proposed Hi-C protocol modifications as well as often overlooked variables in sample preparation and examine their effects on Hi-C data quality. We examine artifacts that can occur during Hi-C library preparation, including microhomology-based artificial template copying and chimera formation that can add noise to the downstream data. Exploring the mechanisms underlying Hi-C artifacts pinpoints steps that should be further optimized in the future. To improve the utility of Hi-C in characterizing the 3D genome of specialized populations of cells or small samples of primary tissue, we identify steps prone to DNA loss which should be considered to adapt Hi-C to lower cell numbers. Copyright © 2018 Elsevier Inc. All rights reserved.
McGowan, Michelle L; Settersten, Richard A; Juengst, Eric T; Fishman, Jennifer R
2014-02-01
The use of molecular tools to individualize health care, predict appropriate therapies, and prevent adverse health outcomes has gained significant traction in the field of oncology under the banner of "personalized medicine" (PM). Enthusiasm for PM in oncology has been fueled by success stories of targeted treatments for a variety of cancers based on their molecular profiles. Though these are clear indications of optimism for PM, little is known about the ethical and social implications of personalized approaches in clinical oncology. The objective of this study is to assess how a range of stakeholders engaged in promoting, monitoring, and providing PM understand the challenges of integrating genomic testing and targeted therapies into clinical oncology. The study involved the analysis of in-depth interviews with 117 stakeholders whose experiences and perspectives on PM span a wide variety of institutional and professional settings. Despite their considerable enthusiasm for this shift, promoters, monitors, and providers of PM identified 4 domains that provoke heightened ethical and social concerns: (1) informed consent for cancer genomic testing, (2) privacy, confidentiality, and disclosure of genomic test results, (3) access to genomic testing and targeted therapies in oncology, and (4) the costs of scaling up pharmacogenomic testing and targeted cancer therapies. These specific concerns are not unique to oncology, or even genomics. However, those most invested in the success of PM view oncologists' responses to these challenges as precedent setting because oncology is farther along the path of clinical integration of genomic technologies than other fields of medicine. This study illustrates that the rapid emergence of PM approaches in clinical oncology provides a crucial lens for identifying and managing potential frictions and pitfalls that emerge as health care paradigms shift in these directions. © 2014 Published by Elsevier Inc.
On Computing Breakpoint Distances for Genomes with Duplicate Genes.
Shao, Mingfu; Moret, Bernard M E
2017-06-01
A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of its higher level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of nonconserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property. The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although we provided last year a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this article, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological data sets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignment, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the "any matching" formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.
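For the easy case mentioned above (two genomes without duplicate genes), the breakpoint distance can be computed directly by counting adjacencies of one genome that are not conserved in the other. The sketch below assumes each genome is a single linear chromosome given as a list of signed, nonzero integers over the same gene set, and it ignores chromosome ends. It illustrates the definition only and is not the integer-linear-programming method of the article; the function name is hypothetical.

```python
def breakpoint_distance(genome_a: list[int], genome_b: list[int]) -> int:
    """Number of adjacencies in genome_a that are not conserved in genome_b.

    Genes are signed nonzero integers; the sign encodes orientation.
    An adjacency (x, y) is the same as (-y, -x) read on the other strand.
    """
    def adjacencies(genome: list[int]) -> set[tuple[int, int]]:
        adj = set()
        for x, y in zip(genome, genome[1:]):
            adj.add((x, y))
            adj.add((-y, -x))  # same adjacency seen from the reverse strand
        return adj

    adj_b = adjacencies(genome_b)
    return sum(1 for pair in zip(genome_a, genome_a[1:]) if pair not in adj_b)
```

For instance, inverting the segment 2, 3 in the genome 1 2 3 4 5 to give 1 -3 -2 4 5 breaks the adjacencies (1, 2) and (3, 4) while preserving (2, 3), so the distance is 2.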
A Genomic Resource for the Development, Improvement, and Exploitation of Sorghum for Bioenergy.
Brenton, Zachary W; Cooper, Elizabeth A; Myers, Mathew T; Boyles, Richard E; Shakoor, Nadia; Zielinski, Kelsey J; Rauh, Bradley L; Bridges, William C; Morris, Geoffrey P; Kresovich, Stephen
2016-09-01
With high productivity and stress tolerance, numerous grass genera of the Andropogoneae have emerged as candidates for bioenergy production. To optimize these candidates, research examining the genetic architecture of yield, carbon partitioning, and composition is required to advance breeding objectives. Significant progress has been made developing genetic and genomic resources for Andropogoneae, and advances in comparative and computational genomics have enabled research examining the genetic basis of photosynthesis, carbon partitioning, composition, and sink strength. To provide a pivotal resource aimed at developing a comparative understanding of key bioenergy traits in the Andropogoneae, we have established and characterized an association panel of 390 racially, geographically, and phenotypically diverse Sorghum bicolor accessions with 232,303 genetic markers. Sorghum bicolor was selected because of its genomic simplicity, phenotypic diversity, significant genomic tools, and its agricultural productivity and resilience. We have demonstrated the value of sorghum as a functional model for candidate gene discovery for bioenergy Andropogoneae by performing genome-wide association analysis for two contrasting phenotypes representing key components of structural and non-structural carbohydrates. We identified potential genes, including a cellulase enzyme and a vacuolar transporter, associated with increased non-structural carbohydrates that could lead to bioenergy sorghum improvement. Although our analysis identified genes with potentially clear functions, other candidates did not have assigned functions, suggesting novel molecular mechanisms for carbon partitioning traits. These results, combined with our characterization of phenotypic and genetic diversity and the public accessibility of each accession and genomic data, demonstrate the value of this resource and provide a foundation for future improvement of sorghum and related grasses for bioenergy production. 
Copyright © 2016 by the Genetics Society of America.
Li, Yanwei; Ding, Xianlong; Wang, Xuan; He, Tingting; Zhang, Hao; Yang, Longshu; Wang, Tanliu; Chen, Linfeng; Gai, Junyi; Yang, Shouping
2017-08-10
DNA methylation is an important epigenetic modification. It can regulate the expression of many key genes without changing the primary structure of the genomic DNA, and plays a vital role in the growth and development of the organism. The genome-wide DNA methylation profile of the cytoplasmic male sterile (CMS) line in soybean has not been reported so far. In this study, genome-wide comparative analysis of DNA methylation between soybean CMS line NJCMS5A and its maintainer NJCMS5B was conducted by whole-genome bisulfite sequencing. The analysis identified 3527 differentially methylated regions (DMRs) and 485 differentially methylated genes (DMGs), including 353 high-credibility methylated genes, 56 methylated genes encoding unknown proteins and 76 novel methylated genes with no known function. Among them, 25 DMRs were further validated by bisulfite treatment, confirming that the genome-wide DNA methylation data were reliable, and for 9 DMRs the relationship between DNA methylation and gene expression was confirmed by qRT-PCR. Finally, 8 key DMGs possibly associated with soybean CMS were identified. The genome-wide DNA methylation profile of the soybean CMS line NJCMS5A and its maintainer NJCMS5B was obtained for the first time. Several specific DMGs that participate in pollen and flower development were further identified as probably associated with soybean CMS. This study will contribute to further understanding of the molecular mechanism behind soybean CMS.
Exome-wide DNA capture and next generation sequencing in domestic and wild species.
Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon
2011-07-05
Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses. We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.
Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L
2016-05-01
Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina Miseq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement with a turnaround time of <1 week.
Krapohl, E; Plomin, R
2016-03-01
One of the best predictors of children's educational achievement is their family's socioeconomic status (SES), but the degree to which this association is genetically mediated remains unclear. For 3000 UK-representative unrelated children we found that genome-wide single-nucleotide polymorphisms could explain a third of the variance of scores on an age-16 UK national examination of educational achievement and half of the correlation between their scores and family SES. Moreover, genome-wide polygenic scores based on a previously published genome-wide association meta-analysis of total number of years in education accounted for ~3.0% variance in educational achievement and ~2.5% in family SES. This study provides the first molecular evidence for substantial genetic influence on differences in children's educational achievement and its association with family SES.
Krapohl, E; Plomin, R
2016-01-01
One of the best predictors of children's educational achievement is their family's socioeconomic status (SES), but the degree to which this association is genetically mediated remains unclear. For 3000 UK-representative unrelated children we found that genome-wide single-nucleotide polymorphisms could explain a third of the variance of scores on an age-16 UK national examination of educational achievement and half of the correlation between their scores and family SES. Moreover, genome-wide polygenic scores based on a previously published genome-wide association meta-analysis of total number of years in education accounted for ~3.0% variance in educational achievement and ~2.5% in family SES. This study provides the first molecular evidence for substantial genetic influence on differences in children's educational achievement and its association with family SES. PMID:25754083
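As a rough illustration of how a genome-wide polygenic score of the kind mentioned above is commonly formed, the sketch below sums each individual's effect-allele counts weighted by per-SNP effect sizes taken from an external association study. The SNP identifiers, weights, and genotypes are invented for illustration and are not taken from the study; the abstract does not describe the exact scoring procedure.

```python
def polygenic_score(genotypes, weights):
    """Weighted sum of effect-allele counts (0, 1, or 2).

    Only SNPs that have a weight contribute to the score.
    """
    return sum(weights[snp] * count
               for snp, count in genotypes.items()
               if snp in weights)


# Hypothetical per-SNP effect sizes (e.g. from a published meta-analysis).
weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.125}

# Hypothetical effect-allele counts for two children.
children = {
    "child_1": {"rs_a": 2, "rs_b": 1, "rs_c": 0},
    "child_2": {"rs_a": 0, "rs_b": 2, "rs_c": 1},
}

scores = {name: polygenic_score(g, weights) for name, g in children.items()}
print(scores)
```

In practice the resulting scores would then be regressed against the outcome (here, examination results or family SES) to estimate the variance explained.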
Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K
2015-01-01
Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100bp) termed CpG deserts and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. 
Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
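The CpG desert criterion quoted above (fewer than 3 CpG per 100 bp) can be sketched as a simple density check. This is a minimal illustration only: the function names and the toy sequences are assumptions, and the abstract does not state how windows were defined in the actual annotation.

```python
def cpg_per_100bp(seq):
    """Number of CpG dinucleotides per 100 bp of the given sequence."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    return 100.0 * seq.count("CG") / len(seq)


def is_cpg_desert(seq, threshold=3.0):
    """True when CpG density is below the threshold (default: 3 per 100 bp)."""
    return cpg_per_100bp(seq) < threshold


# Toy 100 bp sequences, invented for illustration.
sparse = "A" * 98 + "CG"   # one CpG in 100 bp
dense = "CG" * 50          # fifty CpGs in 100 bp

print(is_cpg_desert(sparse), is_cpg_desert(dense))
```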
Milan, David J; Lubitz, Steven A; Kääb, Stefan; Ellinor, Patrick T
2010-08-01
Genome-wide association studies have been increasingly used to study the genetics of complex human diseases. Within the field of cardiac electrophysiology, this technique has been applied to conditions such as atrial fibrillation, and several electrocardiographic parameters including the QT interval. While these studies have identified multiple genomic regions associated with each trait, questions remain, including the best way to explore the pathophysiology of each association and the potential for clinical utility. This review will summarize recent genome-wide association study results within cardiac electrophysiology and discuss their broader implications in basic science and clinical medicine. Copyright 2010 Heart Rhythm Society. Published by Elsevier Inc. All rights reserved.
Tuskan, Gerry
2018-02-13
The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting featured presentations by leading scientists advancing these topics. Gerry Tuskan of Oak Ridge National Laboratory spoke on "Resequencing in Populus: Towards Genome Wide Association Genetics" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011.
Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng
2014-01-01
Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence genome-wide, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3'-terminal of the papain-like cysteine protease domain, were found to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed the identical performances between these regions and the full-length genome in genotyping, in which the HEV isolates involved could be divided into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
Namkung, Junghyun; Nam, Jin-Wu; Park, Taesung
2007-01-01
Many genes with major effects on quantitative traits have been reported to interact with other genes. However, finding a group of interacting genes from thousands of SNPs is challenging. Hence, an efficient and robust algorithm is needed. The genetic algorithm (GA) is useful in searching for the optimal solution from a very large searchable space. In this study, we show that genome-wide interaction analysis using GA and a statistical interaction model can provide a practical method to detect biologically interacting loci. We focus our search on transcriptional regulators by analyzing gene × gene interactions for cancer-related genes. The expression values of three cancer-related genes were selected from the expression data of the Genetic Analysis Workshop 15 Problem 1 data set. We implemented a GA to identify the expression quantitative trait loci that are significantly associated with expression levels of the cancer-related genes. The time complexity of the GA was compared with that of an exhaustive search algorithm. As a result, our GA, which included heuristic methods, such as archive, elitism, and local search, has greatly reduced computational time in a genome-wide search for gene × gene interactions. In general, the GA took one-fifth the computation time of an exhaustive search for the most significant pair of single-nucleotide polymorphisms. PMID:18466570
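To make the exhaustive-search baseline concrete, the sketch below scores every SNP pair and keeps the best one, and shows how quickly the number of pairs grows. The scoring table stands in for the statistical interaction model and is entirely hypothetical, as are the SNP names and the figure of 5000 SNPs; the genetic algorithm itself is not reproduced here.

```python
from itertools import combinations
from math import comb


def best_pair(snps, score):
    """Exhaustive search: evaluate every unordered SNP pair, return the best."""
    return max(combinations(snps, 2), key=lambda pair: score(*pair))


# Hypothetical interaction scores for three SNPs (higher = stronger evidence).
toy_scores = {("s1", "s2"): 0.1, ("s1", "s3"): 0.9, ("s2", "s3"): 0.4}
snps = ["s1", "s2", "s3"]

winner = best_pair(snps, lambda a, b: toy_scores[(a, b)])
print(winner)

# The search space grows quadratically: pairs among 5000 SNPs (assumed size).
n_pairs = comb(5000, 2)
print(n_pairs)
```

A heuristic such as a GA evaluates only a subset of these pairs, which is where the reported saving in computation time comes from.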
A universal TagModule collection for parallel genetic analysis of microorganisms
Oh, Julia; Fung, Eula; Price, Morgan N.; Dehal, Paramvir S.; Davis, Ronald W.; Giaever, Guri; Nislow, Corey; Arkin, Adam P.; Deutschbauer, Adam
2010-01-01
Systems-level analyses of non-model microorganisms are limited by the existence of numerous uncharacterized genes and a corresponding over-reliance on automated computational annotations. One solution to this challenge is to disrupt gene function using DNA tag technology, which has been highly successful in parallelizing reverse genetics in Saccharomyces cerevisiae and has led to discoveries in gene function, genetic interactions and drug mechanism of action. To extend the yeast DNA tag methodology to a wide variety of microorganisms and applications, we have created a universal, sequence-verified TagModule collection. A hallmark of the 4280 TagModules is that they are cloned into a Gateway entry vector, thus facilitating rapid transfer to any compatible genetic system. Here, we describe the application of the TagModules to rapidly generate tagged mutants by transposon mutagenesis in the metal-reducing bacterium Shewanella oneidensis MR-1 and the pathogenic yeast Candida albicans. Our results demonstrate the optimal hybridization properties of the TagModule collection, the flexibility in applying the strategy to diverse microorganisms and the biological insights that can be gained from fitness profiling tagged mutant collections. The publicly available TagModule collection is a platform-independent resource for the functional genomics of a wide range of microbial systems in the post-genome era. PMID:20494978
Photoperiod-H1 (Ppd-H1) Controls Leaf Size
Digel, Benedikt; Tavakol, Elahe; Verderio, Gabriele; Xu, Xin
2016-01-01
Leaf size is a major determinant of plant photosynthetic activity and biomass; however, it is poorly understood how leaf size is genetically controlled in cereal crop plants like barley (Hordeum vulgare). We conducted a genome-wide association scan for flowering time, leaf width, and leaf length in a diverse panel of European winter cultivars grown in the field and genotyped with a single-nucleotide polymorphism array. The genome-wide association scan identified PHOTOPERIOD-H1 (Ppd-H1) as a candidate gene underlying the major quantitative trait loci for flowering time and leaf size in the barley population. Microscopic phenotyping of three independent introgression lines confirmed the effect of Ppd-H1 on leaf size. Differences in the duration of leaf growth and consequent variation in leaf cell number were responsible for the leaf size differences between the Ppd-H1 variants. The Ppd-H1-dependent induction of the BARLEY MADS BOX genes BM3 and BM8 in the leaf correlated with reductions in leaf size and leaf number. Our results indicate that leaf size is controlled by the Ppd-H1- and photoperiod-dependent progression of plant development. The coordination of leaf growth with flowering may be part of a reproductive strategy to optimize resource allocation to the developing inflorescences and seeds. PMID:27457126
Meta-Analyses of Genome-Wide Association Data Hold New Promise for Addiction Genetics.
Agrawal, Arpana; Edenberg, Howard J; Gelernter, Joel
2016-09-01
Meta-analyses of genome-wide association study data have begun to lead to promising new discoveries for behavioral and psychiatrically relevant phenotypes (e.g., schizophrenia, educational attainment). We outline how this methodology can similarly lead to novel discoveries in genomic studies of substance use disorders, and discuss challenges that will need to be overcome to accomplish this goal. We illustrate our approach with the work of the newly established Substance Use Disorders workgroup of the Psychiatric Genomics Consortium.
USDA-ARS's Scientific Manuscript database
The genome-wide association study (GWAS) is a useful tool for detecting and characterizing traits of interest including those associated with disease resistance in soybean. The availability of 50,000 single nucleotide polymorphism (SNP) markers (SoySNP50K iSelect BeadChip; www.soybase.org) on 19,652...
Genome-Wide Profiling of DNA Double-Strand Breaks by the BLESS and BLISS Methods.
Mirzazadeh, Reza; Kallas, Tomasz; Bienko, Magda; Crosetto, Nicola
2018-01-01
DNA double-strand breaks (DSBs) are major DNA lesions that are constantly formed during physiological processes such as DNA replication, transcription, and recombination, or as a result of exogenous agents such as ionizing radiation, radiomimetic drugs, and genome editing nucleases. Unrepaired DSBs threaten genomic stability by leading to the formation of potentially oncogenic rearrangements such as translocations. In the past few years, several methods based on next-generation sequencing (NGS) have been developed to study the genome-wide distribution of DSBs or their conversion to translocation events. We developed Breaks Labeling, Enrichment on Streptavidin, and Sequencing (BLESS), which was the first method for direct labeling of DSBs in situ followed by their genome-wide mapping at nucleotide resolution (Crosetto et al., Nat Methods 10:361-365, 2013). Recently, we have further expanded the quantitative nature, applicability, and scalability of BLESS by developing Breaks Labeling In Situ and Sequencing (BLISS) (Yan et al., Nat Commun 8:15058, 2017). Here, we first present an overview of existing methods for genome-wide localization of DSBs, and then focus on the BLESS and BLISS methods, discussing different assay design options depending on the sample type and application.
Liu, Guozheng; Zhao, Yusheng; Gowda, Manje; Longin, C. Friedrich H.; Reif, Jochen C.; Mette, Michael F.
2016-01-01
Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting the hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and the composition and size of the training population on the accuracy of prediction of hybrid performance. In total 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested the presence of several quantitative trait loci (QTLs), but cross-validation rather indicated the absence of major effect QTLs for all quality traits except for 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids is relevant to boost the accuracy of prediction for an unrelated test population. PMID:27383841
Systems metabolic engineering in an industrial setting.
Sagt, Cees M J
2013-03-01
Systems metabolic engineering is based on systems biology, synthetic biology, and evolutionary engineering and is now also applied in industry. Industrial use of systems metabolic engineering focuses on strain and process optimization. Since ambitious yields, titers, productivities, and low costs are key in an industrial setting, the use of effective and robust methods in systems metabolic engineering is becoming very important. Major improvements in the field of proteomics and metabolomics have been crucial in the development of genome-wide approaches in strain and process development. This is accompanied by a rapid increase in DNA sequencing and synthesis capacity. These developments enable the use of systems metabolic engineering in an industrial setting. Industrial systems metabolic engineering can be defined as the combined use of genome-wide genomics, transcriptomics, proteomics, and metabolomics to modify strains or processes. This approach has become very common since the technology for generating large data sets of all levels of the cellular processes has developed quite fast into robust, reliable, and affordable methods. The main challenge and scope of this mini review is how to translate these large data sets into relevant biological leads which can be tested for strain or process improvements. Experimental setup, heterogeneity of the culture, and sample pretreatment are important issues which are easily underrated. In addition, the process of structuring, filtering, and visualization of data is important, but also, the availability of a genetic toolbox and equipment for medium/high-throughput fermentation is a key success factor. For an efficient bioprocess, all the different components in this process have to work together. Therefore, mutual tuning of these components is an important strategy.
MAGNAMWAR: an R package for genome-wide association studies of bacterial orthologs.
Sexton, Corinne E; Smith, Hayden Z; Newell, Peter D; Douglas, Angela E; Chaston, John M
2018-06-01
Here we report on an R package for genome-wide association studies of orthologous genes in bacteria. Before using the software, orthologs from bacterial genomes or metagenomes are defined using local or online implementations of OrthoMCL. These presence-absence patterns are statistically associated with variation in user-collected phenotypes using the Mono-Associated GNotobiotic Animals Metagenome-Wide Association R package (MAGNAMWAR). Genotype-phenotype associations can be performed with several different statistical tests based on the type and distribution of the data. MAGNAMWAR is available on CRAN.
Chip Based Magnetic Imager for Molecular Profiling of Ovarian Cancer Cells
2016-12-01
2015) Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell 160:1246-1260. PMC4380877, PMID:25748654. Acknowledgement of...Weissleder R, Lee H, Zhang F, Sharp PA (2015) Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell 160:1246-1260. 5. Im H, Shao H...Lett 32(10):1229–1231. 6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1501815112 Im et al. Resource Genome-wide CRISPR Screen in a Mouse Model of Tumor
Potential contribution of genomics and biotechnology in animal production
USDA-ARS's Scientific Manuscript database
The overall objective of the book chapter is to define the potential contribution of genomics in livestock production in Latin American countries. A brief description on what is genomics, genome-wide association studies (GWAS), and genomic selection (GS) is provided. Genomics has been rapidly adopte...
Aguilar, I; Misztal, I; Legarra, A; Tsuruta, S
2011-12-01
Genomic evaluations can be calculated using a unified procedure that combines phenotypic, pedigree and genomic information. Implementation of such a procedure requires the inverse of the relationship matrix based on pedigree and genomic relationships. The objective of this study was to investigate efficient computing options to create relationship matrices based on genomic markers and pedigree information as well as their inverses. SNP marker information was simulated for a panel of 40 K SNPs, with the number of genotyped animals up to 30 000. Matrix multiplication in the computation of the genomic relationship was by a simple 'do' loop, by two optimized versions of the loop, and by a specific matrix multiplication subroutine. Inversion was by a generalized inverse algorithm and by a LAPACK subroutine. With the most efficient choices and parallel processing, creation of matrices for 30 000 animals would take a few hours. Matrices required to implement a unified approach can be computed efficiently. Optimizations can be either by modifications of existing code or by the use of efficient automatic optimizations provided by open source or third-party libraries. © 2011 Blackwell Verlag GmbH.
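For orientation, the "simple loop" style of genomic relationship computation can be sketched as below. The formula is an assumption: the abstract does not state which definition was used, so the sketch adopts the commonly used form G = ZZ' / (2 Σ p(1 − p)), where Z holds genotypes centred by twice the allele frequency. The toy genotypes are invented, and the pure-Python loops are meant to show the arithmetic, not to be efficient.

```python
def genomic_relationship(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts.

    genotypes: list of rows (one per animal), each a list of SNP allele counts.
    Assumes at least one SNP is polymorphic, otherwise the scale is zero.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each SNP, estimated from the genotyped animals.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    # Centre each genotype by twice the allele frequency.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):  # the matrix is symmetric: fill both halves
            dot = sum(z[i][j] * z[k][j] for j in range(m))
            g[i][k] = g[k][i] = dot / scale
    return g


# Hypothetical genotypes: 3 animals x 4 SNPs.
toy = [[0, 1, 2, 1],
       [1, 1, 0, 2],
       [2, 0, 1, 1]]
G = genomic_relationship(toy)
for row in G:
    print([round(v, 3) for v in row])
```

The inner product over SNPs is the matrix multiplication step that the study optimizes; for tens of thousands of animals an optimized linear algebra routine would replace these loops.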
Nimmakayala, Padma; Abburi, Venkata L.; Saminathan, Thangasamy; Almeida, Aldo; Davenport, Brittany; Davidson, Joshua; Reddy, C. V. Chandra Mohan; Hankins, Gerald; Ebert, Andreas; Choi, Doil; Stommel, John; Reddy, Umesh K.
2016-01-01
Principal component analysis (PCA) with 36,621 polymorphic genome-anchored single nucleotide polymorphisms (SNPs) identified collectively for Capsicum annuum and Capsicum baccatum was used to characterize population structure and species domestication of these two important incompatible cultivated pepper species. Estimated mean nucleotide diversity (π) and Tajima's D across various chromosomes revealed biased distribution toward negative values on all chromosomes (except for chromosome 4) in cultivated C. baccatum, indicating a population bottleneck during domestication of C. baccatum. In contrast, C. annuum chromosomes showed positive π and Tajima's D on all chromosomes except chromosome 8, which may be because of domestication at multiple sites contributing to wider genetic diversity. For C. baccatum, 13,129 SNPs were available, with minor allele frequency (MAF) ≥0.05; PCA of the SNPs revealed 283 C. baccatum accessions grouped into 3 distinct clusters, for strong population structure. The fixation index (FST) between domesticated C. annuum and C. baccatum was 0.78, which indicates genome-wide divergence. We conducted extensive linkage disequilibrium (LD) analysis of C. baccatum var. pendulum cultivars on all adjacent SNP pairs within a chromosome to identify regions of high and low LD interspersed with a genome-wide average LD block size of 99.1 kb. We characterized 1742 haplotypes containing 4420 SNPs (range 9–2 SNPs per haplotype). Genome-wide association study (GWAS) of peduncle length, a trait that differentiates wild and domesticated C. baccatum types, revealed 36 significantly associated genome-wide SNPs. Population structure, identity by state (IBS) and LD patterns across the genome will be of potential use for future GWAS of economically important traits in C. baccatum peppers. PMID:27857720
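The minor allele frequency (MAF) ≥0.05 filter mentioned above can be sketched as follows. The SNP names and genotype vectors are invented for illustration; the abstract does not describe the software or the handling of missing calls used in the study, so treating missing genotypes as excluded is an assumption.

```python
def minor_allele_frequency(genotypes):
    """MAF from alternate-allele counts (0, 1, 2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2.0 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


# Hypothetical genotype vectors for three SNPs.
snps = {
    "snp_1": [0, 0, 1, 2, None],   # alt frequency 3/8
    "snp_2": [0] * 19 + [1],       # alt frequency 1/40, below the cutoff
    "snp_3": [2, 2, 2, 1],         # alt frequency 7/8, so MAF is 1/8
}

kept = [name for name, g in snps.items() if minor_allele_frequency(g) >= 0.05]
print(kept)
```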
Network-Based Identification and Prioritization of Key Regulators of Coronary Artery Disease Loci
Zhao, Yuqi; Chen, Jing; Freudenberg, Johannes M.; Meng, Qingying; Rajpal, Deepak K.; Yang, Xia
2017-01-01
Objective Recent genome-wide association studies of coronary artery disease (CAD) have revealed 58 genome-wide significant and 148 suggestive genetic loci. However, the molecular mechanisms through which they contribute to CAD and the clinical implications of these findings remain largely unknown. We aim to retrieve gene subnetworks of the 206 CAD loci and identify and prioritize candidate regulators to better understand the biological mechanisms underlying the genetic associations. Approach and Results We devised a new integrative genomics approach that incorporated (1) candidate genes from the top CAD loci, (2) the complete genetic association results from the 1000 genomes-based CAD genome-wide association studies from the Coronary Artery Disease Genome Wide Replication and Meta-Analysis Plus the Coronary Artery Disease consortium, (3) tissue-specific gene regulatory networks that depict the potential relationship and interactions between genes, and (4) tissue-specific gene expression patterns between CAD patients and controls. The networks and top-ranked regulators according to these data-driven criteria were further queried against literature, experimental evidence, and drug information to evaluate their disease relevance and potential as drug targets. Our analysis uncovered several potential novel regulators of CAD such as LUM and STAT3, which possess properties suitable as drug targets. We also revealed molecular relations and potential mechanisms through which the top CAD loci operate. Furthermore, we found that multiple CAD-relevant biological processes such as extracellular matrix, inflammatory and immune pathways, complement and coagulation cascades, and lipid metabolism interact in the CAD networks. 
Conclusions Our data-driven integrative genomics framework unraveled tissue-specific relations among the candidate genes of the CAD genome-wide association studies loci and prioritized novel network regulatory genes orchestrating biological processes relevant to CAD. PMID:26966275
Liao, R; Zhang, X; Chen, Q; Wang, Z; Wang, Q; Yang, C; Pan, Y
2016-10-01
This study was designed to investigate the genetic basis of growth and egg traits in Dongxiang blue-shelled chickens and White Leghorn chickens. In this study, we employed a reduced representation sequencing approach called genotyping by genome reducing and sequencing to detect genome-wide SNPs in 252 Dongxiang blue-shelled chickens and 252 White Leghorn chickens. The Dongxiang blue-shelled chicken breed has many specific traits and is characterized by blue-shelled eggs, black plumage, black skin, black bone and black organs. The White Leghorn chicken is an egg-type breed with high productivity. As multibreed genome-wide association studies (GWASs) can improve precision due to less linkage disequilibrium across breeds, a multibreed GWAS was performed with 156 575 SNPs to identify the associated variants underlying growth and egg traits within the two chicken breeds. The analysis revealed 32 SNPs exhibiting a significant genome-wide association with growth and egg traits. Some of the significant SNPs are located in genes that are known to impact growth and egg traits, but nearly half of the significant SNPs are located in genes with unclear functions in chickens. To our knowledge, this is the first multibreed genome-wide report for the genetics of growth and egg traits in the Dongxiang blue-shelled and White Leghorn chickens. © 2016 Stichting International Foundation for Animal Genetics.
Detection of DNA Methylation by Whole-Genome Bisulfite Sequencing.
Li, Qing; Hermanson, Peter J; Springer, Nathan M
2018-01-01
DNA methylation plays an important role in the regulation of the expression of transposons and genes. Various methods have been developed to assay DNA methylation levels. Bisulfite sequencing is considered to be the "gold standard" for single-base resolution measurement of DNA methylation levels. Coupled with next-generation sequencing, whole-genome bisulfite sequencing (WGBS) allows DNA methylation to be evaluated at a genome-wide scale. Here, we describe a protocol for WGBS in plant species with large genomes. This protocol has been successfully applied to assay genome-wide DNA methylation levels in maize and barley. This protocol has also been successfully coupled with sequence capture technology to assay DNA methylation levels in a targeted set of genomic regions.
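As a small illustration of how a per-site methylation level is usually derived from bisulfite data: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so the level at a cytosine is C reads divided by C plus T reads. The function name, site labels, and read counts below are hypothetical and are not taken from the protocol.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one cytosine.

    Returns None when the site has no coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


# Hypothetical (C reads, T reads) at three cytosine positions.
sites = {"chr1:1001": (8, 2), "chr1:1050": (0, 12), "chr1:2000": (0, 0)}
levels = {pos: methylation_level(c, t) for pos, (c, t) in sites.items()}
print(levels)
```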
Amyotte, Beatrice; Bowen, Amy J.; Banks, Travis; Rajcan, Istvan; Somers, Daryl J.
2017-01-01
Breeding apples is a long-term endeavour and it is imperative that new cultivars are selected to have outstanding consumer appeal. This study has taken the approach of merging sensory science with genome wide association analyses in order to map the human perception of apple flavour and texture onto the apple genome. The goal was to identify genomic associations that could be used in breeding apples for improved fruit quality. A collection of 85 apple cultivars was examined over two years through descriptive sensory evaluation by a trained sensory panel. The trained sensory panel scored randomized sliced samples of each apple cultivar for seventeen taste, flavour and texture attributes using controlled sensory evaluation practices. In addition, the apple collection was subjected to genotyping by sequencing for marker discovery. A genome wide association analysis suggested significant genomic associations for several sensory traits including juiciness, crispness, mealiness and fresh green apple flavour. The findings include previously unreported genomic regions that could be used in apple breeding and suggest that similar sensory association mapping methods could be applied in other plants. PMID:28231290
Fox, Ervin R.; Musani, Solomon K.; Barbalic, Maja; Lin, Honghuang; Yu, Bing; Ogunyankin, Kofo O.; Smith, Nicholas L.; Kutlar, Abdullah; Glazer, Nicole L.; Post, Wendy S.; Paltoo, Dina N.; Dries, Daniel L.; Farlow, Deborah N.; Duarte, Christine W.; Kardia, Sharon L.; Meyers, Kristin J.; Sun, Yan V.; Arnett, Donna K.; Patki, Amit A.; Sha, Jin; Cui, Xiangqui; Samdarshi, Tandaw E.; Penman, Alan D.; Bibbins-Domingo, Kirsten; Bůžková, Petra; Benjamin, Emelia J.; Bluemke, David A.; Morrison, Alanna C.; Heiss, Gerardo; Carr, J. Jeffrey; Tracy, Russell P.; Mosley, Thomas H.; Taylor, Herman A.; Psaty, Bruce M.; Heckbert, Susan R.; Cappola, Thomas P.; Vasan, Ramachandran S.
2013-01-01
Background Using data from four community-based cohorts of African Americans (AA), we tested the association between genome-wide markers (SNPs) and cardiac phenotypes in the Candidate-gene Association REsource (CARe) study. Methods and Results Among 6,765 AA, we related age, sex, height and weight-adjusted residuals for nine cardiac phenotypes (assessed by echocardiogram or MRI) to 2.5 million SNPs genotyped using Genome-Wide Affymetrix Human SNP Array 6.0 (Affy6.0) and the remainder imputed. Within cohort genome-wide association analysis was conducted followed by meta-analysis across cohorts using inverse variance weights (genome-wide significance threshold=4.0 ×10−07). Supplementary pathway analysis was performed. We attempted replication in 3 smaller cohorts of African ancestry and tested look-ups in one consortium of European ancestry (EchoGEN). Across the 9 phenotypes, variants in 4 genetic loci reached genome-wide significance: rs4552931 in UBE2V2 (p=1.43 × 10−07) for left ventricular mass (LVM); rs7213314 in WIPI1 (p=1.68 × 10−07) for LV internal diastolic diameter (LVIDD); rs1571099 in PPAPDC1A (p= 2.57 × 10−08) for interventricular septal wall thickness (IVST); and rs9530176 in KLF5 (p=4.02 × 10−07) for ejection fraction (EF). Associated variants were enriched in three signaling pathways involved in cardiac remodeling. None of the 4 loci replicated in cohorts of African ancestry were confirmed in look-ups in EchoGEN. Conclusions In the largest GWAS of cardiac structure and function to date in AA, we identified 4 genetic loci related to LVM, IVST, LVIDD and EF that reached genome-wide significance. Replication results suggest that these loci may be unique to individuals of African ancestry. Additional large-scale studies are warranted for these complex phenotypes. PMID:23275298
Meta-Analysis in Genome-Wide Association Datasets: Strategies and Application in Parkinson Disease
Evangelou, Evangelos; Maraganore, Demetrius M.; Ioannidis, John P.A.
2007-01-01
Background Genome-wide association studies hold substantial promise for identifying common genetic variants that regulate susceptibility to complex diseases. However, for the detection of small genetic effects, single studies may be underpowered. Power may be improved by combining genome-wide datasets with meta-analytic techniques. Methodology/Principal Findings Both single and two-stage genome-wide data may be combined and there are several possible strategies. In the two-stage framework, we considered the options of (1) enhancement of replication data and (2) enhancement of first-stage data, and then, we also considered (3) joint meta-analyses including all first-stage and second-stage data. These strategies were examined empirically using data from two genome-wide association studies (three datasets) on Parkinson disease. In the three strategies, we derived 12, 5, and 49 single nucleotide polymorphisms that show significant associations at conventional levels of statistical significance. None of these remained significant after conservative adjustment for the number of performed analyses in each strategy. However, some may warrant further consideration: 6 SNPs were identified with at least 2 of the 3 strategies and 3 SNPs [rs1000291 on chromosome 3, rs2241743 on chromosome 4 and rs3018626 on chromosome 11] were identified with all 3 strategies and had no or minimal between-dataset heterogeneity (I2 = 0, 0 and 15%, respectively). Analyses were primarily limited by the suboptimal overlap of tested polymorphisms across different datasets (e.g., only 31,192 shared polymorphisms between the two tier 1 datasets). Conclusions/Significance Meta-analysis may be used to improve the power and examine the between-dataset heterogeneity of genome-wide association studies. Prospective designs may be most efficient, if they try to maximize the overlap of genotyping platforms and anticipate the combination of data across many genome-wide association studies. PMID:17332845
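The pooling and heterogeneity quantities mentioned in this abstract can be illustrated with a short sketch. It assumes a fixed-effect, inverse-variance meta-analysis, which is one common choice; the abstract does not say which model the authors used. The function name and the per-dataset effect sizes are hypothetical. I² is computed from Cochran's Q as max(0, (Q - df) / Q).

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (in %)."""
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical log odds ratios and standard errors for one SNP in three datasets
pooled, pooled_se, q, i2 = fixed_effect_meta([0.20, 0.25, 0.15], [0.10, 0.12, 0.08])
print(f"pooled beta={pooled:.3f}, SE={pooled_se:.3f}, Q={q:.3f}, I2={i2:.1f}%")
```

An I² near 0% indicates that the between-dataset variation is no larger than expected by chance, which is the sense in which the abstract reports "no or minimal" heterogeneity.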
Wu, Chen; Yang, Handong; Yu, Dianke; Yang, Xiaobo; Zhang, Xiaomin; Wang, Yiqin; Sun, Jielin; Gao, Yong; Tan, Aihua; He, Yunfeng; Zhang, Haiying; Qin, Xue; Zhu, Jingwen; Li, Huaixing; Lin, Xu; Zhu, Jiang; Min, Xinwen; Lang, Mingjian; Li, Dongfeng; Zhai, Kan; Chang, Jiang; Tan, Wen; Yuan, Jing; Chen, Weihong; Wang, Youjie; Wei, Sheng; Miao, Xiaoping; Wang, Feng; Fang, Weimin; Liang, Yuan; Deng, Qifei; Dai, Xiayun; Lin, Dafeng; Huang, Suli; Guo, Huan; Lilly Zheng, S.; Xu, Jianfeng; Lin, Dongxin; Hu, Frank B.; Wu, Tangchun
2013-01-01
Plasma lipid levels are important risk factors for cardiovascular disease and are influenced by genetic and environmental factors. Recent genome-wide association studies (GWAS) have identified several lipid-associated loci, but these loci have been identified primarily in European populations. In order to identify genetic markers for lipid levels in a Chinese population and analyze the heterogeneity between Europeans and Asians, especially Chinese, we performed a meta-analysis of two genome-wide association studies on four common lipid traits including total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL) and high-density lipoprotein cholesterol (HDL) in a Han Chinese population totaling 3,451 healthy subjects. Replication was performed in an additional 8,830 subjects of Han Chinese ethnicity. We replicated eight loci associated with lipid levels previously reported in a European population. The loci associated with TC at genome-wide significance were near DOCK7, HMGCR and ABO; those associated with TG were near APOA1/C3/A4/A5 and LPL; those associated with LDL were near HMGCR, ABO and TOMM40; and those associated with HDL were near LPL, LIPC and CETP. In addition, an additive genotype score of eight SNPs representing the eight loci that were found to be associated with lipid levels was associated with higher TC, TG and LDL levels (P = 5.52×10−16, 1.38×10−6 and 5.59×10−9, respectively). These findings suggest the cumulative effects of multiple genetic loci on plasma lipid levels. Comparisons with previous GWAS of lipids highlight heterogeneity in allele frequency and in effect size for some loci between Chinese and European populations. The results from our GWAS provided comprehensive and convincing evidence of the genetic determinants of plasma lipid levels in a Chinese population. PMID:24386095
Dunn, Erin C.; Wiste, Anna; Radmanesh, Farid; Almli, Lynn M.; Gogarten, Stephanie M.; Sofer, Tamar; Faul, Jessica D.; Kardia, Sharon L.R.; Smith, Jennifer A.; Weir, David R.; Zhao, Wei; Soare, Thomas W.; Mirza, Saira S.; Hek, Karin; Tiemeier, Henning W.; Goveas, Joseph S.; Sarto, Gloria E.; Snively, Beverly M.; Cornelis, Marilyn; Koenen, Karestan C.; Kraft, Peter; Purcell, Shaun; Ressler, Kerry J.; Rosand, Jonathan; Wassertheil-Smoller, Sylvia; Smoller, Jordan W.
2016-01-01
Background Genome-wide association studies (GWAS) have been unable to identify variants linked to depression. We hypothesized that examining depressive symptoms and considering gene-environment interaction (G×E) might improve efficiency for gene discovery. We therefore conducted a GWAS and genome-wide environment interaction study (GWEIS) of depressive symptoms. Methods Using data from the SHARe cohort of the Women’s Health Initiative, comprising African Americans (n=7179) and Hispanics/Latinas (n=3138), we examined genetic main effects and G×E with stressful life events and social support. We also conducted a heritability analysis using genome-wide complex trait analysis (GCTA). Replication was attempted in four independent cohorts. Results No SNPs achieved genome-wide significance for main effects in either discovery sample. The top signals in African Americans were rs73531535 (located 20kb from GPR139, p=5.75×10−8) and rs75407252 (intronic to CACNA2D3, p=6.99×10−7). In Hispanics/Latinas, the top signals were rs2532087 (located 27kb from CD38, p=2.44×10−7) and rs4542757 (intronic to DCC, p=7.31×10−7). In the GWEIS with stressful life events, one interaction signal was genome-wide significant in African Americans (rs4652467; p=4.10×10−10; located 14kb from CEP350). This interaction was not observed in a smaller replication cohort. Although heritability estimates for depressive symptoms and stressful life events were each less than 10%, they were strongly genetically correlated (rG=0.95), suggesting that common variation underlying depressive symptoms and stressful life event exposure, though modest on their own, were highly overlapping in this sample. Conclusions Our results underscore the need for larger samples, more GWEIS, and greater investigation into genetic and environmental determinants of depressive symptoms in minorities. PMID:27038408
Dunn, Erin C; Wiste, Anna; Radmanesh, Farid; Almli, Lynn M; Gogarten, Stephanie M; Sofer, Tamar; Faul, Jessica D; Kardia, Sharon L R; Smith, Jennifer A; Weir, David R; Zhao, Wei; Soare, Thomas W; Mirza, Saira S; Hek, Karin; Tiemeier, Henning; Goveas, Joseph S; Sarto, Gloria E; Snively, Beverly M; Cornelis, Marilyn; Koenen, Karestan C; Kraft, Peter; Purcell, Shaun; Ressler, Kerry J; Rosand, Jonathan; Wassertheil-Smoller, Sylvia; Smoller, Jordan W
2016-04-01
Genome-wide association studies (GWAS) have made little progress in identifying variants linked to depression. We hypothesized that examining depressive symptoms and considering gene-environment interaction (GxE) might improve efficiency for gene discovery. We therefore conducted a GWAS and genome-wide by environment interaction study (GWEIS) of depressive symptoms. Using data from the SHARe cohort of the Women's Health Initiative, comprising African Americans (n = 7,179) and Hispanics/Latinas (n = 3,138), we examined genetic main effects and GxE with stressful life events and social support. We also conducted a heritability analysis using genome-wide complex trait analysis (GCTA). Replication was attempted in four independent cohorts. No SNPs achieved genome-wide significance for main effects in either discovery sample. The top signals in African Americans were rs73531535 (located 20 kb from GPR139, P = 5.75 × 10−8) and rs75407252 (intronic to CACNA2D3, P = 6.99 × 10−7). In Hispanics/Latinas, the top signals were rs2532087 (located 27 kb from CD38, P = 2.44 × 10−7) and rs4542757 (intronic to DCC, P = 7.31 × 10−7). In the GWEIS with stressful life events, one interaction signal was genome-wide significant in African Americans (rs4652467; P = 4.10 × 10−10; located 14 kb from CEP350). This interaction was not observed in a smaller replication cohort. Although heritability estimates for depressive symptoms and stressful life events were each less than 10%, they were strongly genetically correlated (rG = 0.95), suggesting that common variation underlying self-reported depressive symptoms and stressful life event exposure, though modest on their own, were highly overlapping in this sample. Our results underscore the need for larger samples, more GWEIS, and greater investigation into genetic and environmental determinants of depressive symptoms in minorities. © 2016 Wiley Periodicals, Inc.
Genome-wide comparative analysis of four Indian Drosophila species.
Mohanty, Sujata; Khanna, Radhika
2017-12-01
Comparative analysis of multiple genomes of closely or distantly related Drosophila species creates excitement among evolutionary biologists who explore genomic changes from an ecological and evolutionary perspective. We present the de novo assembled whole genome sequences of four Drosophila species of Indian origin, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta, generated using Next Generation Sequencing technology on an Illumina platform, along with their detailed assembly statistics. Comparative genomic analyses, including gene prediction and annotation, functional and orthogroup analysis of coding sequences, and genome-wide SNP distribution, were performed. The whole genome of Zaprionus indianus of Indian origin, published earlier by us, and the genome sequences of 12 previously sequenced Drosophila species available in the NCBI database were included in the analysis. The present work is part of our ongoing genomics project on Indian Drosophila species.
Tiengwe, Calvin; Marcello, Lucio; Farr, Helen; Dickens, Nicholas; Kelly, Steven; Swiderski, Michal; Vaughan, Diane; Gull, Keith; Barry, J. David; Bell, Stephen D.; McCulloch, Richard
2012-01-01
Summary Identification of replication initiation sites, termed origins, is a crucial step in understanding genome transmission in any organism. Transcription of the Trypanosoma brucei genome is highly unusual, with each chromosome comprising a few discrete transcription units. To understand how DNA replication occurs in the context of such organization, we have performed genome-wide mapping of the binding sites of the replication initiator ORC1/CDC6 and have identified replication origins, revealing that both localize to the boundaries of the transcription units. A remarkably small number of active origins is seen, whose spacing is greater than in any other eukaryote. We show that replication and transcription in T. brucei have a profound functional overlap, as reducing ORC1/CDC6 levels leads to genome-wide increases in mRNA levels arising from the boundaries of the transcription units. In addition, ORC1/CDC6 loss causes derepression of silent Variant Surface Glycoprotein genes, which are critical for host immune evasion. PMID:22840408
Oud, Bart; van Maris, Antonius J A; Daran, Jean-Marc; Pronk, Jack T
2012-01-01
Successful reverse engineering of mutants that have been obtained by nontargeted strain improvement has long presented a major challenge in yeast biotechnology. This paper reviews the use of genome-wide approaches for analysis of Saccharomyces cerevisiae strains originating from evolutionary engineering or random mutagenesis. On the basis of an evaluation of the strengths and weaknesses of different methods, we conclude that for the initial identification of relevant genetic changes, whole genome sequencing is superior to other analytical techniques, such as transcriptome, metabolome, proteome, or array-based genome analysis. Key advantages of this technique over gene expression analysis include the independency of genome sequences on experimental context and the possibility to directly and precisely reproduce the identified changes in naive strains. The predictive value of genome-wide analysis of strains with industrially relevant characteristics can be further improved by classical genetics or simultaneous analysis of strains derived from parallel, independent strain improvement lineages. PMID:22152095
Oud, Bart; van Maris, Antonius J A; Daran, Jean-Marc; Pronk, Jack T
2012-03-01
Successful reverse engineering of mutants that have been obtained by nontargeted strain improvement has long presented a major challenge in yeast biotechnology. This paper reviews the use of genome-wide approaches for analysis of Saccharomyces cerevisiae strains originating from evolutionary engineering or random mutagenesis. On the basis of an evaluation of the strengths and weaknesses of different methods, we conclude that for the initial identification of relevant genetic changes, whole genome sequencing is superior to other analytical techniques, such as transcriptome, metabolome, proteome, or array-based genome analysis. Key advantages of this technique over gene expression analysis include the independency of genome sequences on experimental context and the possibility to directly and precisely reproduce the identified changes in naive strains. The predictive value of genome-wide analysis of strains with industrially relevant characteristics can be further improved by classical genetics or simultaneous analysis of strains derived from parallel, independent strain improvement lineages. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Miklós, István; Darling, Aaron E
2009-06-22
Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the minimum length series of inversions (the optimal sorting path) is often not unique, as many such optimal sorting paths exist. If we assume that all optimal sorting paths are equally likely, then statistical inference on genome arrangement history must account for all such sorting paths and not just a single estimate. No deterministic polynomial algorithm is known to count the number of optimal sorting paths or to sample from the uniform distribution of optimal sorting paths. Here, we propose a stochastic method that uniformly samples the set of all optimal sorting paths. Our method uses a novel formulation of parallel Markov chain Monte Carlo. In practice, our method can quickly estimate the total number of optimal sorting paths. We introduce a variant of our approach in which short inversions are modeled to be more likely, and we show how the method can be used to estimate the distribution of inversion lengths and breakpoint usage in pathogenic Yersinia pestis. The proposed method has been implemented in a program called "MC4Inversion." We compare MC4Inversion with the sampler implemented in BADGER and a previously described importance sampling (IS) technique. We find that on high-divergence data sets, MC4Inversion finds more optimal sorting paths per second than BADGER and the IS technique and simultaneously avoids bias inherent in the IS technique.
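To make the non-uniqueness of optimal sorting paths concrete, the toy sketch below counts them exhaustively for a very small signed permutation using breadth-first search. This is not the MCMC method of the abstract and it is not polynomial: it enumerates the state space, so it only works for a handful of genes. The function names are hypothetical, and genes are assumed to be encoded as signed integers with the identity arrangement (1, 2, ..., n) as the target.

```python
from collections import deque


def inversions(perm):
    """Yield every signed permutation reachable from perm by one inversion.

    An inversion reverses the order of a contiguous segment and flips
    the sign (orientation) of every gene in it.
    """
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            segment = tuple(-x for x in reversed(perm[i:j + 1]))
            yield perm[:i] + segment + perm[j + 1:]


def count_optimal_sorting_paths(start):
    """Return (inversion distance, number of shortest inversion paths)
    from start to the identity arrangement, by exhaustive BFS."""
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    paths = {start: 1}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            break  # all arrangements one step closer were already processed
        for nxt in inversions(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                paths[nxt] = paths[cur]
                queue.append(nxt)
            elif dist[nxt] == dist[cur] + 1:
                paths[nxt] += paths[cur]  # another shortest route into nxt
    return dist[target], paths[target]


print(count_optimal_sorting_paths((2, 1)))  # → (3, 6)
```

Even for two genes there are six distinct optimal paths of length three, which hints at why exhaustive counting becomes infeasible for real genomes and why a sampling approach is attractive.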
Optimal knockout strategies in genome-scale metabolic networks using particle swarm optimization.
Nair, Govind; Jungreuthmayer, Christian; Zanghellini, Jürgen
2017-02-01
Knockout strategies, particularly the concept of constrained minimal cut sets (cMCSs), are an important part of the arsenal of tools used in manipulating metabolic networks. Given a specific design, cMCSs can be calculated even in genome-scale networks. We would, however, like to find not only the optimal intervention strategy for a given design but also the best possible design. Our solution (PSOMCS) is to use particle swarm optimization (PSO) along with the direct calculation of cMCSs from the stoichiometric matrix to obtain optimal designs satisfying multiple objectives. To illustrate the working of PSOMCS, we apply it to a toy network. Next, we show its superiority by comparing its performance against other comparable methods on a medium-sized E. coli core metabolic network. PSOMCS not only finds solutions comparable to previously published results but is also orders of magnitude faster. Finally, we use PSOMCS to predict knockouts satisfying multiple objectives in a genome-scale metabolic model of E. coli and compare it with OptKnock and RobustKnock. PSOMCS finds competitive knockout strategies and designs compared to other current methods and is in some cases significantly faster. It can be used in identifying knockouts which will force optimal desired behaviors in large and genome-scale metabolic networks. It will be even more useful as larger metabolic models of industrially relevant organisms become available.
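For readers unfamiliar with particle swarm optimization, the following is a minimal generic sketch of the search loop on a continuous toy objective. It is not PSOMCS: the coupling to cMCS computation from a stoichiometric matrix is not shown, and the function name, the parameter values (inertia 0.7, acceleration coefficients 1.5) and the quadratic objective are all assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective over a box using a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # velocity = inertia + pull to own best + pull to swarm best
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in box
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_objective(x):
    """Hypothetical stand-in for a design score; minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_f = pso_minimize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_f)
```

In a design-search setting, the objective would score a candidate design, for example by evaluating the intervention strategies it admits, and would typically have to balance several objectives rather than a single quadratic.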
Exceptionally high levels of recombination across the honey bee genome.
Beye, Martin; Gattermeier, Irene; Hasselmann, Martin; Gempe, Tanja; Schioett, Morten; Baines, John F; Schlipalius, David; Mougel, Florence; Emore, Christine; Rueppell, Olav; Sirviö, Anu; Guzmán-Novoa, Ernesto; Hunt, Greg; Solignac, Michel; Page, Robert E
2006-11-01
The first draft of the honey bee genome sequence and improved genetic maps are utilized to analyze a genome displaying 10 times higher levels of recombination (19 cM/Mb) than previously analyzed genomes of higher eukaryotes. The exceptionally high recombination rate is distributed genome-wide, but varies by two orders of magnitude. Analysis of chromosome, sequence, and gene parameters with respect to recombination showed that local recombination rate is associated with distance to the telomere, GC content, and the number of simple repeats as described for low-recombining genomes. Recombination rate does not decrease with chromosome size. On average 5.7 recombination events per chromosome pair per meiosis are found in the honey bee genome. This contrasts with a wide range of taxa that have a uniform recombination frequency of about 1.6 per chromosome pair. The excess of recombination activity does not support a mechanistic role of recombination in stabilizing pairs of homologous chromosomes during chromosome pairing. Recombination rate is associated with gene size, suggesting that introns are larger in regions of low recombination and may improve the efficacy of selection in these regions. Very few transposons and no retrotransposons are present in the high-recombining genome. We propose evolutionary explanations for the exceptionally high genome-wide recombination rate.
Distinct p53 genomic binding patterns in normal and cancer-derived human cells
McCorkle, Sean R; McCombie, WR; Dunn, John J
2011-01-01
Here, we report genome-wide analysis of the tumor suppressor p53 binding sites in normal human cells. 743 high-confidence ChIP-seq peaks representing putative genomic binding sites were identified in normal IMR90 fibroblasts using a reference chromatin sample. More than 40% were located within 2 kb of a transcription start site (TSS), a distribution similar to that documented for individually studied, functional p53 binding sites and, to date, not observed by previous p53 genome-wide studies. Nearly half of the high-confidence binding sites in the IMR90 cells reside in CpG islands, in marked contrast to sites reported in cancer-derived cells. The distinct genomic features of the IMR90 binding sites do not reflect a distinct preference for specific sequences, since the de novo developed p53 motif based on our study is similar to those reported by genome-wide studies of cancer cells. More likely, the different chromatin landscape in normal, compared with cancer-derived cells, influences p53 binding via modulating availability of the sites. We compared the IMR90 ChIP-seq peaks to the recently published IMR90 methylome and demonstrated that they are enriched at hypomethylated DNA. Our study represents the first genome-wide, de novo mapping of p53 binding sites in normal human cells and reveals that p53 binding sites reside in distinct genomic landscapes in normal and cancer-derived human cells. PMID:22127205
Bajaj, Deepak; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.
2015-01-01
We identified 82489 high-quality genome-wide SNPs from 93 wild and cultivated Cicer accessions through integrated reference genome- and de novo-based GBS assays. High intra- and inter-specific polymorphic potential (66–85%) and broader natural allelic diversity (6–64%) detected by genome-wide SNPs among accessions signify their efficacy for monitoring introgression and transferring target trait-regulating genomic (gene) regions/allelic variants from wild to cultivated Cicer gene pools for genetic improvement. The population-specific assignment of wild Cicer accessions pertaining to the primary gene pool are more influenced by geographical origin/phenotypic characteristics than species/gene-pools of origination. The functional significance of allelic variants (non-synonymous and regulatory SNPs) scanned from transcription factors and stress-responsive genes in differentiating wild accessions (with potential known sources of yield-contributing and stress tolerance traits) from cultivated desi and kabuli accessions, fine-mapping/map-based cloning of QTLs and determination of LD patterns across wild and cultivated gene-pools are suitably elucidated. The correlation between phenotypic (agromorphological traits) and molecular diversity-based admixed domestication patterns within six structured populations of wild and cultivated accessions via genome-wide SNPs was apparent. This suggests utility of whole genome SNPs as a potential resource for identifying naturally selected trait-regulating genomic targets/functional allelic variants adaptive to diverse agroclimatic regions for genetic enhancement of cultivated gene-pools. PMID:26208313
Müller, Bárbara S F; Neves, Leandro G; de Almeida Filho, Janeo E; Resende, Márcio F R; Muñoz, Patricio R; Dos Santos, Paulo E T; Filho, Estefano Paludzyszyn; Kirst, Matias; Grattapaglia, Dario
2017-07-11
The advent of high-throughput genotyping technologies coupled to genomic prediction methods established a new paradigm to integrate genomics and breeding. We carried out whole-genome prediction and contrasted it to a genome-wide association study (GWAS) for growth traits in breeding populations of Eucalyptus benthamii (n =505) and Eucalyptus pellita (n =732). Both species are of increasing commercial interest for the development of germplasm adapted to environmental stresses. Predictive ability reached 0.16 in E. benthamii and 0.44 in E. pellita for diameter growth. Predictive abilities using either Genomic BLUP or different Bayesian methods were similar, suggesting that growth adequately fits the infinitesimal model. Genomic prediction models using ~5000-10,000 SNPs provided predictive abilities equivalent to using all 13,787 and 19,506 SNPs genotyped in the E. benthamii and E. pellita populations, respectively. No difference was detected in predictive ability when different sets of SNPs were utilized, based on position (equidistantly genome-wide, inside genes, linkage disequilibrium pruned or on single chromosomes), as long as the total number of SNPs used was above ~5000. Predictive abilities obtained by removing relatedness between training and validation sets fell near zero for E. benthamii and were halved for E. pellita. These results corroborate the current view that relatedness is the main driver of genomic prediction, although some short-range historical linkage disequilibrium (LD) was likely captured for E. pellita. A GWAS identified only one significant association for volume growth in E. pellita, illustrating the fact that while genome-wide regression is able to account for large proportions of the heritability, very little or none of it is captured into significant associations using GWAS in breeding populations of the size evaluated in this study. 
This study provides further experimental data supporting positive prospects of using genome-wide data to capture large proportions of trait heritability and predict growth traits in trees with accuracies equal or better than those attainable by phenotypic selection. Additionally, our results document the superiority of the whole-genome regression approach in accounting for large proportions of the heritability of complex traits such as growth in contrast to the limited value of the local GWAS approach toward breeding applications in forest trees.
A low density microarray method for the identification of human papillomavirus type 18 variants.
Meza-Menchaca, Thuluz; Williams, John; Rodríguez-Estrada, Rocío B; García-Bravo, Aracely; Ramos-Ligonio, Ángel; López-Monteon, Aracely; Zepeda, Rossana C
2013-09-26
We describe a novel microarray based-method for the screening of oncogenic human papillomavirus 18 (HPV-18) molecular variants. Due to the fact that sequencing methodology may underestimate samples containing more than one variant we designed a specific and sensitive stacking DNA hybridization assay. This technology can be used to discriminate between three possible phylogenetic branches of HPV-18. Probes were attached covalently on glass slides and hybridized with single-stranded DNA targets. Prior to hybridization with the probes, the target strands were pre-annealed with the three auxiliary contiguous oligonucleotides flanking the target sequences. Screening HPV-18 positive cell lines and cervical samples were used to evaluate the performance of this HPV DNA microarray. Our results demonstrate that the HPV-18's variants hybridized specifically to probes, with no detection of unspecific signals. Specific probes successfully reveal detectable point mutations in these variants. The present DNA oligoarray system can be used as a reliable, sensitive and specific method for HPV-18 variant screening. Furthermore, this simple assay allows the use of inexpensive equipment, making it accessible in resource-poor settings.
A Low Density Microarray Method for the Identification of Human Papillomavirus Type 18 Variants
Meza-Menchaca, Thuluz; Williams, John; Rodríguez-Estrada, Rocío B.; García-Bravo, Aracely; Ramos-Ligonio, Ángel; López-Monteon, Aracely; Zepeda, Rossana C.
2013-01-01
We describe a novel microarray-based method for the screening of oncogenic human papillomavirus 18 (HPV-18) molecular variants. Because sequencing methodology may underestimate samples containing more than one variant, we designed a specific and sensitive stacking DNA hybridization assay. This technology can be used to discriminate between three possible phylogenetic branches of HPV-18. Probes were attached covalently on glass slides and hybridized with single-stranded DNA targets. Prior to hybridization with the probes, the target strands were pre-annealed with the three auxiliary contiguous oligonucleotides flanking the target sequences. HPV-18 positive cell lines and cervical samples were screened to evaluate the performance of this HPV DNA microarray. Our results demonstrate that the HPV-18 variants hybridized specifically to probes, with no detection of nonspecific signals. Specific probes successfully reveal detectable point mutations in these variants. The present DNA oligoarray system can be used as a reliable, sensitive and specific method for HPV-18 variant screening. Furthermore, this simple assay allows the use of inexpensive equipment, making it accessible in resource-poor settings. PMID:24077317
2012-01-01
Background: Filamentous fungi are confronted with changes and limitations of their carbon source during growth in their natural habitats and during industrial applications. To survive life-threatening starvation conditions, carbon from endogenous resources becomes mobilized to fuel maintenance and self-propagation. Key to understanding the underlying cellular processes is the system-wide analysis of fungal starvation responses in a temporal and spatial resolution. The knowledge deduced is important for the development of optimized industrial production processes. Results: This study describes the physiological, morphological and genome-wide transcriptional changes caused by prolonged carbon starvation during submerged batch cultivation of the filamentous fungus Aspergillus niger. Bioreactor cultivation supported highly reproducible growth conditions and monitoring of physiological parameters. Changes in hyphal growth and morphology were analyzed at distinct cultivation phases using automated image analysis. The Affymetrix GeneChip platform was used to establish genome-wide transcriptional profiles for three selected time points during prolonged carbon starvation. Compared to the exponential growth transcriptome, about 50% (7,292) of all genes displayed differential gene expression during at least one of the starvation time points. Enrichment analysis of Gene Ontology, Pfam domain and KEGG pathway annotations uncovered autophagy and asexual reproduction as major global transcriptional trends. Induced transcription of genes encoding hydrolytic enzymes was accompanied by increased secretion of hydrolases including chitinases, glucanases, proteases and phospholipases as identified by mass spectrometry. Conclusions: This study is the first system-wide analysis of the carbon starvation response in a filamentous fungus. Morphological, transcriptomic and secretomic analyses identified key events important for fungal survival and their chronology. 
The dataset obtained forms a comprehensive framework for further elucidation of the interrelation and interplay of the individual cellular events involved. PMID:22873931
Silva-Junior, Orzenil B; Grattapaglia, Dario
2015-11-01
We used high-density single nucleotide polymorphism (SNP) data and whole-genome pooled resequencing to examine the landscape of population recombination (ρ) and nucleotide diversity (θw), assess the extent of linkage disequilibrium (r²) and build the highest density linkage maps for Eucalyptus. At the genome-wide level, linkage disequilibrium (LD) decayed within c. 4-6 kb, slower than previously reported from candidate gene studies, but showing considerable variation from absence to complete LD up to 50 kb. A sharp decrease in the estimate of ρ was seen when going from short to genome-wide inter-SNP distances, highlighting the dependence of this parameter on the scale of observation adopted. Recombination was correlated with nucleotide diversity, gene density and distance from the centromere, with hotspots of recombination enriched for genes involved in chemical reactions and pathways of the normal metabolic processes. The high nucleotide diversity (θw = 0.022) of E. grandis revealed that mutation is more important than recombination in shaping its genomic diversity (ρ/θw = 0.645). Chromosome-wide ancestral recombination graphs allowed us to date the split of E. grandis (1.7-4.8 million yr ago) and identify a scenario for the recent demographic history of the species. Our results have considerable practical importance to Genome Wide Association Studies (GWAS), while indicating bright prospects for genomic prediction of complex phenotypes in eucalypt breeding. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuskan, Gerry
The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting features presentations by leading scientists advancing these topics. Gerry Tuskan of Oak Ridge National Laboratory on "Resequencing in Populus: Towards Genome Wide Association Genetics" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011.
Evolution of Centromeric Retrotransposons in Grasses
Sharma, Anupma; Presting, Gernot G.
2014-01-01
Centromeric retrotransposons (CRs) constitute a family of plant retroelements, some of which have the ability to target their insertion almost exclusively to the functional centromeres. Our exhaustive analysis of CR family members in four grass genomes revealed not only horizontal transfer (HT) of CR elements between the oryzoid and panicoid grass lineages but also their subsequent recombination with endogenous elements that in some cases created prolific recombinants in foxtail millet and sorghum. HT events are easily identifiable only in cases where host genome divergence significantly predates HT, thus documented HT events likely represent only a fraction of the total. If the more difficult to detect ancient HT events occurred at frequencies similar to those observable in present day grasses, the extant long terminal repeat retrotransposons represent the mosaic products of HT and recombination that are optimized for retrotransposition in their host genomes. This complicates not only phylogenetic analysis but also the establishment of a meaningful retrotransposon nomenclature, which we have nevertheless attempted to implement here. In contrast to the plant-centric naming convention used currently for CR elements, we classify elements primarily based on their phylogenetic relationships regardless of host plant, using the exhaustively studied maize elements assigned to six different subfamilies as a standard. The CR2 subfamily is the most widely distributed of the six CR subfamilies discovered in grass genomes to date and thus the most likely to play a functional role at grass centromeres. PMID:24814286
Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.
Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun
2014-01-01
While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes such as prokaryotes is large, typically many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
Genome-Wide Comparative Gene Family Classification
Frech, Christian; Chen, Nansheng
2010-01-01
Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221
Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics
Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun
2014-01-01
While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes such as prokaryotes is large, typically many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285
Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A
2012-01-01
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.
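The idea of flagging samples whose genome-wide summary statistic differs from the sample at large can be illustrated with a small sketch. This is not the authors' R algorithm; it substitutes a generic robust rule (a modified z-score based on the median and the median absolute deviation), and the function name, the cutoff of 3.5 and the heterozygosity values are all hypothetical.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`.

    Assumption: a median/MAD rule stands in for the published method,
    which is not reproduced here.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all, so nothing can be called atypical.
        return [False] * len(values)
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

# Hypothetical per-sample heterozygosity rates; the last sample is atypical.
heterozygosity = [0.31, 0.32, 0.30, 0.31, 0.33, 0.32, 0.45]
flags = flag_outliers(heterozygosity)
print([i for i, is_outlier in enumerate(flags) if is_outlier])
```

A median-based rule is used here because a mean and standard deviation would themselves be pulled toward the very samples one is trying to detect.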
Developmental Stability Covaries with Genome-Wide and Single-Locus Heterozygosity in House Sparrows
Vangestel, Carl; Mergeay, Joachim; Dawson, Deborah A.; Vandomme, Viki; Lens, Luc
2011-01-01
Fluctuating asymmetry (FA), a measure of developmental instability, has been hypothesized to increase with genetic stress. Despite numerous studies providing empirical evidence for associations between FA and genome-wide properties such as multi-locus heterozygosity, support for single-locus effects remains scant. Here we test if, and to what extent, FA co-varies with single- and multilocus markers of genetic diversity in house sparrow (Passer domesticus) populations along an urban gradient. In line with theoretical expectations, FA was inversely correlated with genetic diversity estimated at genome level. However, this relationship was largely driven by variation at a single key locus. Contrary to our expectations, relationships between FA and genetic diversity were not stronger in individuals from urban populations that experience higher nutritional stress. We conclude that loss of genetic diversity adversely affects developmental stability in P. domesticus, and more generally, that the molecular basis of developmental stability may involve complex interactions between local and genome-wide effects. Further study on the relative effects of single-locus and genome-wide effects on the developmental stability of populations with different genetic properties is therefore needed. PMID:21747940
Xu, Dong; Zhang, Yang
2013-01-01
Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but have not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418
Battlay, Paul; Schmidt, Joshua M; Fournier-Level, Alexandre; Robin, Charles
2016-08-09
Scans of the Drosophila melanogaster genome have identified organophosphate resistance loci among those with the most pronounced signature of positive selection. In this study, the molecular basis of resistance to the organophosphate insecticide azinphos-methyl was investigated using the Drosophila Genetic Reference Panel, and genome-wide association. Recently released full transcriptome data were used to extend the utility of the Drosophila Genetic Reference Panel resource beyond traditional genome-wide association studies to allow systems genetics analyses of phenotypes. We found that both genomic and transcriptomic associations independently identified Cyp6g1, a gene involved in resistance to DDT and neonicotinoid insecticides, as the top candidate for azinphos-methyl resistance. This was verified by transgenically overexpressing Cyp6g1 using natural regulatory elements from a resistant allele, resulting in a 6.5-fold increase in resistance. We also identified four novel candidate genes associated with azinphos-methyl resistance, all of which are involved in either regulation of fat storage, or nervous system development. In Cyp6g1, we find a demonstrable resistance locus, a verification that transcriptome data can be used to identify variants associated with insecticide resistance, and an overlap between peaks of a genome-wide association study, and a genome-wide selective sweep analysis. Copyright © 2016 Battlay et al.
Single-Cell Based Quantitative Assay of Chromosome Transmission Fidelity
Zhu, Jin; Heinecke, Dominic; Mulla, Wahid A.; Bradford, William D.; Rubinstein, Boris; Box, Andrew; Haug, Jeffrey S.; Li, Rong
2015-01-01
Errors in mitosis are a primary cause of chromosome instability (CIN), generating aneuploid progeny cells. Whereas a variety of factors can influence CIN, under most conditions mitotic errors are rare events that have been difficult to measure accurately. Here we report a green fluorescent protein−based quantitative chromosome transmission fidelity (qCTF) assay in budding yeast that allows sensitive and quantitative detection of CIN and can be easily adapted to high-throughput analysis. Using the qCTF assay, we performed genome-wide quantitative profiling of genes that affect CIN in a dosage-dependent manner and identified genes that elevate CIN when either increased (icCIN) or decreased in copy number (dcCIN). Unexpectedly, qCTF screening also revealed genes whose change in copy number quantitatively suppress CIN, suggesting that the basal error rate of the wild-type genome is not minimized, but rather, may have evolved toward an optimal level that balances both stability and low-level karyotype variation for evolutionary adaptation. PMID:25823586
Single-Cell Based Quantitative Assay of Chromosome Transmission Fidelity.
Zhu, Jin; Heinecke, Dominic; Mulla, Wahid A; Bradford, William D; Rubinstein, Boris; Box, Andrew; Haug, Jeffrey S; Li, Rong
2015-03-30
Errors in mitosis are a primary cause of chromosome instability (CIN), generating aneuploid progeny cells. Whereas a variety of factors can influence CIN, under most conditions mitotic errors are rare events that have been difficult to measure accurately. Here we report a green fluorescent protein-based quantitative chromosome transmission fidelity (qCTF) assay in budding yeast that allows sensitive and quantitative detection of CIN and can be easily adapted to high-throughput analysis. Using the qCTF assay, we performed genome-wide quantitative profiling of genes that affect CIN in a dosage-dependent manner and identified genes that elevate CIN when either increased (icCIN) or decreased in copy number (dcCIN). Unexpectedly, qCTF screening also revealed genes whose change in copy number quantitatively suppress CIN, suggesting that the basal error rate of the wild-type genome is not minimized, but rather, may have evolved toward an optimal level that balances both stability and low-level karyotype variation for evolutionary adaptation. Copyright © 2015 Zhu et al.
An interactive web-based application for Comprehensive Analysis of RNAi-screen Data.
Dutta, Bhaskar; Azhir, Alaleh; Merino, Louis-Henri; Guo, Yongjian; Revanur, Swetha; Madhamshettiwar, Piyush B; Germain, Ronald N; Smith, Jennifer A; Simpson, Kaylene J; Martin, Scott E; Buehler, Eugen; Fraser, Iain D C
2016-02-23
RNAi screens are widely used in functional genomics. Although the screen data can be susceptible to a number of experimental biases, many of these can be corrected by computational analysis. For this purpose, here we have developed a web-based platform for integrated analysis and visualization of RNAi screen data named CARD (for Comprehensive Analysis of RNAi Data; available at https://card.niaid.nih.gov). CARD allows the user to seamlessly carry out sequential steps in a rigorous data analysis workflow, including normalization, off-target analysis, integration of gene expression data, optimal thresholds for hit selection and network/pathway analysis. To evaluate the utility of CARD, we describe analysis of three genome-scale siRNA screens and demonstrate: (i) a significant increase both in selection of subsequently validated hits and in rejection of false positives, (ii) an increased overlap of hits from independent screens of the same biology and (iii) insight to microRNA (miRNA) activity based on siRNA seed enrichment.
An interactive web-based application for Comprehensive Analysis of RNAi-screen Data
Dutta, Bhaskar; Azhir, Alaleh; Merino, Louis-Henri; Guo, Yongjian; Revanur, Swetha; Madhamshettiwar, Piyush B.; Germain, Ronald N.; Smith, Jennifer A.; Simpson, Kaylene J.; Martin, Scott E.; Buehler, Eugen; Fraser, Iain D. C.
2016-01-01
RNAi screens are widely used in functional genomics. Although the screen data can be susceptible to a number of experimental biases, many of these can be corrected by computational analysis. For this purpose, here we have developed a web-based platform for integrated analysis and visualization of RNAi screen data named CARD (for Comprehensive Analysis of RNAi Data; available at https://card.niaid.nih.gov). CARD allows the user to seamlessly carry out sequential steps in a rigorous data analysis workflow, including normalization, off-target analysis, integration of gene expression data, optimal thresholds for hit selection and network/pathway analysis. To evaluate the utility of CARD, we describe analysis of three genome-scale siRNA screens and demonstrate: (i) a significant increase both in selection of subsequently validated hits and in rejection of false positives, (ii) an increased overlap of hits from independent screens of the same biology and (iii) insight to microRNA (miRNA) activity based on siRNA seed enrichment. PMID:26902267
New technology and resources for cryptococcal research
Zhang, Nannan; Park, Yoon-Dong; Williamson, Peter R.
2014-01-01
Rapid advances in molecular biology and genome sequencing have enabled the generation of new technology and resources for cryptococcal research. RNAi-mediated specific gene knock down has become routine and more efficient by utilizing modified shRNA plasmids and convergent promoter RNAi constructs. This system was recently applied in a high-throughput screen to identify genes involved in host-pathogen interactions. Gene deletion efficiencies have also been improved by increasing rates of homologous recombination through a number of approaches, including a combination of double-joint PCR with split-marker transformation, the use of dominant selectable markers and the introduction of Cre-loxP systems into Cryptococcus. Moreover, visualization of cryptococcal proteins has become more facile using fusions with codon-optimized fluorescent tags, such as green or red fluorescent proteins or mCherry. Using recent genome-wide analytical tools, new transcriptional factors and regulatory proteins have been identified in novel virulence-related signaling pathways by employing microarray analysis, RNA-sequencing and proteomic analysis. PMID:25460849
Keene, Keith L; Chen, Wei-Min; Chen, Fang; Williams, Stephen R; Elkhatib, Stacey D; Hsu, Fang-Chi; Mychaleckyj, Josyf C; Doheny, Kimberly F; Pugh, Elizabeth W; Ling, Hua; Laurie, Cathy C; Gogarten, Stephanie M; Madden, Ebony B; Worrall, Bradford B; Sale, Michele M
2014-01-01
B vitamins play an important role in homocysteine metabolism, with vitamin deficiencies resulting in increased levels of homocysteine and increased risk for stroke. We performed a genome-wide association study (GWAS) in 2,100 stroke patients from the Vitamin Intervention for Stroke Prevention (VISP) trial, a clinical trial designed to determine whether the daily intake of high-dose folic acid and vitamins B6 and B12 reduces recurrent cerebral infarction. Extensive quality control (QC) measures resulted in a total of 737,081 SNPs for analysis. Genome-wide association analyses for baseline quantitative measures of folate, Vitamin B12, and Vitamin B6 were completed using linear regression approaches, implemented in PLINK. Six associations met or exceeded genome-wide significance (P ≤ 5 × 10^-8). For baseline Vitamin B12, the strongest association was observed with a non-synonymous SNP (nsSNP) located in the CUBN gene (P = 1.76 × 10^-13). Two additional CUBN intronic SNPs demonstrated strong associations with B12 (P = 2.92 × 10^-10 and 4.11 × 10^-10), while a second nsSNP, located in the TCN1 gene, also reached genome-wide significance (P = 5.14 × 10^-11). For baseline measures of Vitamin B6, we identified genome-wide significant associations for SNPs at the ALPL locus (rs1697421; P = 7.06 × 10^-10 and rs1780316; P = 2.25 × 10^-8). In addition to the six genome-wide significant associations, nine SNPs (two for Vitamin B6, six for Vitamin B12, and one for folate measures) provided suggestive evidence for association (P ≤ 10^-7). Our GWAS has identified six genome-wide significant associations, nine suggestive associations, and successfully replicated 5 of 16 SNPs previously reported to be associated with measures of B vitamins. 
The six genome-wide significant associations are located in gene regions that have shown previous associations with measures of B vitamins; however, four of the nine suggestive associations represent novel findings and warrant further investigation in additional populations.
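The significance tiers used in this abstract can be checked with a short sketch. The thresholds (P ≤ 5 × 10^-8 for genome-wide significance, P ≤ 10^-7 for suggestive evidence) and the six P values are taken from the abstract; the function name and the dictionary labels are illustrative, and the two CUBN intronic SNPs are labelled generically because the abstract does not give their identifiers.

```python
GENOME_WIDE = 5e-8   # genome-wide significance threshold stated in the abstract
SUGGESTIVE = 1e-7    # suggestive-evidence threshold stated in the abstract

def classify(p_value):
    """Assign a P value to one of the tiers described in the abstract."""
    if p_value <= GENOME_WIDE:
        return "genome-wide significant"
    if p_value <= SUGGESTIVE:
        return "suggestive"
    return "not significant"

# P values as reported in the abstract; the keys are illustrative labels.
reported = {
    "CUBN nsSNP (B12)": 1.76e-13,
    "CUBN intronic SNP, first (B12)": 2.92e-10,
    "CUBN intronic SNP, second (B12)": 4.11e-10,
    "TCN1 nsSNP (B12)": 5.14e-11,
    "ALPL rs1697421 (B6)": 7.06e-10,
    "ALPL rs1780316 (B6)": 2.25e-8,
}

for label, p in reported.items():
    print(f"{label}: P = {p:.2e} -> {classify(p)}")
```

All six reported values fall at or below the 5 × 10^-8 cutoff, which matches the count of genome-wide significant associations given in the text.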
Brant, Steven R; Okou, David T; Simpson, Claire L; Cutler, David J; Haritunians, Talin; Bradfield, Jonathan P; Chopra, Pankaj; Prince, Jarod; Begum, Ferdouse; Kumar, Archana; Huang, Chengrui; Venkateswaran, Suresh; Datta, Lisa W; Wei, Zhi; Thomas, Kelly; Herrinton, Lisa J; Klapproth, Jan-Micheal A; Quiros, Antonio J; Seminerio, Jenifer; Liu, Zhenqiu; Alexander, Jonathan S; Baldassano, Robert N; Dudley-Brown, Sharon; Cross, Raymond K; Dassopoulos, Themistocles; Denson, Lee A; Dhere, Tanvi A; Dryden, Gerald W; Hanson, John S; Hou, Jason K; Hussain, Sunny Z; Hyams, Jeffrey S; Isaacs, Kim L; Kader, Howard; Kappelman, Michael D; Katz, Jeffry; Kellermayer, Richard; Kirschner, Barbara S; Kuemmerle, John F; Kwon, John H; Lazarev, Mark; Li, Ellen; Mack, David; Mannon, Peter; Moulton, Dedrick E; Newberry, Rodney D; Osuntokun, Bankole O; Patel, Ashish S; Saeed, Shehzad A; Targan, Stephan R; Valentine, John F; Wang, Ming-Hsi; Zonca, Martin; Rioux, John D; Duerr, Richard H; Silverberg, Mark S; Cho, Judy H; Hakonarson, Hakon; Zwick, Michael E; McGovern, Dermot P B; Kugathasan, Subra
2017-01-01
The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn's disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities. We performed 2 high-density, genome-wide scans comprising 2345 cases of African Americans with IBD (1646 with CD, 583 with UC, and 116 inflammatory bowel disease unclassified) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P < 5.0 × 10^-8 in meta-analysis with nominal evidence (P < .05) in each scan were considered to have genome-wide significance. We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P < 1.6 × 10^-6): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles. We performed a genome-wide association study of African Americans with IBD and identified loci associated with UC in only this population; we also replicated IBD, CD, and UC loci identified in European populations. 
The detection of variants associated with IBD risk in only people of African descent demonstrates the importance of studying the genetics of IBD and other complex diseases in populations beyond those of European ancestry. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.
Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes.
Papudeshi, Bhavya; Haggerty, J Matthew; Doane, Michael; Morris, Megan M; Walsh, Kevin; Beattie, Douglas T; Pande, Dnyanada; Zaeri, Parisa; Silva, Genivaldo G Z; Thompson, Fabiano; Edwards, Robert A; Dinsdale, Elizabeth A
2017-11-28
Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics including contig continuity and contig chimerism, and the effectiveness of binning tools using genome completeness and taxonomic identification. We concluded that SPAdes assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp), and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers. 
Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects, of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete but showed low similarity to genomes within databases. In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, the SPAdes assembler and the MetaBat binning tool reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct taxa and low taxonomic diversity yield the highest-quality metagenome-assembled genomes.
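The N50 statistic quoted in the abstract above summarizes contig continuity. As a minimal sketch, assuming the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size), it could be computed as follows. The function name and the example lengths are illustrative and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly under the usual definition.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

A higher N50 for the same total assembly size indicates that more of the assembly sits in long contigs, which is the sense in which the abstract reports contigs "of longer length".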
The role of genomics in the neonatal ICU.
Maresso, Karen; Broeckel, Ulrich
2009-03-01
Results of both the Human Genome and International HapMap Projects have provided the technology and resources necessary to enable fundamental advances through the study of DNA sequence variation in almost all fields of medicine, including neonatology. Genome-wide association studies are now practical, and the first of these studies are appearing in the literature. This article provides the reader with an overview of the issues in technology and study design relating to genome-wide association studies and summarizes the current state of association studies in neonatal ICU populations with a brief review of the relevant literature. Future recommendations for genomic association studies in neonatal ICU populations are also provided.
Ensembl Genomes 2013: scaling up access to genome-wide data
USDA-ARS's Scientific Manuscript database
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provi...
Lyng, Heidi; Lando, Malin; Brøvig, Runar S; Svendsrud, Debbie H; Johansen, Morten; Galteland, Eivind; Brustugun, Odd T; Meza-Zepeda, Leonardo A; Myklebost, Ola; Kristensen, Gunnar B; Hovig, Eivind; Stokke, Trond
2008-01-01
Absolute tumor DNA copy numbers can currently be achieved only on a single gene basis by using fluorescence in situ hybridization (FISH). We present GeneCount, a method for genome-wide calculation of absolute copy numbers from clinical array comparative genomic hybridization data. The tumor cell fraction is reliably estimated in the model. Data consistent with FISH results are achieved. We demonstrate significant improvements over existing methods for exploring gene dosages and intratumor copy number heterogeneity in cancers. PMID:18500990
Research progress of plant population genomics based on high-throughput sequencing.
Wang, Yun-sheng
2016-08-01
Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical system of population genetics and improves our understanding of microevolution through identification of site-specific effects and genome-wide effects using genotyping of genome-wide polymorphic sites. With the appearance and improvement of next-generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly, and large-scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effects, demographic history and molecular mechanisms of complex traits of relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as the existing problems of plant population genomics in order to provide references for related studies.
Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM)
Beagrie, Robert A.; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C.A.; Chotalia, Mita; Xie, Sheila Q.; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R.; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A.W.; Nicodemi, Mario; Pombo, Ana
2017-01-01
Summary The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. We developed a novel genome-wide method, Genome Architecture Mapping (GAM), for measuring chromatin contacts, and other features of three-dimensional chromatin topology, based on sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify an enrichment for specific interactions between active genes and enhancers across very large genomic distances, using a mathematical model ‘SLICE’ (Statistical Inference of Co-segregation). GAM also reveals an abundance of three-way contacts genome-wide, especially between regions that are highly transcribed or contain super-enhancers, highlighting a previously inaccessible complexity in genome architecture and a major role for gene-expression specific contacts in organizing the genome in mammalian nuclei. PMID:28273065
A Genome-Wide Scan for Breast Cancer Risk Haplotypes among African American Women
Song, Chi; Chen, Gary K.; Millikan, Robert C.; Ambrosone, Christine B.; John, Esther M.; Bernstein, Leslie; Zheng, Wei; Hu, Jennifer J.; Ziegler, Regina G.; Nyante, Sarah; Bandera, Elisa V.; Ingles, Sue A.; Press, Michael F.; Deming, Sandra L.; Rodriguez-Gil, Jorge L.; Chanock, Stephen J.; Wan, Peggy; Sheng, Xin; Pooler, Loreall C.; Van Den Berg, David J.; Le Marchand, Loic; Kolonel, Laurence N.; Henderson, Brian E.; Haiman, Chris A.; Stram, Daniel O.
2013-01-01
Genome-wide association studies (GWAS) simultaneously investigating hundreds of thousands of single nucleotide polymorphisms (SNP) have become a powerful tool in the investigation of new disease susceptibility loci. Haplotypes are sometimes thought to be superior to SNPs and are promising in genetic association analyses. The application of genome-wide haplotype analysis, however, is hindered by the complexity of haplotypes themselves and sophistication in computation. We systematically analyzed the haplotype effects for breast cancer risk among 5,761 African American women (3,016 cases and 2,745 controls) using a sliding window approach on the genome-wide scale. Three regions on chromosomes 1, 4 and 18 exhibited moderate haplotype effects. Furthermore, among 21 breast cancer susceptibility loci previously established in European populations, 10p15 and 14q24 are likely to harbor novel haplotype effects. We also proposed a heuristic of determining the significance level and the effective number of independent tests by the permutation analysis on chromosome 22 data. It suggests that the effective number was approximately half of the total (7,794 out of 15,645), thus the half number could serve as a quick reference to evaluating genome-wide significance if a similar sliding window approach of haplotype analysis is adopted in similar populations using similar genotype density. PMID:23468962
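The heuristic in the abstract above, using roughly half of the total number of sliding windows as the effective number of independent tests, can be turned into a significance threshold. The sketch below assumes a Bonferroni-style correction and a family-wise error rate of 0.05; both are illustrative assumptions, since the abstract does not state which correction or alpha the authors used. Only the counts 7,794 and 15,645 come from the abstract.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Counts reported in the abstract for the chromosome 22 permutation analysis.
TOTAL_WINDOWS = 15645
EFFECTIVE_TESTS = 7794

# Assumed family-wise error rate (not stated in the abstract).
ALPHA = 0.05

naive_threshold = bonferroni_threshold(ALPHA, TOTAL_WINDOWS)
effective_threshold = bonferroni_threshold(ALPHA, EFFECTIVE_TESTS)
effective_fraction = EFFECTIVE_TESTS / TOTAL_WINDOWS

print(f"effective fraction of tests: {effective_fraction:.3f}")
print(f"threshold using all windows: {naive_threshold:.2e}")
print(f"threshold using effective tests: {effective_threshold:.2e}")
```

Because the effective number is about half of the total, the resulting per-test threshold is about twice as permissive as one based on every window.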
A recessive genetic model and runs of homozygosity in major depressive disorder
Power, Robert A.; Keller, Matthew C.; Ripke, Stephan; Abdellaoui, Abdel; Wray, Naomi R.; Sullivan, Patrick F; Breen, Gerome
2014-01-01
Genome-wide association studies (GWASs) of major depressive disorder (MDD) have yet to identify variants that surpass the threshold for genome-wide significance. A recent study reported that runs of homozygosity (ROH) are associated with schizophrenia, reflecting a novel genetic risk factor resulting from increased parental relatedness and recessive genetic effects. Here we undertake an analysis of ROH for MDD using the 9,238 MDD cases and 9,521 controls reported in a recent mega-analysis of 9 GWAS. Since evidence for association with ROH could reflect a recessive mode of action at loci, we also conducted a genome-wide association analysis under a recessive model. The genome-wide association analysis using a recessive model found no significant associations. Our analysis of ROH suggested that there was significant heterogeneity of effect across studies (p=0.001), and it was associated with genotyping platform and country of origin. The results of the ROH analysis show that differences across studies can lead to conflicting systematic genome-wide differences between cases and controls that are unaccounted for by traditional covariates. They highlight the sensitivity of the ROH method to spurious associations, and the need to carefully control for potential confounds in such analyses. We found no strong evidence for a recessive model underlying MDD. PMID:24482242
Li, Ming-Rui; Shi, Feng-Xue; Li, Ya-Ling; Jiang, Peng; Jiao, Lili
2017-01-01
Abstract Chinese ginseng (Panax ginseng Meyer) is a medicinally important herb and plays crucial roles in traditional Chinese medicine. Pharmacological analyses identified diverse bioactive components from Chinese ginseng. However, basic biological attributes including domestication and selection of the ginseng plant remain under-investigated. Here, we presented a genome-wide view of the domestication and selection of cultivated ginseng based on the whole genome data. A total of 8,660 protein-coding genes were selected for genome-wide scanning of the 30 wild and cultivated ginseng accessions. In complement, the 45s rDNA, chloroplast and mitochondrial genomes were included to perform phylogenetic and population genetic analyses. The observed spatial genetic structure between northern cultivated ginseng (NCG) and southern cultivated ginseng (SCG) accessions suggested multiple independent origins of cultivated ginseng. Genome-wide scanning further demonstrated that NCG and SCG have undergone distinct selection pressures during the domestication process, with more genes identified in the NCG (97 genes) than in the SCG group (5 genes). Functional analyses revealed that these genes are involved in diverse pathways, including DNA methylation, lignin biosynthesis, and cell differentiation. These findings suggested that the SCG and NCG groups have distinct demographic histories. Candidate genes identified are useful for future molecular breeding of cultivated ginseng. PMID:28922794
McCoy, Thomas H; Castro, Victor M; Snapper, Leslie A; Hart, Kamber L; Perlis, Roy H
2017-08-31
Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that may be unreliable and fail to capture the relationship between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations of diagnostic codes from electronic health records (EHR) for 10845 participants in a biobanking program at two large academic medical centers. Specifically, we applied latent Dirichlet allocation to fit 50 disease topics based on diagnostic codes, then conducted genome-wide common-variant association for each topic. In sensitivity analysis, these results were contrasted with those obtained from traditional single-diagnosis phenome-wide association analysis, as well as those in which only a subset of diagnostic codes are included per topic. In meta-analysis across three biobank cohorts, we identified 23 disease-associated loci with p<1e-15, including previously associated autoimmune disease loci. In all cases, observed significant associations were of greater magnitude than for single phenome-wide diagnostic codes, and incorporation of less strongly-loading diagnostic codes enhanced association. This strategy provides a more efficient means of phenome-wide association in biobanks with coded clinical data.
McCoy, Thomas H; Castro, Victor M; Snapper, Leslie A; Hart, Kamber L; Perlis, Roy H
2017-01-01
Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that can be unreliable and fail to capture relationships between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations of diagnostic codes from electronic health records for 10,845 participants in a biobanking program at two large academic medical centers. Specifically, we applied latent Dirichlet allocation to fit 50 disease topics based on diagnostic codes, then conducted a genome-wide common-variant association for each topic. In sensitivity analysis, these results were contrasted with those obtained from traditional single-diagnosis phenome-wide association analysis, as well as those in which only a subset of diagnostic codes were included per topic. In meta-analysis across three biobank cohorts, we identified 23 disease-associated loci with p < 1e-15, including previously associated autoimmune disease loci. In all cases, observed significant associations were of greater magnitude than single phenome-wide diagnostic codes, and incorporation of less strongly loading diagnostic codes enhanced association. This strategy provides a more efficient means of identifying phenome-wide associations in biobanks with coded clinical data. PMID:28861588
A survey of copy number variation in the porcine genome detected from whole-genome sequence
USDA-ARS's Scientific Manuscript database
An important challenge to post-genomic biology is relating observed phenotypic variation to the underlying genotypic variation. Genome-wide association studies (GWAS) have made thousands of connections between single nucleotide polymorphisms (SNPs) and phenotypes, implicating regions of the genome t...
Pooled genome wide association detects association upstream of FCRL3 with Graves' disease.
Khong, Jwu Jin; Burdon, Kathryn P; Lu, Yi; Laurie, Kate; Leonardos, Lefta; Baird, Paul N; Sahebjada, Srujana; Walsh, John P; Gajdatsy, Adam; Ebeling, Peter R; Hamblin, Peter Shane; Wong, Rosemary; Forehan, Simon P; Fourlanos, Spiros; Roberts, Anthony P; Doogue, Matthew; Selva, Dinesh; Montgomery, Grant W; Macgregor, Stuart; Craig, Jamie E
2016-11-18
Graves' disease is an autoimmune thyroid disease of complex inheritance. Multiple genetic susceptibility loci are thought to be involved in Graves' disease and it is therefore likely that these can be identified by genome wide association studies. This study aimed to determine if a genome wide association study, using a pooling methodology, could detect genomic loci associated with Graves' disease. Nineteen of the top ranking single nucleotide polymorphisms, including HLA-DQA1 and C6orf10, were clustered within the Major Histo-compatibility Complex region on chromosome 6p21, with rs1613056 reaching genome wide significance (p = 5 × 10⁻⁸). Technical validation of top ranking non-Major Histo-compatibility Complex single nucleotide polymorphisms with individual genotyping in the discovery cohort revealed four single nucleotide polymorphisms with p ≤ 10⁻⁴. Rs17676303 on chromosome 1q23.1, located upstream of FCRL3, showed evidence of association with Graves' disease across the discovery, replication and combined cohorts. A second single nucleotide polymorphism rs9644119 downstream of DPYSL2 showed some evidence of association supported by finding in the replication cohort that warrants further study. Pooled genome wide association study identified a genetic variant upstream of FCRL3 as a susceptibility locus for Graves' disease in addition to those identified in the Major Histo-compatibility Complex. A second locus downstream of DPYSL2 is potentially a novel genetic variant in Graves' disease that requires further confirmation.
Secure distributed genome analysis for GWAS and sequence comparison computation.
Zhang, Yihua; Blanton, Marina; Almashaqbeh, Ghada
2015-01-01
The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice.
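For orientation, the two GWAS statistics named in the abstract above, minor allele frequency and the chi-squared statistic, can be written down in plaintext. The sketch below shows only the ordinary, non-secure arithmetic; it is not the authors' secret-sharing protocol, and the function names, genotype encoding (0, 1 or 2 copies of the alternate allele) and example counts are assumptions made for illustration.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency at one biallelic site.

    genotypes: per-individual counts (0, 1 or 2) of the alternate allele.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    n_alleles = 2 * len(genotypes)
    alt_freq = sum(genotypes) / n_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def allelic_chi_squared(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-squared statistic (no continuity correction) for a
    2x2 table of allele counts in cases versus controls."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero row or column margin carries no information about association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical data, for illustration only.
maf = minor_allele_frequency([0, 1, 2, 0])
chi2 = allelic_chi_squared(case_alt=10, case_ref=20, ctrl_alt=20, ctrl_ref=10)
```

A secure version would evaluate the same sums and products on secret-shared inputs, so that no party sees individual genotypes; the abstract does not give enough detail to sketch that part.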
Secure distributed genome analysis for GWAS and sequence comparison computation
2015-01-01
Background The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
Hamidi Hay, E; Roberts, A
2017-04-01
Longevity is a highly important trait to the efficiency of beef cattle production. The objective of this study was to evaluate the genomic prediction of longevity and identify genomic regions associated with this trait. The data used in this study consisted of 547 Composite Gene Combination cows (1/2 Red Angus, 1/4 Charolais, 1/4 Tarentaise) born from 2002 to 2011 genotyped with Illumina BovineSNP50 BeadChip. Three models were used to assess genomic prediction: Bayes A, Bayes B and GBLUP using a genomic relationship matrix. To identify genomic regions associated with longevity, 2 approaches were adopted: single-marker genome-wide association and a Bayesian approach using GenSel software. The genomic prediction accuracy was low: 0.28, 0.25, and 0.22 for Bayes A, Bayes B and GBLUP, respectively. The single-marker genome-wide association study (GWAS) identified 5 loci with P-value less than 0.05 after false discovery correction: UA-IFASA-7571 on chromosome 19 (58.03 Mb), ARS-BFGL-BAC-15059 on BTA 1 (28.8 Mb), ARS-BFGL-NGS-104159 on BTA3 (29.4 Mb), ARS-BFGL-NGS-32882 on BTA9 (104.07 Mb) and ARS-BFGL-NGS-32883 on BTA25 (33.77 Mb). The Bayesian GWAS yielded 4 genomic regions overlapping with the single-marker GWAS results. The region with the highest percentage of genomic variance (3.73%) was detected on chromosome 19. Both GWAS approaches adopted in this study showed evidence for association with various chromosomal locations.
Syring, John V; Tennessen, Jacob A; Jennings, Tara N; Wegrzyn, Jill; Scelfo-Dalbey, Camille; Cronn, Richard
2016-01-01
Whitebark pine (Pinus albicaulis) inhabits an expansive range in western North America, and it is a keystone species of subalpine environments. Whitebark is susceptible to multiple threats - climate change, white pine blister rust, mountain pine beetle, and fire exclusion - and it is suffering significant mortality range-wide, prompting the tree to be listed as 'globally endangered' by the International Union for Conservation of Nature and 'endangered' by the Canadian government. Conservation collections (in situ and ex situ) are being initiated to preserve the genetic legacy of the species. Reliable, transferrable, and highly variable genetic markers are essential for quantifying the genetic profiles of seed collections relative to natural stands, and ensuring the completeness of conservation collections. We evaluated the use of hybridization-based target capture to enrich specific genomic regions from the 27 GB genome of whitebark pine, and to evaluate genetic variation across loci, trees, and geography. Probes were designed to capture 7,849 distinct genes, and screening was performed on 48 trees. Despite the inclusion of repetitive elements in the probe pool, the resulting dataset provided information on 4,452 genes and 32% of targeted positions (528,873 bp), and we were able to identify 12,390 segregating sites from 47 trees. Variations reveal strong geographic trends in heterozygosity and allelic richness, with trees from the southern Cascade and Sierra Range showing the greatest distinctiveness and differentiation. Our results show that even under non-optimal conditions (low enrichment efficiency; inclusion of repetitive elements in baits), targeted enrichment produces high quality, codominant genotypes from large genomes. The resulting data can be readily integrated into management and gene conservation activities for whitebark pine, and have the potential to be applied to other members of 5-needle pine group (Pinus subsect. Quinquefolia) due to their limited genetic divergence.
2010-01-01
Background The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. Results This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. Conclusions emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time. PMID:20969788
Resolving the tips of the tree of life: How much mitochondrial data do we need?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonett, Ronald M.; Macey, J. Robert; Boore, Jeffrey L.
2005-04-29
Mitochondrial (mt) DNA sequences are used extensively to reconstruct evolutionary relationships among recently diverged animals, and have constituted the most widely used markers for species- and generic-level relationships for the last decade or more. However, most studies to date have employed relatively small portions of the mt-genome. In contrast, complete mt-genomes primarily have been used to investigate deep divergences, including several studies of the amount of mt sequence necessary to recover ancient relationships. We sequenced and analyzed 24 complete mt-genomes from a group of salamander species exhibiting divergences typical of those in many species-level studies. We present the first comprehensive investigation of the amount of mt sequence data necessary to consistently recover the mt-genome tree at this level, using parsimony and Bayesian methods. Both methods of phylogenetic analysis revealed extremely similar results. A surprising number of well supported, yet conflicting, relationships were found in trees based on fragments less than ~2000 nucleotides (nt), typical of the vast majority of the thousands of mt-based studies published to date. Large amounts of data (11,500+ nt) were necessary to consistently recover the whole mt-genome tree. Some relationships consistently were recovered with fragments of all sizes, but many nodes required the majority of the mt-genome to stabilize, particularly those associated with short internal branches. Although moderate amounts of data (2000-3000 nt) were adequate to recover mt-based relationships for which most nodes were congruent with the whole mt-genome tree, many thousands of nucleotides were necessary to resolve rapid bursts of evolution.
Recent advances in genomics are making collection of large amounts of sequence data highly feasible, and our results provide the basis for comparative studies of other closely related groups to optimize mt sequence sampling and phylogenetic resolution at the "tips" of the Tree of Life.
Darling, Aaron E.
2009-01-01
Inversions are among the most common mutations acting on the order and orientation of genes in a genome, and polynomial-time algorithms exist to obtain a minimal length series of inversions that transform one genome arrangement to another. However, the minimum length series of inversions (the optimal sorting path) is often not unique as many such optimal sorting paths exist. If we assume that all optimal sorting paths are equally likely, then statistical inference on genome arrangement history must account for all such sorting paths and not just a single estimate. No deterministic polynomial algorithm is known to count the number of optimal sorting paths nor sample from the uniform distribution of optimal sorting paths. Here, we propose a stochastic method that uniformly samples the set of all optimal sorting paths. Our method uses a novel formulation of parallel Markov chain Monte Carlo. In practice, our method can quickly estimate the total number of optimal sorting paths. We introduce a variant of our approach in which short inversions are modeled to be more likely, and we show how the method can be used to estimate the distribution of inversion lengths and breakpoint usage in pathogenic Yersinia pestis. The proposed method has been implemented in a program called “MC4Inversion.” We draw comparison of MC4Inversion to the sampler implemented in BADGER and a previously described importance sampling (IS) technique. We find that on high-divergence data sets, MC4Inversion finds more optimal sorting paths per second than BADGER and the IS technique and simultaneously avoids bias inherent in the IS technique. PMID:20333186
Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Biesecker, Leslie G; Biesecker, Barbara B
2015-07-01
Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. (c) 2015 APA, all rights reserved.
Taber, Jennifer M.; Klein, William M. P.; Ferrer, Rebecca A.; Lewis, Katie L.; Biesecker, Leslie G.; Biesecker, Barbara B.
2015-01-01
Objective Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Method Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Results Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. Conclusions The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. PMID:25313897
Schönhals, E M; Ortega, F; Barandalla, L; Aragones, A; Ruiz de Galarreta, J I; Liao, J-C; Sanetomo, R; Walkemeier, B; Tacke, E; Ritter, E; Gebhardt, C
2016-04-01
SNPs in candidate genes Pain-1, InvCD141 (invertases), SSIV (starch synthase), StCDF1 (transcription factor), LapN (leucine aminopeptidase), and cytoplasm type are associated with potato tuber yield, starch content and/or starch yield. Tuber yield (TY), starch content (TSC), and starch yield (TSY) are complex characters of high importance for the potato crop in general and for industrial starch production in particular. DNA markers associated with superior alleles of genes that control the natural variation of TY, TSC, and TSY could increase precision and speed of breeding new cultivars optimized for potato starch production. Diagnostic DNA markers are identified by association mapping in populations of tetraploid potato varieties and advanced breeding clones. A novel association mapping population of 282 genotypes including varieties, breeding clones and Andean landraces was assembled and field evaluated in Northern Spain for TY, TSC, TSY, tuber number (TN) and tuber weight (TW). The landraces had lower mean values of TY, TW, TN, and TSY. The population was genotyped for 183 microsatellite alleles, 221 single nucleotide polymorphisms (SNPs) in fourteen candidate genes and eight known diagnostic markers for TSC and TSY. Association test statistics including kinship and population structure reproduced five known marker-trait associations of candidate genes and discovered new ones, particularly for tuber yield and starch yield. The inclusion of landraces increased the number of detected marker-trait associations. Integration of the present association mapping results with previous QTL linkage mapping studies for TY, TSC, TSY, TW, TN, and tuberization revealed some hot spots of QTL for these traits in the potato genome. The genomic positions of markers linked or associated with QTL for complex tuber traits suggest high multiplicity and genome-wide distribution of the underlying genes.
Reddy, Umesh K.; Nimmakayala, Padma; Abburi, Venkata Lakshmi; Reddy, C. V. C. M.; Saminathan, Thangasamy; Percy, Richard G.; Yu, John Z.; Frelichowski, James; Udall, Joshua A.; Page, Justin T.; Zhang, Dong; Shehzad, Tariq; Paterson, Andrew H.
2017-01-01
Use of 10,129 singleton SNPs of known genomic location in tetraploid cotton provided unique opportunities to characterize genome-wide diversity among 440 Gossypium hirsutum and 219 G. barbadense cultivars and landrace accessions of widespread origin. Using the SNPs distributed genome-wide, we examined genetic diversity, haplotype distribution and linkage disequilibrium patterns in the G. hirsutum and G. barbadense genomes to clarify population demographic history. Diversity and identity-by-state analyses have revealed little sharing of alleles between the two cultivated allotetraploid genomes, with a few exceptions that indicated sporadic gene flow. We found a high number of new alleles, representing increased nucleotide diversity, on chromosomes 1 and 2 in cultivated G. hirsutum as compared with low nucleotide diversity on these chromosomes in landrace G. hirsutum. In contrast, G. barbadense showed negative Tajima’s D on several chromosomes for both cultivated and landrace types, which indicates that speciation of G. barbadense itself might have occurred with relatively narrow genetic diversity. The presence of conserved linkage disequilibrium (LD) blocks and haplotypes between G. hirsutum and G. barbadense provides strong evidence for comparable patterns of evolution in their domestication processes. Our study illustrates the potential use of population genetic techniques to identify genomic regions for domestication. PMID:28128280
Impacts of Genome-Wide Analyses on Our Understanding of Human Herpesvirus Diversity and Evolution.
Renner, Daniel W; Szpara, Moriah L
2018-01-01
Until fairly recently, genome-wide evolutionary dynamics and within-host diversity were more commonly examined in the context of small viruses than in the context of large double-stranded DNA viruses such as herpesviruses. The high mutation rates and more compact genomes of RNA viruses have inspired the investigation of population dynamics for these species, and recent data now suggest that herpesviruses might also be considered candidates for population modeling. High-throughput sequencing (HTS) and bioinformatics have expanded our understanding of herpesviruses through genome-wide comparisons of sequence diversity, recombination, allele frequency, and selective pressures. Here we discuss recent data on the mechanisms that generate herpesvirus genomic diversity and underlie the evolution of these virus families. We focus on human herpesviruses, with key insights drawn from veterinary herpesviruses and other large DNA virus families. We consider the impacts of cell culture on herpesvirus genomes and how to accurately describe the viral populations under study. The need for a strong foundation of high-quality genomes is also discussed, since it underlies all secondary genomic analyses such as RNA sequencing (RNA-Seq), chromatin immunoprecipitation, and ribosome profiling. Areas where we foresee future progress, such as the linking of viral genetic differences to phenotypic or clinical outcomes, are highlighted as well. Copyright © 2017 Renner and Szpara.
Plant Enhancers: A Call for Discovery.
Weber, Blaise; Zicola, Johan; Oka, Rurika; Stam, Maike
2016-11-01
Higher eukaryotes typically contain many different cell types, displaying different cellular functions that are influenced by biotic and abiotic cues. The different functions are characterized by specific gene expression patterns mediated by regulatory sequences such as transcriptional enhancers. Recent genome-wide approaches have identified thousands of enhancers in animals, reviving interest in enhancers in gene regulation. Although the regulatory roles of plant enhancers are as crucial as those in animals, genome-wide approaches have only very recently been applied to plants. Here we review characteristics of enhancers at the DNA and chromatin level in plants and other species, their similarities and differences, and techniques widely used for genome-wide discovery of enhancers in animal systems that can be implemented in plants. Copyright © 2016 Elsevier Ltd. All rights reserved.
Genome-wide association studies and epigenome-wide association studies go together in cancer control
Verma, Mukesh
2016-01-01
Completion of the human genome a decade ago laid the foundation for: using genetic information in assessing risk to identify individuals and populations that are likely to develop cancer, and designing treatments based on a person's genetic profiling (precision medicine). Genome-wide association studies (GWAS) completed during the past few years have identified risk-associated single nucleotide polymorphisms that can be used as screening tools in epidemiologic studies of a variety of tumor types. This led to the conduct of epigenome-wide association studies (EWAS). This article discusses the current status, challenges and research opportunities in GWAS and EWAS. Information gained from GWAS and EWAS has potential applications in cancer control and treatment. PMID:27079684
Genome-wide selection components analysis in a fish with male pregnancy.
Flanagan, Sarah P; Jones, Adam G
2017-04-01
A major goal of evolutionary biology is to identify the genome-level targets of natural and sexual selection. With the advent of next-generation sequencing, whole-genome selection components analysis provides a promising avenue in the search for loci affected by selection in nature. Here, we implement a genome-wide selection components analysis in the sex role reversed Gulf pipefish, Syngnathus scovelli. Our approach involves a double-digest restriction-site associated DNA sequencing (ddRAD-seq) technique, applied to adult females, nonpregnant males, pregnant males, and their offspring. An FST comparison of allele frequencies among these groups reveals 47 genomic regions putatively experiencing sexual selection, as well as 468 regions showing a signature of differential viability selection between males and females. A complementary likelihood ratio test identifies similar patterns in the data as the FST analysis. Sexual selection and viability selection both tend to favor the rare alleles in the population. Ultimately, we conclude that genome-wide selection components analysis can be a useful tool to complement other approaches in the effort to pinpoint genome-level targets of selection in the wild. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.
Transcription facilitated genome-wide recruitment of topoisomerase I and DNA gyrase.
Ahmed, Wareed; Sala, Claudia; Hegde, Shubhada R; Jha, Rajiv Kumar; Cole, Stewart T; Nagaraja, Valakunja
2017-05-01
Movement of the transcription machinery along a template alters DNA topology resulting in the accumulation of supercoils in DNA. The positive supercoils generated ahead of transcribing RNA polymerase (RNAP) and the negative supercoils accumulating behind impose severe topological constraints impeding transcription process. Previous studies have implied the role of topoisomerases in the removal of torsional stress and the maintenance of template topology but the in vivo interaction of functionally distinct topoisomerases with heterogeneous chromosomal territories is not deciphered. Moreover, how the transcription-induced supercoils influence the genome-wide recruitment of DNA topoisomerases remains to be explored in bacteria. Using ChIP-Seq, we show the genome-wide occupancy profile of both topoisomerase I and DNA gyrase in conjunction with RNAP in Mycobacterium tuberculosis taking advantage of minimal topoisomerase representation in the organism. The study unveils the first in vivo genome-wide interaction of both the topoisomerases with the genomic regions and establishes that transcription-induced supercoils govern their recruitment at genomic sites. Distribution profiles revealed co-localization of RNAP and the two topoisomerases on the active transcriptional units (TUs). At a given locus, topoisomerase I and DNA gyrase were localized behind and ahead of RNAP, respectively, correlating with the twin-supercoiled domains generated. The recruitment of topoisomerases was higher at the genomic loci with higher transcriptional activity and/or at regions under high torsional stress compared to silent genomic loci. Importantly, the occupancy of DNA gyrase, sole type II topoisomerase in Mtb, near the Ter domain of the Mtb chromosome validates its function as a decatenase.
Transcription facilitated genome-wide recruitment of topoisomerase I and DNA gyrase
Ahmed, Wareed; Sala, Claudia; Hegde, Shubhada R.; Jha, Rajiv Kumar
2017-01-01
Movement of the transcription machinery along a template alters DNA topology resulting in the accumulation of supercoils in DNA. The positive supercoils generated ahead of transcribing RNA polymerase (RNAP) and the negative supercoils accumulating behind impose severe topological constraints impeding transcription process. Previous studies have implied the role of topoisomerases in the removal of torsional stress and the maintenance of template topology but the in vivo interaction of functionally distinct topoisomerases with heterogeneous chromosomal territories is not deciphered. Moreover, how the transcription-induced supercoils influence the genome-wide recruitment of DNA topoisomerases remains to be explored in bacteria. Using ChIP-Seq, we show the genome-wide occupancy profile of both topoisomerase I and DNA gyrase in conjunction with RNAP in Mycobacterium tuberculosis taking advantage of minimal topoisomerase representation in the organism. The study unveils the first in vivo genome-wide interaction of both the topoisomerases with the genomic regions and establishes that transcription-induced supercoils govern their recruitment at genomic sites. Distribution profiles revealed co-localization of RNAP and the two topoisomerases on the active transcriptional units (TUs). At a given locus, topoisomerase I and DNA gyrase were localized behind and ahead of RNAP, respectively, correlating with the twin-supercoiled domains generated. The recruitment of topoisomerases was higher at the genomic loci with higher transcriptional activity and/or at regions under high torsional stress compared to silent genomic loci. Importantly, the occupancy of DNA gyrase, sole type II topoisomerase in Mtb, near the Ter domain of the Mtb chromosome validates its function as a decatenase. PMID:28463980
Nicholson, Wayne L; Zhalnina, Kateryna; de Oliveira, Rafael R; Triplett, Eric W
2015-02-01
A novel, psychrotolerant facultative anaerobe, strain WN1359(T), was isolated from a permafrost borehole sample collected at the right bank of the Kolyma River in Siberia, Russia. Gram-positive-staining, non-motile, rod-shaped cells were observed with sizes of 1-2 µm long and 0.4-0.5 µm wide. Growth occurred in the range of pH 5.8-9.0 with optimal growth at pH 7.8-8.6 (pH optimum 8.2). The novel isolate grew at temperatures from 0-37 °C and optimal growth occurred at 25 °C. The novel isolate does not require NaCl; growth was observed between 0 and 8.8 % (1.5 M) NaCl with optimal growth at 0.5 % (w/v) NaCl. The isolate was a catalase-negative, facultatively anaerobic chemo-organoheterotroph that used sugars but not several single amino acids or dipeptides as substrates. The major metabolic end-product was lactic acid in the ratio of 86 % l-lactate : 14 % d-lactate. Strain WN1359(T) was sensitive to ampicillin, chloramphenicol, fusidic acid, lincomycin, monocycline, rifampicin, rifamycin SV, spectinomycin, streptomycin, troleandomycin and vancomycin, and resistant to nalidixic acid and aztreonam. The fatty acid content was predominantly unsaturated (70.2 %), branched-chain unsaturated (11.7 %) and saturated (12.5 %). The DNA G+C content was 35.3 mol% by whole genome sequence analysis. 16S rRNA gene sequence analysis showed 98.7 % sequence identity between strain WN1359(T) and Carnobacterium inhibens. Genome relatedness was computed using both Genome-to-Genome Distance Analysis (GGDA) and Average Nucleotide Identity (ANI), which both strongly supported strain WN1359(T) belonging to the species C. inhibens. On the basis of these results, the permafrost isolate WN1359(T) represents a novel subspecies of C. inhibens, for which the name Carnobacterium inhibens subsp. gilichinskyi subsp. nov. is proposed. The type strain is WN1359(T) (= ATCC BAA-2557(T) = DSM 27470(T)). The subspecies Carnobacterium inhibens subsp. inhibens subsp. nov. is created automatically. An emended description of C. inhibens is also provided. © 2015 IUMS.
Genome wide approaches to identify protein-DNA interactions.
Ma, Tao; Ye, Zhenqing; Wang, Liguo
2018-05-29
Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential to identify their target genes and understand the regulatory network. Genome-wide identification of their binding sites became feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites. This review aims to provide an overview of these three techniques, including their experimental procedures, computational approaches, and popular analytic tools. ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques to study genome-wide in vivo protein-DNA interaction. Due to the rapid development of next-generation sequencing technology, array-based ChIP-chip is deprecated and ChIP-seq has become the most widely used technique to identify transcription factor binding sites genome-wide. The newly developed ChIP-exo further improves the spatial resolution to a single nucleotide. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms, so each inherently carries its own set of statistical assumptions and biases. Choosing the most appropriate analytic program for a given experiment therefore needs careful consideration. Moreover, most programs have only a command-line interface, so their installation and usage require basic computational expertise in Unix/Linux. Copyright © Bentham Science Publishers.
A Genome-Wide Association Study of Chronic Obstructive Pulmonary Disease in Hispanics
Chen, Wei; Brehm, John M.; Manichaikul, Ani; Cho, Michael H.; Boutaoui, Nadia; Yan, Qi; Burkart, Kristin M.; Enright, Paul L.; Rotter, Jerome I.; Petersen, Hans; Leng, Shuguang; Obeidat, Ma’en; Bossé, Yohan; Brandsma, Corry-Anke; Hao, Ke; Rich, Stephen S.; Powell, Rhea; Avila, Lydiana; Soto-Quiros, Manuel; Silverman, Edwin K.; Tesfaigzi, Yohannes; Barr, R. Graham
2015-01-01
Rationale: Genome-wide association studies (GWAS) of chronic obstructive pulmonary disease (COPD) have identified disease-susceptibility loci, mostly in subjects of European descent. Objectives: We hypothesized that by studying Hispanic populations we would be able to identify unique loci that contribute to COPD pathogenesis in Hispanics but remain undetected in GWAS of non-Hispanic populations. Methods: We conducted a metaanalysis of two GWAS of COPD in independent cohorts of Hispanics in Costa Rica and the United States (Multi-Ethnic Study of Atherosclerosis [MESA]). We performed a replication study of the top single-nucleotide polymorphisms in an independent Hispanic cohort in New Mexico (the Lovelace Smokers Cohort). We also attempted to replicate prior findings from genome-wide studies in non-Hispanic populations in Hispanic cohorts. Measurements and Main Results: We found no genome-wide significant association with COPD in our metaanalysis of Costa Rica and MESA. After combining the top results from this metaanalysis with those from our replication study in the Lovelace Smokers Cohort, we identified two single-nucleotide polymorphisms approaching genome-wide significance for an association with COPD. The first (rs858249, combined P value = 6.1 × 10^-8) is near the genes KLHL7 and NUPL2 on chromosome 7. The second (rs286499, combined P value = 8.4 × 10^-8) is located in an intron of DLG2. The two most significant single-nucleotide polymorphisms in FAM13A from a previous genome-wide study in non-Hispanics were associated with COPD in Hispanics. Conclusions: We have identified two novel loci (in or near the genes KLHL7/NUPL2 and DLG2) that may play a role in COPD pathogenesis in Hispanic populations. PMID:25584925
A genome-wide association study of chronic obstructive pulmonary disease in Hispanics.
Chen, Wei; Brehm, John M; Manichaikul, Ani; Cho, Michael H; Boutaoui, Nadia; Yan, Qi; Burkart, Kristin M; Enright, Paul L; Rotter, Jerome I; Petersen, Hans; Leng, Shuguang; Obeidat, Ma'en; Bossé, Yohan; Brandsma, Corry-Anke; Hao, Ke; Rich, Stephen S; Powell, Rhea; Avila, Lydiana; Soto-Quiros, Manuel; Silverman, Edwin K; Tesfaigzi, Yohannes; Barr, R Graham; Celedón, Juan C
2015-03-01
Genome-wide association studies (GWAS) of chronic obstructive pulmonary disease (COPD) have identified disease-susceptibility loci, mostly in subjects of European descent. We hypothesized that by studying Hispanic populations we would be able to identify unique loci that contribute to COPD pathogenesis in Hispanics but remain undetected in GWAS of non-Hispanic populations. We conducted a metaanalysis of two GWAS of COPD in independent cohorts of Hispanics in Costa Rica and the United States (Multi-Ethnic Study of Atherosclerosis [MESA]). We performed a replication study of the top single-nucleotide polymorphisms in an independent Hispanic cohort in New Mexico (the Lovelace Smokers Cohort). We also attempted to replicate prior findings from genome-wide studies in non-Hispanic populations in Hispanic cohorts. We found no genome-wide significant association with COPD in our metaanalysis of Costa Rica and MESA. After combining the top results from this metaanalysis with those from our replication study in the Lovelace Smokers Cohort, we identified two single-nucleotide polymorphisms approaching genome-wide significance for an association with COPD. The first (rs858249, combined P value = 6.1 × 10^-8) is near the genes KLHL7 and NUPL2 on chromosome 7. The second (rs286499, combined P value = 8.4 × 10^-8) is located in an intron of DLG2. The two most significant single-nucleotide polymorphisms in FAM13A from a previous genome-wide study in non-Hispanics were associated with COPD in Hispanics. We have identified two novel loci (in or near the genes KLHL7/NUPL2 and DLG2) that may play a role in COPD pathogenesis in Hispanic populations.
Pattin, Kristine A.; Moore, Jason H.
2009-01-01
One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, there are some significant challenges that need to be addressed. We address here the challenge of detecting epistasis or gene-gene interactions in genome-wide association studies. Discovering epistatic interactions in high dimensional datasets remains a challenge due to the computational complexity resulting from the analysis of all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data may be analyzed more efficiently and yet still retain important biological information. One of the strongest demonstrations of the functional relationship between genes is protein-protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review we will discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We explore some of the fundamentals of protein interactions and the databases that are publicly available. PMID:18551320
Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations
Ji, Yuan; Schaid, Daniel J; Desta, Zeruesenay; Kubo, Michiaki; Batzler, Anthony J; Snyder, Karen; Mushiroda, Taisei; Kamatani, Naoyuki; Ogburn, Evan; Hall-Flavin, Daniel; Flockhart, David; Nakamura, Yusuke; Mrazek, David A; Weinshilboum, Richard M
2014-01-01
Aims Citalopram (CT) and escitalopram (S-CT) are among the most widely prescribed selective serotonin reuptake inhibitors used to treat major depressive disorder (MDD). We applied a genome-wide association study to identify genetic factors that contribute to variation in plasma concentrations of CT or S-CT and their metabolites in MDD patients treated with CT or S-CT. Methods Our genome-wide association study was performed using samples from 435 MDD patients. Linear mixed models were used to account for within-subject correlations of longitudinal measures of plasma drug/metabolite concentrations (4 and 8 weeks after the initiation of drug therapy), and single-nucleotide polymorphisms (SNPs) were modelled as additive allelic effects. Results Genome-wide significant associations were observed for S-CT concentration with SNPs in or near the CYP2C19 gene on chromosome 10 (rs1074145, P = 4.1 × 10^-9) and with S-didesmethylcitalopram concentration for SNPs near the CYP2D6 locus on chromosome 22 (rs1065852, P = 2.0 × 10^-16), supporting the important role of these cytochrome P450 (CYP) enzymes in biotransformation of citalopram. After adjustment for the effect of CYP2C19 functional alleles, the analyses also identified novel loci that will require future replication and functional validation. Conclusions In vitro and in vivo studies have suggested that the biotransformation of CT to monodesmethylcitalopram and didesmethylcitalopram is mediated by CYP isozymes. The results of our genome-wide association study performed in MDD patients treated with CT or S-CT have confirmed those observations but also identified novel genomic loci that might play a role in variation in plasma levels of CT or its metabolites during the treatment of MDD patients with these selective serotonin reuptake inhibitors. PMID:24528284
Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations.
Ji, Yuan; Schaid, Daniel J; Desta, Zeruesenay; Kubo, Michiaki; Batzler, Anthony J; Snyder, Karen; Mushiroda, Taisei; Kamatani, Naoyuki; Ogburn, Evan; Hall-Flavin, Daniel; Flockhart, David; Nakamura, Yusuke; Mrazek, David A; Weinshilboum, Richard M
2014-08-01
Citalopram (CT) and escitalopram (S-CT) are among the most widely prescribed selective serotonin reuptake inhibitors used to treat major depressive disorder (MDD). We applied a genome-wide association study to identify genetic factors that contribute to variation in plasma concentrations of CT or S-CT and their metabolites in MDD patients treated with CT or S-CT. Our genome-wide association study was performed using samples from 435 MDD patients. Linear mixed models were used to account for within-subject correlations of longitudinal measures of plasma drug/metabolite concentrations (4 and 8 weeks after the initiation of drug therapy), and single-nucleotide polymorphisms (SNPs) were modelled as additive allelic effects. Genome-wide significant associations were observed for S-CT concentration with SNPs in or near the CYP2C19 gene on chromosome 10 (rs1074145, P = 4.1 × 10^-9) and with S-didesmethylcitalopram concentration for SNPs near the CYP2D6 locus on chromosome 22 (rs1065852, P = 2.0 × 10^-16), supporting the important role of these cytochrome P450 (CYP) enzymes in biotransformation of citalopram. After adjustment for the effect of CYP2C19 functional alleles, the analyses also identified novel loci that will require future replication and functional validation. In vitro and in vivo studies have suggested that the biotransformation of CT to monodesmethylcitalopram and didesmethylcitalopram is mediated by CYP isozymes. The results of our genome-wide association study performed in MDD patients treated with CT or S-CT have confirmed those observations but also identified novel genomic loci that might play a role in variation in plasma levels of CT or its metabolites during the treatment of MDD patients with these selective serotonin reuptake inhibitors. © 2014 The British Pharmacological Society.
A genome-wide approach to children's aggressive behavior: The EAGLE consortium.
Pappa, Irene; St Pourcain, Beate; Benke, Kelly; Cavadino, Alana; Hakulinen, Christian; Nivard, Michel G; Nolte, Ilja M; Tiesler, Carla M T; Bakermans-Kranenburg, Marian J; Davies, Gareth E; Evans, David M; Geoffroy, Marie-Claude; Grallert, Harald; Groen-Blokhuis, Maria M; Hudziak, James J; Kemp, John P; Keltikangas-Järvinen, Liisa; McMahon, George; Mileva-Seitz, Viara R; Motazedi, Ehsan; Power, Christine; Raitakari, Olli T; Ring, Susan M; Rivadeneira, Fernando; Rodriguez, Alina; Scheet, Paul A; Seppälä, Ilkka; Snieder, Harold; Standl, Marie; Thiering, Elisabeth; Timpson, Nicholas J; Veenstra, René; Velders, Fleur P; Whitehouse, Andrew J O; Smith, George Davey; Heinrich, Joachim; Hypponen, Elina; Lehtimäki, Terho; Middeldorp, Christel M; Oldehinkel, Albertine J; Pennell, Craig E; Boomsma, Dorret I; Tiemeier, Henning
2016-07-01
Individual differences in aggressive behavior emerge in early childhood and predict persisting behavioral problems and disorders. Studies of antisocial and severe aggression in adulthood indicate substantial underlying biology. However, little attention has been given to genome-wide approaches of aggressive behavior in children. We analyzed data from nine population-based studies and assessed aggressive behavior using well-validated parent-reported questionnaires. This is the largest sample exploring children's aggressive behavior to date (N = 18,988), with measures in two developmental stages (N = 15,668 early childhood and N = 16,311 middle childhood/early adolescence). First, we estimated the additive genetic variance of children's aggressive behavior based on genome-wide SNP information, using genome-wide complex trait analysis (GCTA). Second, genetic associations within each study were assessed using a quasi-Poisson regression approach, capturing the highly right-skewed distribution of aggressive behavior. Third, we performed meta-analyses of genome-wide associations for both the total age-mixed sample and the two developmental stages. Finally, we performed a gene-based test using the summary statistics of the total sample. GCTA quantified variance tagged by common SNPs (10-54%). The meta-analysis of the total sample identified one region in chromosome 2 (2p12) at near genome-wide significance (top SNP rs11126630, P = 5.30 × 10^-8). The separate meta-analyses of the two developmental stages revealed suggestive evidence of association at the same locus. The gene-based analysis indicated association of variation within AVPR1A with aggressive behavior. We conclude that common variants at 2p12 show suggestive evidence for association with childhood aggression. Replication of these initial findings is needed, and further studies should clarify its biological meaning. © 2015 Wiley Periodicals, Inc.
GWAS and admixture mapping identify different asthma-associated loci in Latinos: The GALA II Study
Galanter, Joshua M; Gignoux, Christopher R; Torgerson, Dara G; Roth, Lindsey A; Eng, Celeste; Oh, Sam S; Nguyen, Elizabeth A; Drake, Katherine A; Huntsman, Scott; Hu, Donglei; Sen, Saunak; Davis, Adam; Farber, Harold J.; Avila, Pedro C.; Brigino-Buenaventura, Emerita; LeNoir, Michael A.; Meade, Kelley; Serebrisky, Denise; Borrell, Luisa N; Rodríguez-Cintrón, William; Estrada, Andres Moreno; Mendoza, Karla Sandoval; Winkler, Cheryl A.; Klitz, William; Romieu, Isabelle; London, Stephanie J.; Gilliland, Frank; Martinez, Fernando; Bustamante, Carlos; Williams, L Keoki; Kumar, Rajesh; Rodríguez-Santana, José R.; Burchard, Esteban G.
2013-01-01
Background Asthma is a complex disease with both genetic and environmental causes. Genome-wide association studies of asthma have mostly involved European populations and replication of positive associations has been inconsistent. Objective To identify asthma-associated genes in a large Latino population with genome-wide association analysis and admixture mapping. Methods Latino children with asthma (n = 1,893) and healthy controls (n = 1,881) were recruited from five sites in the United States: Puerto Rico, New York, Chicago, Houston, and the San Francisco Bay Area. Subjects were genotyped on an Affymetrix World Array IV chip. We performed genome-wide association and admixture mapping to identify asthma-associated loci. Results We identified a significant association between ancestry and asthma at 6p21 (lowest p-value: rs2523924, p < 5 × 10^-6). This association replicates in a meta-analysis of the EVE Asthma Consortium (p = 0.01). Fine mapping of the region in this study and the EVE Asthma Consortium suggests an association between PSORS1C1 and asthma. We confirmed the strong allelic association between the 17q21 locus and asthma in Latinos (IKZF3, lowest p-value: rs90792, OR: 0.67, 95% CI 0.61–0.75, p = 6 × 10^-13) and replicated associations in several genes that had previously been associated with asthma in genome-wide association studies. Conclusions Admixture mapping and genome-wide association are complementary techniques that provide evidence for multiple asthma-associated loci in Latinos. Admixture mapping identifies a novel locus on 6p21 that replicates in a meta-analysis of several Latino populations, while genome-wide association confirms the previously identified locus on 17q21. PMID:24406073
Ren, Shaogang; Zeng, Bo; Qian, Xiaoning
2013-01-01
Background: Optimization procedures to identify gene knockouts for targeted biochemical overproduction have been widely used in modern metabolic engineering. The flux balance analysis (FBA) framework has provided conceptual simplifications for genome-scale dynamic analysis at steady states. Based on FBA, many current optimization methods for targeted bio-productions have been developed under the maximum cell growth assumption. The optimization problem to derive gene knockout strategies has recently been formulated as a bi-level programming problem in OptKnock for maximum targeted bio-productions with maximum growth rates. However, it has been shown that knockout mutants in fact reach the steady states with the minimization of metabolic adjustment (MOMA) from the corresponding wild-type strains instead of having maximal growth rates after genetic or metabolic intervention. In this work, we propose a new bi-level computational framework, MOMAKnock, which can derive robust knockout strategies under the MOMA flux distribution approximation. Methods: In this new bi-level optimization framework, we aim to maximize the production of targeted chemicals by identifying candidate knockout genes or reactions under phenotypic constraints approximated by the MOMA assumption. Hence, the targeted chemical production is the primary objective of MOMAKnock, while the MOMA assumption is formulated as the inner problem of constraining the knockout metabolic flux to be as close as possible to the steady-state phenotypes of wild-type strains. As this new inner problem becomes a quadratic programming problem, a novel adaptive piecewise linearization algorithm is developed in this paper to obtain the exact optimal solution to this new bi-level integer quadratic programming problem for MOMAKnock. Results: Our new MOMAKnock model and the adaptive piecewise linearization solution algorithm are tested with a small E. coli core metabolic network and a large-scale iAF1260 E. coli metabolic network. The derived knockout strategies are compared with those from OptKnock. Our preliminary experimental results show that MOMAKnock can provide improved targeted productions with more robust knockout strategies. PMID:23368729
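The bi-level structure described in the MOMAKnock abstract can be written out roughly as follows. This is only a sketch of the general shape: the abstract gives no notation, so every symbol here is an assumption (S for the stoichiometric matrix, w for the wild-type flux distribution, l and u for flux bounds, y for the binary knockout vector, K for a knockout budget), and the paper's exact constraints may differ.

```latex
% Inner problem (MOMA) for a fixed knockout vector y \in \{0,1\}^n,
% where y_j = 0 means reaction j is removed (assumed convention):
\begin{aligned}
v^{\ast}(y) = \arg\min_{v}\ & \lVert v - w \rVert_2^2 \\
\text{s.t.}\ & S v = 0, \\
& y_j\, l_j \le v_j \le y_j\, u_j, \quad j = 1,\dots,n .
\end{aligned}

% Outer problem: choose at most K knockouts to maximize the target flux
\max_{y \in \{0,1\}^n} \; v^{\ast}_{\mathrm{target}}(y)
\quad \text{s.t.} \quad \sum_{j=1}^{n} (1 - y_j) \le K .
```

The quadratic inner objective is what distinguishes this from the OptKnock setting, where the inner problem maximizes growth and is linear; it is also why the abstract mentions a piecewise linearization step.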
Genome-scale biological models for industrial microbial systems.
Xu, Nan; Ye, Chao; Liu, Liming
2018-04-01
The primary aims and challenges associated with microbial fermentation include achieving faster cell growth, higher productivity, and more robust production processes. Genome-scale biological models, predicting the formation of an interaction among genetic materials, enzymes, and metabolites, constitute a systematic and comprehensive platform to analyze and optimize the microbial growth and production of biological products. Genome-scale biological models can help optimize microbial growth-associated traits by simulating biomass formation, predicting growth rates, and identifying the requirements for cell growth. With regard to microbial product biosynthesis, genome-scale biological models can be used to design product biosynthetic pathways, accelerate production efficiency, and reduce metabolic side effects, leading to improved production performance. The present review discusses the development of microbial genome-scale biological models since their emergence and emphasizes their pertinent application in improving industrial microbial fermentation of biological products.
Copy Number Variations in Tilapia Genomes.
Li, Bi Jun; Li, Hong Lian; Meng, Zining; Zhang, Yong; Lin, Haoran; Yue, Gen Hua; Xia, Jun Hong
2017-02-01
Discovering the nature and pattern of genome variation is fundamental to understanding phenotypic diversity among populations. Although several million single nucleotide polymorphisms (SNPs) have been discovered in tilapia, the genome-wide characterization of larger structural variants, such as copy number variation (CNV) regions, has not yet been carried out. We conducted a genome-wide scan for CNVs in 47 individuals from three tilapia populations. Based on 254 Gb of high-quality paired-end sequencing reads, we identified 4642 distinct high-confidence CNVs. These CNVs account for 1.9% (12.411 Mb) of the Nile tilapia reference genome used. A total of 1100 predicted CNVs were found to overlap with exon regions of protein genes. Further association analysis based on linear model regression found 85 CNVs ranging between 300 and 27,000 base pairs significantly associated with population types (R2 > 0.9 and P > 0.001). Our study provides the first insights into genome-wide CNVs in tilapia. These CNVs among and within tilapia populations may have functional effects on phenotypes and specific adaptation to particular environments.
Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M
2017-03-27
Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. 
We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.
Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.
Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W
2018-05-31
In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, based on supervised deep learning approaches, for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.
Li, Yingying; Chen, Wu; Wang, Yunsheng; Luo, Kun; Li, Yue; Bai, Lianyang; Luo, Feng
2017-01-01
Quinclorac is a widely used herbicide in rice fields. Unfortunately, quinclorac residues are phytotoxic to many crops and vegetables. The degradation of quinclorac in nature is very slow. On the other hand, degradation of quinclorac using bacteria can be an effective and efficient method to reduce its contamination. In this study, we isolated a quinclorac bioremediation bacterium, strain F4, from quinclorac-contaminated soils. Based on morphological characteristics and 16S rRNA gene sequence analysis, we identified strain F4 as Mycobacterium sp. We investigated the effects of temperature, pH, inoculation size and initial quinclorac concentration on the growth and degrading efficiency of F4 and determined the optimal quinclorac-degrading conditions for F4. Under optimal degrading conditions, F4 degraded 97.38% of quinclorac from an initial concentration of 50 mg/L in seven days. Our indoor pot experiment demonstrated that the degradation products were non-phytotoxic to tobacco. After analyzing the quinclorac degradation products of F4, we proposed that F4 could employ two pathways to degrade quinclorac: one through methylation, the other through dechlorination. Furthermore, we reconstructed the whole genome of F4 through single-molecule sequencing and de novo assembly. We identified 77 methyltransferases and eight dehalogenases in the F4 genome to support our hypothesized degradation pathways. PMID:28968436
A Genome-Wide Breast Cancer Scan in African Americans
2011-06-01
cancer in women of African ancestry. 13 References 1. Easton DF, P.K., Dunning AM, Pharoah PDP, Thompson D, Ballinger DG, et al. Genome...M, Hankinson, SE, et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer...Millikan, R.C. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA 295, 2492-502 (2006). 16 17. Huo, D., Ikpatt
Farrar, Kerrie; Bryant, David; Cope-Selby, Naomi
2014-01-01
Plant production systems globally must be optimized to produce stable high yields from limited land under changing and variable climates. Demands for food, animal feed, and feedstocks for bioenergy and biorefining applications, are increasing with population growth, urbanization and affluence. Low-input, sustainable, alternatives to petrochemical-derived fertilizers and pesticides are required to reduce input costs and maintain or increase yields, with potential biological solutions having an important role to play. In contrast to crops that have been bred for food, many bioenergy crops are largely undomesticated, and so there is an opportunity to harness beneficial plant–microbe relationships which may have been inadvertently lost through intensive crop breeding. Plant–microbe interactions span a wide range of relationships in which one or both of the organisms may have a beneficial, neutral or negative effect on the other partner. A relatively small number of beneficial plant–microbe interactions are well understood and already exploited; however, others remain understudied and represent an untapped reservoir for optimizing plant production. There may be near-term applications for bacterial strains as microbial biopesticides and biofertilizers to increase biomass yield from energy crops grown on land unsuitable for food production. Longer term aims involve the design of synthetic genetic circuits within and between the host and microbes to optimize plant production. A highly exciting prospect is that endosymbionts comprise a unique resource of reduced complexity microbial genomes with adaptive traits of great interest for a wide variety of applications. PMID:25431199
Integrative Genomics Viewer (IGV) | Informatics Technology for Cancer Research (ITCR)
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
Mark J. Statham; James Murdoch; Jan Janecka; Keith B. Aubry; Ceiridwen J. Edwards; Carl D. Soulsbury; Oliver Berry; Zhenghuan Wang; David Harrison; Malcolm Pearch; Louise Tomsett; Judith Chupasko; Benjamin N. Sacks
2014-01-01
Widely distributed taxa provide an opportunity to compare biogeographic responses to climatic fluctuations on multiple continents and to investigate speciation. We conducted the most geographically and genomically comprehensive study to date of the red fox (Vulpes vulpes), the world's most widely distributed wild terrestrial carnivore. Analyses of 697 bp of...
ERIC Educational Resources Information Center
Nijmeijer, Judith S.; Arias-Vásquez, Alejandro; Rommelse, Nanda N.; Altink, Marieke E.; Buschgens, Cathelijne J.; Fliers, Ellen A.; Franke, Barbara; Minderaa, Ruud B.; Sergeant, Joseph A.; Buitelaar, Jan K.; Hoekstra, Pieter J.; Hartman, Catharina A.
2014-01-01
We studied 261 ADHD probands and 354 of their siblings to assess quantitative trait loci associated with autism spectrum disorder symptoms (as measured by the Children's Social Behavior Questionnaire (CSBQ)) using a genome-wide linkage approach, followed by locus-wide association analysis. A genome-wide significant locus for the CSBQ subscale…
Quantifying Temporal Genomic Erosion in Endangered Species.
Díez-Del-Molino, David; Sánchez-Barreiro, Fatima; Barnes, Ian; Gilbert, M Thomas P; Dalén, Love
2018-03-01
Many species have undergone dramatic population size declines over the past centuries. Although stochastic genetic processes during and after such declines are thought to elevate the risk of extinction, comparative analyses of genomic data from several endangered species suggest little concordance between genome-wide diversity and current population sizes. This is likely because species-specific life-history traits and ancient bottlenecks overshadow the genetic effect of recent demographic declines. Therefore, we advocate that temporal sampling of genomic data provides a more accurate approach to quantify genetic threats in endangered species. Specifically, genomic data from predecline museum specimens will provide valuable baseline data that enable accurate estimation of recent decreases in genome-wide diversity, increases in inbreeding levels, and accumulation of deleterious genetic variation.
Shen, Xia; De Jonge, Jennifer; Forsberg, Simon K. G.; Pettersson, Mats E.; Sheng, Zheya; Hennig, Lars; Carlborg, Örjan
2014-01-01
As Arabidopsis thaliana has colonized a wide range of habitats across the world it is an attractive model for studying the genetic mechanisms underlying environmental adaptation. Here, we used public data from two collections of A. thaliana accessions to associate genetic variability at individual loci with differences in climates at the sampling sites. We use a novel method to screen the genome for plastic alleles that tolerate a broader climate range than the major allele. This approach reduces confounding with population structure and increases power compared to standard genome-wide association methods. Sixteen novel loci were found, including an association between Chromomethylase 2 (CMT2) and temperature seasonality where the genome-wide CHH methylation was different for the group of accessions carrying the plastic allele. Cmt2 mutants were shown to be more tolerant to heat-stress, suggesting genetic regulation of epigenetic modifications as a likely mechanism underlying natural adaptation to variable temperatures, potentially through differential allelic plasticity to temperature-stress. PMID:25503602
Protocol matters: which methylome are you actually studying?
Robinson, Mark D; Statham, Aaron L; Speed, Terence P; Clark, Susan J
2011-01-01
The field of epigenetics is now capitalizing on the vast number of emerging technologies, largely based on second-generation sequencing, which interrogate DNA methylation status and histone modifications genome-wide. However, getting an exhaustive and unbiased view of a methylome at a reasonable cost is proving to be a significant challenge. In this article, we take a closer look at the impact of the DNA sequence and bias effects introduced to datasets by genome-wide DNA methylation technologies and, where possible, explore the bioinformatics tools that deconvolve them. There remains much to be learned about the performance of genome-wide technologies, the data we mine from these assays and how they reflect the actual biology. While there are several methods to interrogate the DNA methylation status genome-wide, our opinion is that no single technique suitably covers the minimum criteria of high coverage and high resolution at a reasonable cost. In fact, the fraction of the methylome that is studied currently depends entirely on the inherent biases of the protocol employed. There is promise for this to change, as the third generation of sequencing technologies is expected to again ‘revolutionize’ the way that we study genomes and epigenomes. PMID:21566704
Dichgans, Martin; Malik, Rainer; König, Inke R.; Rosand, Jonathan; Clarke, Robert; Gretarsdottir, Solveig; Thorleifsson, Gudmar; Mitchell, Braxton D.; Assimes, Themistocles L.; Levi, Christopher; O'Donnell, Christopher J.; Fornage, Myriam; Thorsteinsdottir, Unnur; Psaty, Bruce M.; Hengstenberg, Christian; Seshadri, Sudha; Erdmann, Jeanette; Bis, Joshua C.; Peters, Annette; Boncoraglio, Giorgio B.; März, Winfried; Meschia, James F.; Kathiresan, Sekar; Ikram, M. Arfan; McPherson, Ruth; Stefansson, Kari; Sudlow, Cathie; Reilly, Muredach P.; Thompson, John R.; Sharma, Pankaj; Hopewell, Jemma C.; Chambers, John C.; Watkins, Hugh; Rothwell, Peter M.; Roberts, Robert; Markus, Hugh S.; Samani, Nilesh J.; Farrall, Martin; Schunkert, Heribert
2014-01-01
Background and Purpose: Ischemic stroke (IS) and coronary artery disease (CAD) share several risk factors and each has substantial heritability. We conducted a genome-wide analysis to evaluate the extent of shared genetic determination of the two diseases. Methods: Genome-wide association data were obtained from the METASTROKE, CARDIoGRAM, and C4D consortia. We first analyzed common variants reaching a nominal threshold of significance (p<0.01) for CAD for their association with IS and vice versa. We then examined specific overlap across phenotypes for variants that reached a high threshold of significance. Finally, we conducted a joint meta-analysis on the combined phenotype of IS or CAD. Corresponding analyses were performed restricted to the 2,167 individuals with the ischemic large artery stroke (LAS) subtype. Results: Common variants associated with CAD at p<0.01 were associated with a significant excess risk for IS and for LAS and vice versa. Among the 42 known genome-wide significant loci for CAD, three and five loci were significantly associated with IS and LAS, respectively. In the joint meta-analyses, 15 loci passed genome-wide significance (p<5×10^-8) for the combined phenotype of IS or CAD and 17 loci passed genome-wide significance for LAS or CAD. Since these loci had prior evidence for genome-wide significance for CAD, we specifically analyzed the respective signals for IS and LAS and found evidence for association at chr12q24/SH2B3 (pIS=1.62×10^-7) and ABO (pIS=2.6×10^-4) as well as at HDAC9 (pLAS=2.32×10^-12), 9p21 (pLAS=3.70×10^-6), RAI1-PEMT-RASD1 (pLAS=2.69×10^-5), EDNRA (pLAS=7.29×10^-4), and CYP17A1-CNNM2-NT5C2 (pLAS=4.9×10^-4). Conclusions: Our results demonstrate substantial overlap in the genetic risk of ischemic stroke, and particularly the large artery stroke subtype, with coronary artery disease. PMID:24262325
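To make the two significance thresholds in the stroke/CAD abstract concrete, the short Python sketch below sorts the p-values quoted there into tiers. The thresholds (5×10^-8 and 0.01) and the p-values come from the abstract; the tier names, the `classify` helper, and the idea of applying the nominal cut-off to these particular signals are illustrative assumptions, not the authors' procedure.

```python
# Thresholds quoted in the abstract
GENOME_WIDE = 5e-8   # genome-wide significance
NOMINAL = 0.01       # nominal significance used for the cross-phenotype screen

# Locus-specific p-values as reported in the abstract (IS = ischemic stroke,
# LAS = large artery stroke)
reported = {
    "chr12q24/SH2B3 (IS)": 1.62e-7,
    "ABO (IS)": 2.6e-4,
    "HDAC9 (LAS)": 2.32e-12,
    "9p21 (LAS)": 3.70e-6,
    "RAI1-PEMT-RASD1 (LAS)": 2.69e-5,
    "EDNRA (LAS)": 7.29e-4,
    "CYP17A1-CNNM2-NT5C2 (LAS)": 4.9e-4,
}


def classify(p):
    """Return a hypothetical tier label for a p-value."""
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < NOMINAL:
        return "nominal"
    return "not significant"


tiers = {locus: classify(p) for locus, p in reported.items()}

for locus, tier in tiers.items():
    print(f"{locus}: p={reported[locus]:.2e} -> {tier}")
```

Read this way, only the HDAC9 signal for LAS is below the genome-wide threshold on its own; the others are supporting evidence at loci already established for CAD.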
Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.
Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G
2000-12-15
The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.
MIPS plant genome information resources.
Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X
2007-01-01
The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.
Comprehensive evaluation of genome-wide 5-hydroxymethylcytosine profiling approaches in human DNA.
Skvortsova, Ksenia; Zotenko, Elena; Luu, Phuc-Loi; Gould, Cathryn M; Nair, Shalima S; Clark, Susan J; Stirzaker, Clare
2017-01-01
The discovery that 5-methylcytosine (5mC) can be oxidized to 5-hydroxymethylcytosine (5hmC) by the ten-eleven translocation (TET) proteins has prompted wide interest in the potential role of 5hmC in reshaping the mammalian DNA methylation landscape. The gold-standard bisulphite conversion technologies to study DNA methylation do not distinguish between 5mC and 5hmC. However, new approaches to mapping 5hmC genome-wide have advanced rapidly, although it is unclear how the different methods compare in accurately calling 5hmC. In this study, we provide a comparative analysis on brain DNA using three 5hmC genome-wide approaches, namely whole-genome bisulphite/oxidative bisulphite sequencing (WG Bis/OxBis-seq), Infinium HumanMethylation450 BeadChip arrays coupled with oxidative bisulphite (HM450K Bis/OxBis) and antibody-based immunoprecipitation and sequencing of hydroxymethylated DNA (hMeDIP-seq). We also perform loci-specific TET-assisted bisulphite sequencing (TAB-seq) for validation of candidate regions. We show that whole-genome single-base resolution approaches are advantaged in providing precise 5hmC values but require high sequencing depth to accurately measure 5hmC, as this modification is commonly in low abundance in mammalian cells. HM450K arrays coupled with oxidative bisulphite provide a cost-effective representation of 5hmC distribution, at CpG sites with 5hmC levels >~10%. However, 5hmC analysis is restricted to the genomic location of the probes, which is an important consideration as 5hmC modification is commonly enriched at enhancer elements. Finally, we show that the widely used hMeDIP-seq method provides an efficient genome-wide profile of 5hmC and shows high correlation with WG Bis/OxBis-seq 5hmC distribution in brain DNA. However, in cell line DNA with low levels of 5hmC, hMeDIP-seq-enriched regions are not detected by WG Bis/OxBis or HM450K, either suggesting misinterpretation of 5hmC calls by hMeDIP or lack of sensitivity of the latter methods. 
We highlight both the advantages and caveats of three commonly used genome-wide 5hmC profiling technologies and show that interpretation of 5hmC data can be significantly influenced by the sensitivity of methods used, especially as the levels of 5hmC are low and vary in different cell types and different genomic locations.
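The paired bisulphite/oxidative bisulphite designs compared in the 5hmC abstract rest on a subtraction: standard bisulphite treatment reads both 5mC and 5hmC as methylated, whereas oxidative bisulphite reads only 5mC, so the difference at a site estimates 5hmC. A minimal Python sketch of that arithmetic follows; the function names and the clipping of negative differences to zero are assumptions for illustration, and the 10% cut-off simply mirrors the ">~10%" figure the abstract gives for HM450K arrays.

```python
def estimate_5hmc(beta_bs, beta_oxbs):
    """Estimate the 5hmC fraction at one CpG from paired measurements.

    beta_bs   -- methylation fraction from bisulphite (5mC + 5hmC)
    beta_oxbs -- methylation fraction from oxidative bisulphite (5mC only)
    Negative differences (measurement noise) are clipped to 0.0.
    """
    for value in (beta_bs, beta_oxbs):
        if not 0.0 <= value <= 1.0:
            raise ValueError("methylation fractions must lie in [0, 1]")
    return max(beta_bs - beta_oxbs, 0.0)


def detectable_on_array(level, threshold=0.10):
    """True if the estimated 5hmC level exceeds the assumed array cut-off."""
    return level > threshold


site = estimate_5hmc(0.80, 0.65)
print(round(site, 2), detectable_on_array(site))
```

Because the estimate is a difference of two noisy measurements, its uncertainty is larger than that of either input, which is consistent with the abstract's point that single-base approaches need high sequencing depth when 5hmC is scarce.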
Meneco, a Topology-Based Gap-Filling Tool Applicable to Degraded Genome-Wide Metabolic Networks
Prigent, Sylvain; Frioux, Clémence; Dittami, Simon M.; Larhlimi, Abdelhalim; Collet, Guillaume; Gutknecht, Fabien; Got, Jeanne; Eveillard, Damien; Bourdon, Jérémie; Plewniak, Frédéric; Tonon, Thierry; Siegel, Anne
2017-01-01
Increasing amounts of sequence data are becoming available for a wide range of non-model organisms. Investigating and modelling the metabolic behaviour of those organisms is highly relevant to understand their biology and ecology. As sequences are often incomplete and poorly annotated, draft networks of their metabolism largely suffer from incompleteness. Appropriate gap-filling methods to identify and add missing reactions are therefore required to address this issue. However, current tools rely on phenotypic or taxonomic information, or are very sensitive to the stoichiometric balance of metabolic reactions, especially concerning the co-factors. This type of information is often not available or at least prone to errors for newly-explored organisms. Here we introduce Meneco, a tool dedicated to the topological gap-filling of genome-scale draft metabolic networks. Meneco reformulates gap-filling as a qualitative combinatorial optimization problem, omitting constraints raised by the stoichiometry of a metabolic network considered in other methods, and solves this problem using Answer Set Programming. Run on several artificial test sets gathering 10,800 degraded Escherichia coli networks Meneco was able to efficiently identify essential reactions missing in networks at high degradation rates, outperforming the stoichiometry-based tools in scalability. To demonstrate the utility of Meneco we applied it to two case studies. Its application to recent metabolic networks reconstructed for the brown algal model Ectocarpus siliculosus and an associated bacterium Candidatus Phaeomarinobacter ectocarpi revealed several candidate metabolic pathways for algal-bacterial interactions. Then Meneco was used to reconstruct, from transcriptomic and metabolomic data, the first metabolic network for the microalga Euglena mutabilis. 
These two case studies show that Meneco is a versatile tool to complete draft genome-scale metabolic networks produced from heterogeneous data, and to suggest relevant reactions that explain the metabolic capacity of a biological system. PMID:28129330
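The topological criterion behind the Meneco abstract can be illustrated with a toy example: a metabolite counts as producible if it is a seed or the product of a reaction whose reactants are all producible, with stoichiometry ignored. The Python sketch below computes that "scope" and then searches by brute force for the smallest sets of database reactions that make the targets producible. Meneco itself solves the problem with Answer Set Programming; the exhaustive search, the function names, and the tiny network here are assumptions for illustration only.

```python
from itertools import combinations


def scope(reactions, seeds):
    """Metabolites reachable from the seeds.

    reactions -- dict: name -> (set of reactants, set of products)
    """
    reachable = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            # A reaction fires once all of its reactants are reachable
            if reactants <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    return reachable


def minimal_completions(draft, repair_db, seeds, targets):
    """All smallest sets of repair reactions that make every target reachable.

    Brute force over subsets, so only suitable for toy inputs.
    """
    targets = set(targets)
    for size in range(len(repair_db) + 1):
        hits = []
        for names in combinations(sorted(repair_db), size):
            network = dict(draft)
            network.update({name: repair_db[name] for name in names})
            if targets <= scope(network, seeds):
                hits.append(set(names))
        if hits:
            return hits
    return []


# Hypothetical draft network with a gap between g6p and pyr
draft = {
    "r1": ({"glc"}, {"g6p"}),
    "r3": ({"pyr"}, {"accoa"}),
}
# Hypothetical reference database of candidate reactions
repair_db = {
    "r2": ({"g6p"}, {"pyr"}),
    "r4": ({"xyl"}, {"pyr"}),
    "r5": ({"g6p"}, {"f6p"}),
}

print(minimal_completions(draft, repair_db, {"glc"}, {"accoa"}))
```

In this toy case only `r2` closes the gap: `r4` would also produce `pyr`, but its reactant is not reachable from the seed, which is the kind of distinction a purely topological check can make without stoichiometric information.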
The Acid Phosphatase-Encoding Gene GmACP1 Contributes to Soybean Tolerance to Low-Phosphorus Stress
Hao, Derong; Wang, Hui; Kan, Guizhen; Jin, Hangxia; Yu, Deyue
2014-01-01
Phosphorus (P) is essential for all living cells and organisms, and low-P stress is a major factor constraining plant growth and yield worldwide. In plants, P efficiency is a complex quantitative trait involving multiple genes, and the mechanisms underlying P efficiency are largely unknown. Combining linkage analysis, genome-wide and candidate-gene association analyses, and plant transformation, we identified a soybean gene related to P efficiency, determined its favorable haplotypes and developed valuable functional markers. First, six major genomic regions associated with P efficiency were detected by performing genome-wide associations (GWAs) in various environments. A highly significant region located on chromosome 8, qPE8, was identified by both GWAs and linkage mapping and explained 41% of the phenotypic variation. Then, a regional mapping study was performed with 40 surrounding markers in 192 diverse soybean accessions. A strongly associated haplotype (P = 10^-7) consisting of the markers Sat_233 and BARC-039899-07603 was identified, and qPE8 was located in a region of approximately 250 kb, which contained a candidate gene GmACP1 that encoded an acid phosphatase. GmACP1 overexpression in soybean hairy roots increased P efficiency by 11–20% relative to the control. A candidate-gene association analysis indicated that six natural GmACP1 polymorphisms explained 33% of the phenotypic variation. The favorable alleles and haplotypes of GmACP1 associated with increased transcript expression correlated with higher enzyme activity. The discovery of the optimal haplotype of GmACP1 will now enable the accurate selection of soybeans with higher P efficiencies and improve our understanding of the molecular mechanisms underlying P efficiency in plants. PMID:24391523
Jung, Hyungtaek; Yoon, Byung-Ha; Kim, Woo-Jin; Kim, Dong-Wook; Hurwood, David A.; Lyons, Russell E.; Salin, Krishna R.; Kim, Heui-Soo; Baek, Ilseon; Chand, Vincent; Mather, Peter B.
2016-01-01
The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean, is currently the world’s most economically important cultured freshwater crustacean species. Despite its economic importance, there is currently a lack of genomic resources available for this species, and this has limited exploration of the molecular mechanisms that control the M. rosenbergii sex-differentiation system more widely in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii applying RNA-Seq technologies directed at identifying genes that have potential functional roles in reproductive-related traits. A total of 13,733,210 combined raw reads (1720 Mbp) were obtained from Ion-Torrent PGM and 454 FLX. Bioinformatic analyses based on three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, that use single and multiple k-mer methods respectively, were used to analyse the data. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were further functionally annotated to detect single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns from ovary and testis tissue libraries to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources ever developed for reproduction traits in M. rosenbergii, and the large number of genetic markers predicted should constitute an invaluable resource for future genetic research studies on M. rosenbergii and can be applied more widely to other freshwater prawn species in the genus Macrobrachium. PMID:27164098
Meneco, a Topology-Based Gap-Filling Tool Applicable to Degraded Genome-Wide Metabolic Networks.
Prigent, Sylvain; Frioux, Clémence; Dittami, Simon M; Thiele, Sven; Larhlimi, Abdelhalim; Collet, Guillaume; Gutknecht, Fabien; Got, Jeanne; Eveillard, Damien; Bourdon, Jérémie; Plewniak, Frédéric; Tonon, Thierry; Siegel, Anne
2017-01-01
Increasing amounts of sequence data are becoming available for a wide range of non-model organisms. Investigating and modelling the metabolic behaviour of those organisms is highly relevant to understand their biology and ecology. As sequences are often incomplete and poorly annotated, draft networks of their metabolism largely suffer from incompleteness. Appropriate gap-filling methods to identify and add missing reactions are therefore required to address this issue. However, current tools rely on phenotypic or taxonomic information, or are very sensitive to the stoichiometric balance of metabolic reactions, especially concerning the co-factors. This type of information is often not available or at least prone to errors for newly-explored organisms. Here we introduce Meneco, a tool dedicated to the topological gap-filling of genome-scale draft metabolic networks. Meneco reformulates gap-filling as a qualitative combinatorial optimization problem, omitting constraints raised by the stoichiometry of a metabolic network considered in other methods, and solves this problem using Answer Set Programming. Run on several artificial test sets gathering 10,800 degraded Escherichia coli networks Meneco was able to efficiently identify essential reactions missing in networks at high degradation rates, outperforming the stoichiometry-based tools in scalability. To demonstrate the utility of Meneco we applied it to two case studies. Its application to recent metabolic networks reconstructed for the brown algal model Ectocarpus siliculosus and an associated bacterium Candidatus Phaeomarinobacter ectocarpi revealed several candidate metabolic pathways for algal-bacterial interactions. Then Meneco was used to reconstruct, from transcriptomic and metabolomic data, the first metabolic network for the microalga Euglena mutabilis. 
These two case studies show that Meneco is a versatile tool to complete draft genome-scale metabolic networks produced from heterogeneous data, and to suggest relevant reactions that explain the metabolic capacity of a biological system.
BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks.
Yan, Winston X; Mirzazadeh, Reza; Garnerone, Silvano; Scott, David; Schneider, Martin W; Kallas, Tomasz; Custodio, Joaquin; Wernersson, Erik; Li, Yinqing; Gao, Linyi; Federova, Yana; Zetsche, Bernd; Zhang, Feng; Bienko, Magda; Crosetto, Nicola
2017-05-12
Precisely measuring the location and frequency of DNA double-strand breaks (DSBs) along the genome is instrumental to understanding genomic fragility, but current methods are limited in versatility, sensitivity or practicality. Here we present Breaks Labeling In Situ and Sequencing (BLISS), featuring the following: (1) direct labelling of DSBs in fixed cells or tissue sections on a solid surface; (2) low-input requirement by linear amplification of tagged DSBs by in vitro transcription; (3) quantification of DSBs through unique molecular identifiers; and (4) easy scalability and multiplexing. We apply BLISS to profile endogenous and exogenous DSBs in low-input samples of cancer cells, embryonic stem cells and liver tissue. We demonstrate the sensitivity of BLISS by assessing the genome-wide off-target activity of two CRISPR-associated RNA-guided endonucleases, Cas9 and Cpf1, observing that Cpf1 has higher specificity than Cas9. Our results establish BLISS as a versatile, sensitive and efficient method for genome-wide DSB mapping in many applications.
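The quantification step in point (3) can be illustrated with a minimal sketch. It assumes, hypothetically, that each sequenced read has already been reduced to a (chromosome, position, UMI) tuple; the function name and input layout are illustrative only and are not taken from the BLISS pipeline. Reads sharing a site and a UMI are treated as amplification copies of one tagged break end, so the count per site is the number of distinct UMIs.

```python
from collections import defaultdict


def count_breaks(reads):
    """Count distinct UMIs per break site.

    `reads` is an iterable of (chrom, pos, umi) tuples (a hypothetical
    input layout). Duplicate (site, UMI) pairs collapse into one event.
    """
    umis_per_site = defaultdict(set)
    for chrom, pos, umi in reads:
        umis_per_site[(chrom, pos)].add(umi)
    return {site: len(umis) for site, umis in umis_per_site.items()}


# Toy input: three reads at one site, two of them amplification duplicates.
toy_reads = [
    ("chr1", 1000, "ACGTACGT"),
    ("chr1", 1000, "ACGTACGT"),
    ("chr1", 1000, "TTGCAAGC"),
    ("chr2", 500, "GGATCCAA"),
]
break_counts = count_breaks(toy_reads)
```

A production pipeline would likely also merge UMIs that differ by sequencing errors and group nearby positions; both steps are omitted here.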
Garst, Andrew D; Bassalo, Marcelo C; Pines, Gur; Lynch, Sean A; Halweg-Edwards, Andrea L; Liu, Rongming; Liang, Liya; Wang, Zhiwen; Zeitoun, Ramsey; Alexander, William G; Gill, Ryan T
2017-01-01
Improvements in DNA synthesis and sequencing have underpinned comprehensive assessment of gene function in bacteria and eukaryotes. Genome-wide analyses require high-throughput methods to generate mutations and analyze their phenotypes, but approaches to date have been unable to efficiently link the effects of mutations in coding regions or promoter elements in a highly parallel fashion. We report that CRISPR-Cas9 gene editing in combination with massively parallel oligomer synthesis can enable trackable editing on a genome-wide scale. Our method, CRISPR-enabled trackable genome engineering (CREATE), links each guide RNA to homologous repair cassettes that both edit loci and function as barcodes to track genotype-phenotype relationships. We apply CREATE to site saturation mutagenesis for protein engineering, reconstruction of adaptive laboratory evolution experiments, and identification of stress tolerance and antibiotic resistance genes in bacteria. We provide preliminary evidence that CREATE will work in yeast. We also provide a webtool to design multiplex CREATE libraries.
Creating a RAW264.7 CRISPR-Cas9 Genome Wide Library
Napier, Brooke A; Monack, Denise M
2017-01-01
The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 genome editing tools are used in mammalian cells to knock-out specific genes of interest to elucidate gene function. The CRISPR-Cas9 system requires that the mammalian cell expresses Cas9 endonuclease, guide RNA (gRNA) to lead the endonuclease to the gene of interest, and the PAM sequence that links the Cas9 to the gRNA. CRISPR-Cas9 genome wide libraries are used to screen the effect of each gene in the genome on the cellular phenotype of interest, in an unbiased high-throughput manner. In this protocol, we describe our method of creating a CRISPR-Cas9 genome wide library in a transformed murine macrophage cell-line (RAW264.7). We have employed this library to identify novel mediators in the caspase-11 cell death pathway (Napier et al., 2016); however, this library can then be used to screen the importance of specific genes in multiple murine macrophage cellular pathways. PMID:28868328
Marcelletti, Simone; Scortichini, Marco
2016-10-01
A total of 21 Xylella fastidiosa strains were assessed by comparing their genomes to infer their taxonomic relationships. The whole-genome-based average nucleotide identity and tetranucleotide frequency correlation coefficient analyses were performed. In addition, a consensus tree based on comparisons of 956 core gene families, and a genome-wide phylogenetic tree and a Neighbor-net network were constructed with 820,088 nucleotides (i.e., approximately 30-33% of the entire X. fastidiosa genome). All approaches revealed the occurrence of three well-demarcated genetic clusters that represent X. fastidiosa subspecies fastidiosa, multiplex and pauca, with the latter appearing to diverge. We suggest that the proposed but never formally described subspecies 'sandyi' and 'morus' are instead members of the subspecies fastidiosa. These analyses support the view that the Xylella strain isolated from Pyrus pyrifolia in Taiwan is likely to be a new species. A widely used multilocus sequence typing analysis yielded conflicting results.
Polstein, Lauren R.; Perez-Pinera, Pablo; Kocak, D. Dewran; Vockley, Christopher M.; Bledsoe, Peggy; Song, Lingyun; Safi, Alexias; Crawford, Gregory E.; Reddy, Timothy E.; Gersbach, Charles A.
2015-01-01
Genome engineering technologies based on the CRISPR/Cas9 and TALE systems are enabling new approaches in science and biotechnology. However, the specificity of these tools in complex genomes and the role of chromatin structure in determining DNA binding are not well understood. We analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators in human cells using ChIP-seq to assess DNA-binding specificity and RNA-seq to measure the specificity of perturbing the transcriptome. Additionally, DNase-seq was used to assess genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these transcription factors are highly specific in both DNA binding and gene regulation and are able to open targeted regions of closed chromatin independent of gene activation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. PMID:26025803
DNA Breaks and End Resection Measured Genome-wide by End Sequencing.
Canela, Andres; Sridharan, Sriram; Sciascia, Nicholas; Tubbs, Anthony; Meltzer, Paul; Sleckman, Barry P; Nussenzweig, André
2016-09-01
DNA double-strand breaks (DSBs) arise during physiological transcription, DNA replication, and antigen receptor diversification. Mistargeting or misprocessing of DSBs can result in pathological structural variation and mutation. Here we describe a sensitive method (END-seq) to monitor DNA end resection and DSBs genome-wide at base-pair resolution in vivo. We utilized END-seq to determine the frequency and spectrum of restriction-enzyme-, zinc-finger-nuclease-, and RAG-induced DSBs. Beyond sequence preference, chromatin features dictate the repertoire of these genome-modifying enzymes. END-seq can detect at least one DSB per cell among 10,000 cells not harboring DSBs, and we estimate that up to one out of 60 cells contains off-target RAG cleavage. In addition to site-specific cleavage, we detect DSBs distributed over extended regions during immunoglobulin class-switch recombination. Thus, END-seq provides a snapshot of DNA ends genome-wide, which can be utilized for understanding genome-editing specificities and the influence of chromatin on DSB pathway choice.
The Genomic Basis of Evolutionary Innovation in Pseudomonas aeruginosa
Wagner, Andreas; MacLean, R. Craig
2016-01-01
Novel traits play a key role in evolution, but their origins remain poorly understood. Here we address this problem by using experimental evolution to study bacterial innovation in real time. We allowed 380 populations of Pseudomonas aeruginosa to adapt to 95 different carbon sources that challenged bacteria with either evolving novel metabolic traits or optimizing existing traits. Whole genome sequencing of more than 80 clones revealed profound differences in the genetic basis of innovation and optimization. Innovation was associated with the rapid acquisition of mutations in genes involved in transcription and metabolism. Mutations in pre-existing duplicate genes in the P. aeruginosa genome were common during innovation, but not optimization. These duplicate genes may have been acquired by P. aeruginosa due to either spontaneous gene amplification or horizontal gene transfer. High throughput phenotype assays revealed that novelty was associated with increased pleiotropic costs that are likely to constrain innovation. However, mutations in duplicate genes with close homologs in the P. aeruginosa genome were associated with low pleiotropic costs compared to mutations in duplicate genes with distant homologs in the P. aeruginosa genome, suggesting that functional redundancy between duplicates facilitates innovation by buffering pleiotropic costs. PMID:27149698
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lan, Yemin; Rosen, Gail; Hershberg, Ruth
2016-05-03
The 16s rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16s rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16s rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that the percent identity of 16s rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16s rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.
Post-Genome Era Pedagogy: How a BS Biotechnology Program Benefits the Liberal Arts Institution
ERIC Educational Resources Information Center
Eden, Peter
2005-01-01
Genomics profoundly affects society, because genome sequence information is widely used in such areas as genetic testing, genomic medicine/vaccine development, and so forth. Therefore, a responsibility to modernize science curricula exists for "post-genome era" educators. At my university, we developed a BS biotechnology program within a…
USDA-ARS's Scientific Manuscript database
Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...
Privacy Preserving PCA on Distributed Bioinformatics Datasets
ERIC Educational Resources Information Center
Li, Xin
2011-01-01
In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…
Genome-wide association studies in maize: praise and stargaze
USDA-ARS's Scientific Manuscript database
Genome-wide association study (GWAS) has appeared as a widespread strategy in decoding genotype-phenotype associations in many species thanks to technical advances in next-generation sequencing (NGS) applications. Maize is an ideal crop for GWAS and significant progress has been made in the last dec...
A population structure and genome-wide association analysis on the USDA soybean germplasm collection
USDA-ARS's Scientific Manuscript database
Genotype-phenotype associations within the soybean (Glycine max) germplasm collection could provide valuable information on the frequency and distribution of alleles affecting economically important traits. Here we performed a genome-wide association study (GWAS) for seed protein and oil content in ...
Microfluidics for genome-wide studies involving next generation sequencing
Murphy, Travis W.; Lu, Chang
2017-01-01
Next-generation sequencing (NGS) has revolutionized how molecular biology studies are conducted. Its decreasing cost and increasing throughput permit profiling of genomic, transcriptomic, and epigenomic features for a wide range of applications. Microfluidics has been proven to be highly complementary to NGS technology with its unique capabilities for handling small volumes of samples and providing platforms for automation, integration, and multiplexing. In this article, we review recent progress on applying microfluidics to facilitate genome-wide studies. We emphasize several technical aspects of NGS and how they benefit from coupling with microfluidic technology. We also summarize recent efforts on developing microfluidic technology for genomic, transcriptomic, and epigenomic studies, with emphasis on single cell analysis. We envision rapid growth in these directions, driven by the need to test scarce primary cell samples from patients in the context of precision medicine. PMID:28396707
Economic evaluation of genomic selection in small ruminants: a sheep meat breeding program.
Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Lemarié, S; Fugeray-Scarbel, A; Elsen, J M
2016-06-01
Recent genomic evaluation studies using real data and predicting genetic gain by modeling breeding programs have reported moderate expected benefits from the replacement of classic selection schemes by genomic selection (GS) in small ruminants. The objectives of this study were to compare the cost, monetary genetic gain and economic efficiency of classic selection and GS schemes in the meat sheep industry. Deterministic methods were used to model selection based on multi-trait indices from a sheep meat breeding program. Decisional variables related to male selection candidates and progeny testing were optimized to maximize the annual monetary genetic gain (AMGG), that is, a weighted sum of meat and maternal traits annual genetic gains. For GS, a reference population of 2000 individuals was assumed and genomic information was available for evaluation of male candidates only. In the classic selection scheme, males breeding values were estimated from own and offspring phenotypes. In GS, different scenarios were considered, differing by the information used to select males (genomic only, genomic+own performance, genomic+offspring phenotypes). The results showed that all GS scenarios were associated with higher total variable costs than classic selection (if the cost of genotyping was 123 euros/animal). In terms of AMGG and economic returns, GS scenarios were found to be superior to classic selection only if genomic information was combined with their own meat phenotypes (GS-Pheno) or with their progeny test information. The predicted economic efficiency, defined as returns (proportional to number of expressions of AMGG in the nucleus and commercial flocks) minus total variable costs, showed that the best GS scenario (GS-Pheno) was up to 15% more efficient than classic selection. For all selection scenarios, optimization increased the overall AMGG, returns and economic efficiency. 
In conclusion, our study shows that some forms of GS strategies are more advantageous than classic selection, provided that GS is already initiated (i.e. the initial reference population is available). Optimizing decisional variables of the classic selection scheme could be of greater benefit than including genomic information in optimized designs.
Sandoval-Castillo, Jonathan; Jenner, K. Curt S.; Gill, Peter C.; Jenner, Micheline-Nicole M.; Morrice, Margaret G.
2018-01-01
Genetic datasets of tens of markers have been superseded through next-generation sequencing technology with genome-wide datasets of thousands of markers. Genomic datasets improve our power to detect low population structure and identify adaptive divergence. The increased population-level knowledge can inform the conservation management of endangered species, such as the blue whale (Balaenoptera musculus). In Australia, there are two known feeding aggregations of the pygmy blue whale (B. m. brevicauda) which have shown no evidence of genetic structure based on a small dataset of 10 microsatellites and mtDNA. Here, we develop and implement a high-resolution dataset of 8294 genome-wide filtered single nucleotide polymorphisms, the first of its kind for blue whales. We use these data to assess whether the Australian feeding aggregations constitute one population and to test for the first time whether there is adaptive divergence between the feeding aggregations. We found no evidence of neutral population structure and negligible evidence of adaptive divergence. We propose that individuals likely travel widely between feeding areas and to breeding areas, which would require them to be adapted to a wide range of environmental conditions. This has important implications for their conservation as this blue whale population is likely vulnerable to a range of anthropogenic threats both off Australia and elsewhere. PMID:29410806
Wei, Wen-Hua; Massey, Jonathan; Worthington, Jane; Barton, Anne; Warren, Richard B
2018-03-01
Genome-wide association studies (GWASs) have identified a number of loci for psoriasis but largely ignored non-additive effects. We report a genotypic variability-based GWAS (vGWAS) that can prioritize non-additive loci without requiring prior knowledge of interaction types or interacting factors. It proceeds in two steps: a mixed model partitions dichotomous phenotypes into an additive component and non-additive environmental residuals on the liability scale, and Levene's (Brown-Forsythe) test then assesses equality of the residual variances across genotype groups genome-wide. The vGWAS identified two genome-wide significant (P < 5.0e-08) non-additive loci, HLA-C and IL12B, that were also genome-wide significant in an accompanying GWAS in the discovery cohort. Both loci were statistically replicated in vGWAS of an independent cohort with a small sample size. HLA-C and IL12B were reported in moderate gene-gene and/or gene-environment interactions on several occasions. We found a moderate interaction with age-of-onset of psoriasis, which was replicated indirectly. The vGWAS also revealed five suggestive loci (P < 6.76e-05) including FUT2 that was associated with psoriasis with environmental aspects triggered by virus infection and/or metabolic factors. Replication and functional investigation are needed to validate the suggestive vGWAS loci.
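The variance-equality step named in this abstract can be sketched for a single marker. The code below computes the Brown-Forsythe statistic (Levene's test with deviations taken from group medians) from residuals grouped by genotype. The function name and the toy residual values are hypothetical, and the mixed-model step that would produce the residuals is not shown. Under the null hypothesis the statistic is compared with an F distribution with (k - 1, N - k) degrees of freedom; the p-value is left out to keep the sketch dependency-free.

```python
from statistics import mean, median


def brown_forsythe(groups):
    """Brown-Forsythe W statistic for equality of variances.

    `groups` is a list of lists of residuals, one list per genotype
    group. Assumes at least two groups and non-zero within-group
    spread of the absolute deviations (otherwise division by zero).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each value from its own group's median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Toy residuals for two genotype groups with visibly different spread.
w_unequal = brown_forsythe([[0, 1, 2], [0, 5, 10]])
# Three groups with identical spread but shifted means give W = 0.
w_equal = brown_forsythe([[1, 2, 3], [11, 12, 13], [21, 22, 23]])
```

Because the statistic depends only on spread around each group's median, shifting a group's mean leaves it unchanged, which is why a variance-based scan can flag loci that a mean-based additive scan would miss.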
Hattori, Hiroyoshi; Janky, Rekin's; Nietfeld, Wilfried; Aerts, Stein; Madan Babu, M; Venkitaraman, Ashok R
2014-01-01
The human DNA damage response (DDR) triggers profound changes in gene expression, whose nature and regulation remain uncertain. Although certain micro-(mi)RNA species including miR34, miR-18, miR-16 and miR-143 have been implicated in the DDR, there is as yet no comprehensive description of genome-wide changes in the expression of miRNAs triggered by DNA breakage in human cells. We have used next-generation sequencing (NGS), combined with rigorous integrative computational analyses, to describe genome-wide changes in the expression of miRNAs during the human DDR. The changes affect 150 of 1523 miRNAs known in miRBase v18 from 4-24 h after the induction of DNA breakage, in cell-type dependent patterns. The regulatory regions of the most-highly regulated miRNA species are enriched in conserved binding sites for p53. Indeed, genome-wide changes in miRNA expression during the DDR are markedly altered in TP53-/- cells compared to otherwise isogenic controls. The expression levels of certain damage-induced, p53-regulated miRNAs in cancer samples correlate with patient survival. Our work reveals genome-wide and cell type-specific alterations in miRNA expression during the human DDR, which are regulated by the tumor suppressor protein p53. These findings provide a genomic resource to identify new molecules and mechanisms involved in the DDR, and to examine their role in tumor suppression and the clinical outcome of cancer patients.
Shen, Wei; Paxton, Christian N; Szankasi, Philippe; Longhurst, Maria; Schumacher, Jonathan A; Frizzell, Kimberly A; Sorrells, Shelly M; Clayton, Adam L; Jattani, Rakhi P; Patel, Jay L; Toydemir, Reha; Kelley, Todd W; Xu, Xinjie
2018-04-01
Genetic abnormalities, including copy number variants (CNV), copy number neutral loss of heterozygosity (CN-LOH) and gene mutations, underlie the pathogenesis of myeloid malignancies and serve as important diagnostic, prognostic and/or therapeutic markers. Currently, multiple testing strategies are required for comprehensive genetic testing in myeloid malignancies. The aim of this proof-of-principle study was to investigate the feasibility of combining detection of genome-wide large CNVs, CN-LOH and targeted gene mutations into a single assay using next-generation sequencing (NGS). For genome-wide CNV detection, we designed a single nucleotide polymorphism (SNP) sequencing backbone with 22 762 SNP regions evenly distributed across the entire genome. For targeted mutation detection, 62 frequently mutated genes in myeloid malignancies were targeted. We combined this SNP sequencing backbone with a targeted mutation panel, and sequenced 9 healthy individuals and 16 patients with myeloid malignancies using NGS. We detected 52 somatic CNVs, 11 instances of CN-LOH and 39 oncogenic mutations in the 16 patients with myeloid malignancies, and none in the 9 healthy individuals. All CNVs and CN-LOH were confirmed by SNP microarray analysis. We describe a genome-wide SNP sequencing backbone which allows for sensitive detection of genome-wide CNVs and CN-LOH using NGS. This proof-of-principle study has demonstrated that this strategy can provide more comprehensive genetic profiling for patients with myeloid malignancies using a single assay.
Meta-analysis of 32 genome-wide linkage studies of schizophrenia
Ng, MYM; Levinson, DF; Faraone, SV; Suarez, BK; DeLisi, LE; Arinami, T; Riley, B; Paunio, T; Pulver, AE; Irmansyah; Holmans, PA; Escamilla, M; Wildenauer, DB; Williams, NM; Laurent, C; Mowry, BJ; Brzustowicz, LM; Maziade, M; Sklar, P; Garver, DL; Abecasis, GR; Lerer, B; Fallin, MD; Gurling, HMD; Gejman, PV; Lindholm, E; Moises, HW; Byerley, W; Wijsman, EM; Forabosco, P; Tsuang, MT; Hwu, H-G; Okazaki, Y; Kendler, KS; Wormley, B; Fanous, A; Walsh, D; O’Neill, FA; Peltonen, L; Nestadt, G; Lasseter, VK; Liang, KY; Papadimitriou, GM; Dikeos, DG; Schwab, SG; Owen, MJ; O’Donovan, MC; Norton, N; Hare, E; Raventos, H; Nicolini, H; Albus, M; Maier, W; Nimgaonkar, VL; Terenius, L; Mallet, J; Jay, M; Godard, S; Nertney, D; Alexander, M; Crowe, RR; Silverman, JM; Bassett, AS; Roy, M-A; Mérette, C; Pato, CN; Pato, MT; Roos, J Louw; Kohn, Y; Amann-Zalcenstein, D; Kalsi, G; McQuillin, A; Curtis, D; Brynjolfson, J; Sigmundsson, T; Petursson, H; Sanders, AR; Duan, J; Jazin, E; Myles-Worsley, M; Karayiorgou, M; Lewis, CM
2009-01-01
A genome scan meta-analysis (GSMA) was carried out on 32 independent genome-wide linkage scan analyses that included 3255 pedigrees with 7413 genotyped cases affected with schizophrenia (SCZ) or related disorders. The primary GSMA divided the autosomes into 120 bins, rank-ordered the bins within each study according to the most positive linkage result in each bin, summed these ranks (weighted for study size) for each bin across studies and determined the empirical probability of a given summed rank (PSR) by simulation. Suggestive evidence for linkage was observed in two single bins, on chromosomes 5q (142-168 Mb) and 2q (103-134 Mb). Genome-wide evidence for linkage was detected on chromosome 2q (119-152 Mb) when bin boundaries were shifted to the middle of the previous bins. The primary analysis met empirical criteria for ‘aggregate’ genome-wide significance, indicating that some or all of 10 bins are likely to contain loci linked to SCZ, including regions of chromosomes 1, 2q, 3q, 4q, 5q, 8p and 10q. In a secondary analysis of 22 studies of European-ancestry samples, suggestive evidence for linkage was observed on chromosome 8p (16-33 Mb). Although the newer genome-wide association methodology has greater power to detect weak associations to single common DNA sequence variants, linkage analysis can detect diverse genetic effects that segregate in families, including multiple rare variants within one locus or several weakly associated loci in the same region. Therefore, the regions supported by this meta-analysis deserve close attention in future studies. PMID:19349958
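The rank-sum procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy scores, the study weights, and the simulation scheme (independently shuffling each study's ranks across bins) are all hypothetical choices made for the example, and ties are broken arbitrarily.

```python
import random


def rank_bins(scores):
    """Rank bins within one study; the bin with the largest linkage score gets the highest rank.

    Ties are broken by bin order (a simplification for this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def summed_ranks(study_ranks, weights):
    """Weighted sum of ranks for each bin across studies."""
    n_bins = len(study_ranks[0])
    return [sum(w * r[b] for r, w in zip(study_ranks, weights)) for b in range(n_bins)]


def gsma_psr(study_scores, weights, n_sim=2000, seed=0):
    """Return (observed summed ranks, empirical P_SR per bin).

    P_SR is estimated as the share of simulations whose summed rank for a bin
    is at least as large as the observed one (with a +1 correction).
    """
    rng = random.Random(seed)
    study_ranks = [rank_bins(s) for s in study_scores]
    observed = summed_ranks(study_ranks, weights)
    exceed = [0] * len(observed)
    for _ in range(n_sim):
        # Null model (assumed): each study's ranks are randomly reassigned to bins.
        shuffled = [rng.sample(r, len(r)) for r in study_ranks]
        simulated = summed_ranks(shuffled, weights)
        for b, obs in enumerate(observed):
            if simulated[b] >= obs:
                exceed[b] += 1
    return observed, [(e + 1) / (n_sim + 1) for e in exceed]


# Toy data: 3 hypothetical studies, 6 bins; bin index 2 scores highest in every study.
toy_scores = [
    [0.1, 0.5, 2.9, 0.3, 1.0, 0.2],
    [0.4, 0.2, 3.1, 0.9, 0.1, 0.6],
    [1.2, 0.3, 2.2, 0.1, 0.5, 0.8],
]
toy_weights = [1.0, 1.5, 0.8]  # stand-ins for study-size weights
observed, psr = gsma_psr(toy_scores, toy_weights)
```

In this toy setup the bin that ranks first in every study receives the largest summed rank and the smallest empirical probability; a real analysis would use 120 bins and 32 studies, as described above.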
HIV Genome-Wide Protein Associations: a Review of 30 Years of Research
2016-01-01
SUMMARY The HIV genome encodes a small number of viral proteins (i.e., 16), invariably establishing cooperative associations among HIV proteins and between HIV and host proteins, to invade host cells and hijack their internal machineries. As a known example, the HIV envelope glycoprotein GP120 is closely associated with GP41 for viral entry. From a genome-wide perspective, a hypothesis can be worked out to determine whether 16 HIV proteins could develop 120 possible pairwise associations either by physical interactions or by functional associations mediated via HIV or host molecules. Here, we present the first systematic review of experimental evidence on HIV genome-wide protein associations using a large body of publications accumulated over the past 3 decades. Of 120 possible pairwise associations between 16 HIV proteins, at least 34 physical interactions and 17 functional associations have been identified. To achieve efficient viral replication and infection, HIV protein associations play essential roles (e.g., cleavage, inhibition, and activation) during the HIV life cycle. In either a dispensable or an indispensable manner, each HIV protein collaborates with another viral protein to accomplish specific activities that precisely take place at the proper stages of the HIV life cycle. In addition, HIV genome-wide protein associations have an impact on anti-HIV inhibitors due to the extensive cross talk between drug-inhibited proteins and other HIV proteins. Overall, this study presents for the first time a comprehensive overview of HIV genome-wide protein associations, highlighting meticulous collaborations between all viral proteins during the HIV life cycle. PMID:27357278
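The figure of 120 possible pairwise associations follows from counting unordered pairs among 16 proteins. A small sketch of that arithmetic, using placeholder indices rather than actual protein names (the variable names are assumptions for illustration):

```python
from itertools import combinations
from math import comb

n_proteins = 16  # number of HIV proteins stated in the abstract

# Enumerate every unordered pair of distinct proteins (placeholder indices 0..15).
pairs = list(combinations(range(n_proteins), 2))
n_pairs = len(pairs)

# The closed form n * (n - 1) / 2 gives the same count.
assert n_pairs == comb(n_proteins, 2) == n_proteins * (n_proteins - 1) // 2
print(n_pairs)  # 120
```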
iCLIP: Protein–RNA interactions at nucleotide resolution
Huppertz, Ina; Attig, Jan; D’Ambrogio, Andrea; Easton, Laura E.; Sibley, Christopher R.; Sugimoto, Yoichiro; Tajnik, Mojca; König, Julian; Ule, Jernej
2014-01-01
RNA-binding proteins (RBPs) are key players in the post-transcriptional regulation of gene expression. Precise knowledge about their binding sites is therefore critical to unravel their molecular function and to understand their role in development and disease. Individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) identifies protein–RNA crosslink sites on a genome-wide scale. The high resolution and specificity of this method are achieved by an intramolecular cDNA circularization step that enables analysis of cDNAs that are truncated at the protein–RNA crosslink sites. Here, we describe the improved iCLIP protocol and discuss critical optimization and control experiments that are required when applying the method to new RBPs. PMID:24184352
A Genome-Wide Investigation of Autozygosity and Breast Cancer Risk
2011-07-01
cases than in controls, using logistic regression methods. Using genome-wide SNP data (525,000 SNPs) on 1,647 non-Hispanic white, early-onset...premenopausal breast cancer cases and 1,556 matched controls we identified over 65,000 individual RoHs and 423 genomic regions harbor RoHs for at least 10...we hypothesize that germline autozygosity is more common in breast cancer cases than in controls. More specifically, we hypothesize that there are
Kanai, Masahiro; Tanaka, Toshihiro; Okada, Yukinori
2016-10-01
To assess the statistical significance of associations between variants and traits, genome-wide association studies (GWAS) should employ an appropriate threshold that accounts for the massive burden of multiple testing in the study. Although most studies in the current literature commonly set a genome-wide significance threshold at the level of P = 5.0 × 10^-8, the adequacy of this value for respective populations has not been fully investigated. To empirically estimate thresholds for different ancestral populations, we conducted GWAS simulations using the 1000 Genomes Phase 3 data set for Africans (AFR), Europeans (EUR), Admixed Americans (AMR), East Asians (EAS) and South Asians (SAS). The estimated empirical genome-wide significance thresholds were P_sig = 3.24 × 10^-8 (AFR), 9.26 × 10^-8 (EUR), 1.83 × 10^-7 (AMR), 1.61 × 10^-7 (EAS) and 9.46 × 10^-8 (SAS). We additionally conducted trans-ethnic meta-analyses across all populations (ALL) and all populations except for AFR (ΔAFR), which yielded P_sig = 3.25 × 10^-8 (ALL) and 4.20 × 10^-8 (ΔAFR). Our results indicate that the current threshold (P = 5.0 × 10^-8) is overly stringent for all ancestral populations except for Africans; however, we should employ a more stringent threshold when conducting a meta-analysis, regardless of the presence of African samples.
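To make the comparison in this abstract concrete, the sketch below sets each reported threshold against the conventional one and converts it into an implied number of independent tests. The conversion assumes a Bonferroni-style correction at a family-wise error rate of 0.05, which is an assumption of this example and not something the abstract states; the names `implied_tests` and `stricter_than_conventional` are hypothetical.

```python
CONVENTIONAL = 5.0e-8  # conventional genome-wide significance threshold
ALPHA = 0.05           # assumed family-wise error rate (not stated in the abstract)

# Empirical thresholds as reported in the abstract.
empirical = {
    "AFR": 3.24e-8,
    "EUR": 9.26e-8,
    "AMR": 1.83e-7,
    "EAS": 1.61e-7,
    "SAS": 9.46e-8,
    "ALL": 3.25e-8,            # trans-ethnic meta-analysis, all populations
    "ALL_minus_AFR": 4.20e-8,  # meta-analysis excluding AFR (ΔAFR in the abstract)
}


def implied_tests(threshold, alpha=ALPHA):
    """Number of independent tests for which alpha / n equals the threshold."""
    return alpha / threshold


def stricter_than_conventional(threshold):
    """True when the empirical threshold is smaller (more stringent) than 5.0e-8."""
    return threshold < CONVENTIONAL


for label, p_sig in empirical.items():
    verdict = "stricter" if stricter_than_conventional(p_sig) else "more lenient"
    print(f"{label}: {p_sig:.2e} ({verdict}), ~{implied_tests(p_sig):,.0f} implied tests")
```

Under this assumption the conventional threshold corresponds to about one million independent tests, and only the AFR and the two meta-analysis thresholds fall below it, in line with the abstract's conclusion.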
Tabas-Madrid, Daniel; Méndez-Vigo, Belén; Arteaga, Noelia; Marcer, Arnald; Pascual-Montano, Alberto; Weigel, Detlef; Xavier Picó, F; Alonso-Blanco, Carlos
2018-03-08
Current global change is fueling interest in understanding the genetic and molecular mechanisms of plant adaptation to climate. In particular, altered flowering time is a common strategy for escape from unfavourable climate temperature. In order to determine the genomic bases underlying flowering time adaptation to this climatic factor, we have systematically analysed a collection of 174 highly diverse Arabidopsis thaliana accessions from the Iberian Peninsula. Analyses of 1.88 million single nucleotide polymorphisms provide evidence for a spatially heterogeneous contribution of demographic and adaptive processes to geographic patterns of genetic variation. Mountains appear to be allele dispersal barriers, whereas the relationship between flowering time and temperature depended on the precise temperature range. Environmental genome-wide associations supported an overall genome adaptation to temperature, with 9.4% of the genes showing significant associations. Furthermore, phenotypic genome-wide associations provided a catalogue of candidate genes underlying flowering time variation. Finally, comparison of environmental and phenotypic genome-wide associations identified known (Twin Sister of FT, FRIGIDA-like 1, and Casein Kinase II Beta chain 1) and new (Epithiospecifier Modifier 1 and Voltage-Dependent Anion Channel 5) genes as candidates for adaptation to climate temperature by altered flowering time. Thus, this regional collection provides an excellent resource to address the spatial complexity of climate adaptation in annual plants. © 2018 John Wiley & Sons Ltd.
Li, Ming-Rui; Shi, Feng-Xue; Li, Ya-Ling; Jiang, Peng; Jiao, Lili; Liu, Bao; Li, Lin-Feng
2017-09-01
Chinese ginseng (Panax ginseng Meyer) is a medicinally important herb and plays crucial roles in traditional Chinese medicine. Pharmacological analyses identified diverse bioactive components from Chinese ginseng. However, basic biological attributes including domestication and selection of the ginseng plant remain under-investigated. Here, we present a genome-wide view of the domestication and selection of cultivated ginseng based on whole-genome data. A total of 8,660 protein-coding genes were selected for genome-wide scanning of the 30 wild and cultivated ginseng accessions. In addition, the 45S rDNA, chloroplast and mitochondrial genomes were included to perform phylogenetic and population genetic analyses. The observed spatial genetic structure between northern cultivated ginseng (NCG) and southern cultivated ginseng (SCG) accessions suggested multiple independent origins of cultivated ginseng. Genome-wide scanning further demonstrated that NCG and SCG have undergone distinct selection pressures during the domestication process, with more genes identified in the NCG (97 genes) than in the SCG group (5 genes). Functional analyses revealed that these genes are involved in diverse pathways, including DNA methylation, lignin biosynthesis, and cell differentiation. These findings suggested that the SCG and NCG groups have distinct demographic histories. Candidate genes identified are useful for future molecular breeding of cultivated ginseng. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Adaptation of a commercial robot for genome library replication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Uber, D.C.; Searles, W.L.
1994-01-01
This report describes tools and fixtures developed at the Human Genome Center at Lawrence Berkeley Laboratory for the Hewlett-Packard ORCA™ (Optimized Robot for Chemical Analysis) to replicate large genome libraries. Photographs and engineering drawings of the various custom-designed components are included.
2017-01-01
Recent advances in next-generation sequencing approaches have revolutionized our understanding of transcriptional expression in diverse systems. However, measurements of transcription do not necessarily reflect gene translation, the process of ultimate importance in understanding cellular function. To circumvent this limitation, biochemical tagging of ribosome subunits to isolate ribosome-associated mRNA has been developed. However, this approach, called TRAP, lacks quantitative resolution compared to a superior technology, ribosome profiling. Here, we report the development of an optimized ribosome profiling approach in Drosophila. We first demonstrate successful ribosome profiling from a specific tissue, larval muscle, with enhanced resolution compared to conventional TRAP approaches. We next validate the ability of this technology to define genome-wide translational regulation. This technology is leveraged to test the relative contributions of transcriptional and translational mechanisms in the postsynaptic muscle that orchestrate the retrograde control of presynaptic function at the neuromuscular junction. Surprisingly, we find no evidence that significant changes in the transcription or translation of specific genes are necessary to enable retrograde homeostatic signaling, implying that post-translational mechanisms ultimately gate instructive retrograde communication. Finally, we show that a global increase in translation induces adaptive responses in both transcription and translation of protein chaperones and degradation factors to promote cellular proteostasis. Together, this development and validation of tissue-specific ribosome profiling enables sensitive and specific analysis of translation in Drosophila. PMID:29194454
Photoperiod-H1 (Ppd-H1) Controls Leaf Size.
Digel, Benedikt; Tavakol, Elahe; Verderio, Gabriele; Tondelli, Alessandro; Xu, Xin; Cattivelli, Luigi; Rossini, Laura; von Korff, Maria
2016-09-01
Leaf size is a major determinant of plant photosynthetic activity and biomass; however, it is poorly understood how leaf size is genetically controlled in cereal crop plants like barley (Hordeum vulgare). We conducted a genome-wide association scan for flowering time, leaf width, and leaf length in a diverse panel of European winter cultivars grown in the field and genotyped with a single-nucleotide polymorphism array. The genome-wide association scan identified PHOTOPERIOD-H1 (Ppd-H1) as a candidate gene underlying the major quantitative trait loci for flowering time and leaf size in the barley population. Microscopic phenotyping of three independent introgression lines confirmed the effect of Ppd-H1 on leaf size. Differences in the duration of leaf growth and consequent variation in leaf cell number were responsible for the leaf size differences between the Ppd-H1 variants. The Ppd-H1-dependent induction of the BARLEY MADS BOX genes BM3 and BM8 in the leaf correlated with reductions in leaf size and leaf number. Our results indicate that leaf size is controlled by the Ppd-H1- and photoperiod-dependent progression of plant development. The coordination of leaf growth with flowering may be part of a reproductive strategy to optimize resource allocation to the developing inflorescences and seeds. © 2016 American Society of Plant Biologists. All rights reserved.
Metabolism and evolution: A comparative study of reconstructed genome-level metabolic networks
NASA Astrophysics Data System (ADS)
Almaas, Eivind
2008-03-01
The availability of high-quality annotations of sequenced genomes has made it possible to generate organism-specific comprehensive maps of cellular metabolism. Currently, more than twenty such metabolic reconstructions are publicly available, with the majority focused on bacteria. A typical metabolic reconstruction for a bacterium results in a complex network containing hundreds of metabolites (nodes) and reactions (links), while some even contain more than a thousand. The constraint-based optimization approach of flux-balance analysis (FBA) is used to investigate the functional characteristics of such large-scale metabolic networks, making it possible to estimate an organism's growth behavior in a wide variety of nutrient environments, as well as its robustness to gene loss. We have recently completed the genome-level metabolic reconstruction of Yersinia pseudotuberculosis, as well as the three Yersinia pestis biovars Antiqua, Mediaevalis, and Orientalis. While Y. pseudotuberculosis typically only causes fever and abdominal pain that can mimic appendicitis, the evolutionarily closely related Y. pestis strains are the aetiological agents of the bubonic plague. In this presentation, I will discuss our results and conclusions from a comparative study on the evolution of metabolic function in the four Yersiniae networks using FBA and related techniques, and I will give particular focus to the interplay between metabolic network topology and evolutionary flexibility.
Singh, Pradip Kumar; Chittpurna; Ashish; Sharma, Vikas; Patil, Prabhu B.; Korpole, Suresh
2012-01-01
Background: Bacteriocins are antimicrobial peptides that are produced by bacteria as a defense mechanism in complex environments. Identification and characterization of novel bacteriocins in novel strains of bacteria is one of the important fields in bacteriology. Methodology/Findings: The strain GI-9 was identified as Brevibacillus sp. by 16S rRNA gene sequence analysis. The bacteriocin produced by strain GI-9, namely laterosporulin, was purified from supernatant of the culture grown under optimal conditions using hydrophobic interaction chromatography and reverse-phase HPLC. The bacteriocin was active against a wide range of Gram-positive and Gram-negative bacteria. MALDI-TOF experiments determined the precise molecular mass of the peptide to be 5.6 kDa, and N-terminal sequencing of the thermo-stable peptide revealed low similarity with existing antimicrobial peptides. The putative open reading frame (ORF) encoding laterosporulin and its surrounding genomic region were retrieved from the draft genome sequence of GI-9. Sequence analysis of the putative bacteriocin gene did not show significant similarity to any reported bacteriocin-producing genes in the database. Conclusions: We have identified a bacteriocin-producing strain, GI-9, belonging to the genus Brevibacillus. Biochemical and genomic characterization of laterosporulin suggests that it is a novel bacteriocin with broad-spectrum antibacterial activity. PMID:22403615
Guan, Ningzi; Zhuge, Xin; Li, Jianghua; Shin, Hyun-Dong; Wu, Jing; Shi, Zhongping; Liu, Long
2015-01-01
Propionibacteria are actinobacteria consisting of two principal groups: cutaneous and dairy. Cutaneous propionibacteria are considered primary pathogens to humans, whereas dairy propionibacteria are widely used in the food and pharmaceutical industries. Increasing attention has been focused on improving the performance of dairy propionibacteria for the production of industrially important chemicals, and significant advances have been made through strain engineering and process optimization in the production of flavor compounds, nutraceuticals, and antimicrobial compounds. In addition, genome sequencing of several propionibacteria species has been completed, deepening understanding of the metabolic and physiological features of these organisms. However, the metabolic engineering of propionibacteria still faces several challenges owing to the lack of efficient genome manipulation tools and the existence of various types of strong restriction-modification systems. The emergence of systems and synthetic biology provides new opportunities to overcome these bottlenecks. In this review, we first introduce the major species of propionibacteria and their properties and provide an overview of their functions and applications. We then discuss advances in the genome sequencing and metabolic engineering of these bacteria. Finally, we discuss systems and synthetic biology approaches for engineering propionibacteria as efficient and robust cell factories for the production of industrially important chemicals.
Barría, Agustín; Christensen, Kris A.; Yoshida, Grazyella M.; Correa, Katharina; Jedlicki, Ana; Lhorente, Jean P.; Davidson, William S.; Yáñez, José M.
2018-01-01
Piscirickettsia salmonis causes one of the main infectious diseases affecting coho salmon (Oncorhynchus kisutch) farming, and current treatments have been ineffective for the control of this disease. Genetic improvement for P. salmonis resistance has been proposed as a feasible alternative for the control of this infectious disease in farmed fish. Genotyping by sequencing (GBS) strategies allow genotyping of hundreds of individuals with thousands of single nucleotide polymorphisms (SNPs), which can be used to perform genome-wide association studies (GWAS) and predict genetic values using genome-wide information. We used double-digest restriction-site associated DNA (ddRAD) sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and to identify molecular markers associated with the trait. We also evaluated genomic selection (GS) models in order to determine the potential to accelerate the genetic improvement of this trait by using genome-wide molecular information. A total of 764 individuals from 33 full-sib families (17 highly resistant and 16 highly susceptible) were experimentally challenged against P. salmonis and their genotypes were assayed using ddRAD sequencing. A total of 9,389 SNP markers were identified in the population. These markers were used to test genomic selection models and compare different GWAS methodologies for resistance measured as day of death (DD) and binary survival (BIN). Genomic selection models showed higher accuracies than the traditional pedigree-based best linear unbiased prediction (PBLUP) method, for both DD and BIN. The models showed an improvement of up to 95% and 155%, respectively, over PBLUP. One SNP related to B-cell development was identified as a potential functional candidate associated with resistance to P. salmonis defined as DD. PMID:29440129
Genomic newborn screening: public health policy considerations and recommendations.
Friedman, Jan M; Cornel, Martina C; Goldenberg, Aaron J; Lister, Karla J; Sénécal, Karine; Vears, Danya F
2017-02-21
The use of genome-wide (whole genome or exome) sequencing for population-based newborn screening presents an opportunity to detect and treat or prevent many more serious early-onset health conditions than is possible today. The Paediatric Task Team of the Global Alliance for Genomics and Health's Regulatory and Ethics Working Group reviewed current understanding and concerns regarding the use of genomic technologies for population-based newborn screening and developed, by consensus, eight recommendations for clinicians, clinical laboratory scientists, and policy makers. Before genome-wide sequencing can be implemented in newborn screening programs, its clinical utility and cost-effectiveness must be demonstrated, and the ability to distinguish disease-causing and benign variants of all genes screened must be established. In addition, each jurisdiction needs to resolve ethical and policy issues regarding the disclosure of incidental or secondary findings to families and ownership, appropriate storage and sharing of genomic data. The best interests of children should be the basis for all decisions regarding the implementation of genomic newborn screening.
USDA-ARS's Scientific Manuscript database
To balance the demand for uptake of essential elements with their potential toxicity, living cells have complex regulatory mechanisms. Here, we describe a genome-wide screen to identify genes that impact the elemental composition (‘ionome’) of the yeast Saccharomyces cerevisiae. Using inductively coupled...
Copy number variation of individual cattle genomes using next-generation sequencing
USDA-ARS's Scientific Manuscript database
Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...
Copy number variation of individual cattle genomes using next-generation sequencing
USDA-ARS's Scientific Manuscript database
Copy Number Variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often difficult to track. Using a read depth approach based on next generation sequencing, we examined genome-wide copy number differences among five taurine (three Angu...
Linkage Disequilibrium And Genome-Wide Association Studies In O. sativa
USDA-ARS's Scientific Manuscript database
There is increasing evidence that genome-wide association studies provide a powerful approach to find the genetic basis of complex phenotypic variation in all kinds of species. For this purpose, we developed the first generation 44K Affymetrix SNP array in rice (see Tung et al. poster). We genotyped...
Genome-wide interactions with dairy intake for body mass index in adults of European descent
USDA-ARS's Scientific Manuscript database
Scope: Body weight responds variably to the intake of dairy foods. Genetic variation may contribute to inter-individual variability in associations between body weight and dairy consumption. Methods and results: We conducted a genome-wide interaction study to discover genetic variants that account f...
USDA-ARS's Scientific Manuscript database
Multi-locus genome-wide association studies have become the state-of-the-art procedure to identify quantitative trait loci (QTL) associated with traits simultaneously. However, implementation of multi-locus models remains difficult. In this study, we integrated least angle regression with empirical B...
Software engineering the mixed model for genome-wide association studies on large samples
USDA-ARS's Scientific Manuscript database
Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample siz...
Genome-wide characterization of Mediator recruitment, function, and regulation.
Grünberg, Sebastian; Zentner, Gabriel E
2017-05-27
Mediator is a conserved and essential coactivator complex broadly required for RNA polymerase II (RNAPII) transcription. Recent genome-wide studies of Mediator binding in budding yeast have revealed new insights into the functions of this critical complex and raised new questions about its role in the regulation of gene expression.
Genome-wide association mapping of partial resistance to Aphanomyces euteiches in pea
USDA-ARS's Scientific Manuscript database
Genome-wide association mapping has recently emerged as a valuable approach to refine the genetic basis of polygenic resistance to plant diseases, which is increasingly used in integrated strategies for durable crop protection. Aphanomyces euteiches is a soil-borne pathogen of pea and other legumes wor...
USDA-ARS's Scientific Manuscript database
Single nucleotide polymorphisms (SNPs) are ideally suited for the construction of high-resolution genetic maps, studying population evolutionary history and performing genome-wide association mapping experiments. Here we used a genome-wide set of 1536 SNPs to study linkage disequilibrium (LD) and po...
USDA-ARS's Scientific Manuscript database
Genome-wide association studies (GWAS) are a powerful method to dissect the genetic basis of traits, though in practice the effects of complex genetic architecture and population structure remain poorly understood. To compare mapping strategies we dissect the genetic control of flavonoid pigmentatio...
A genome-wide association study platform built on iPlant cyber-infrastructure
USDA-ARS's Scientific Manuscript database
We demonstrated a flexible Genome-Wide Association (GWA) Study (GWAS) platform built upon the iPlant Collaborative Cyber-infrastructure. The platform supports big data management, sharing, and large scale study of both genotype and phenotype data on clusters. End users can add their own analysis too...
Genome wide association analysis for seedling response traits to thermal stress in sorghum germplasm
USDA-ARS's Scientific Manuscript database
The sorghum association panel exhibited extensive variation for seedling traits under cold and heat stress. Genome-wide analyses identified thirty single nucleotide polymorphisms (SNPs) that were strongly associated with traits measured at seedling stage under cold stress and tagged genes that act a...
Genome-wide association study for carcass traits in a composite beef cattle breed
USDA-ARS's Scientific Manuscript database
Improvement of carcass traits is highly emphasized in beef cattle production in order to meet consumer demands. Discovering and understanding genes and genetic variants that control these traits is of paramount importance. In this study, different genome wide association approaches (ssGWAS, Bayes A...
Genome-wide association analysis of symbiotic nitrogen fixation in common bean
USDA-ARS's Scientific Manuscript database
A genome-wide association study (GWAS) was conducted to explore the genetic basis of variation for symbiotic nitrogen fixation (SNF) and related traits in the Andean diversity panel (ADP) comprised of 259 common bean (Phaseolus vulgaris) genotypes. The ADP was evaluated for SNF and related traits in...
Genome-wide association study of swine farrowing traits. Part II: Bayesian analysis of marker data
USDA-ARS's Scientific Manuscript database
Reproductive efficiency has a great impact on the economic success of pork production. Number born alive (NBA) and average piglet birth weight (ABW) contribute greatly to reproductive efficiency. To better understand the underlying genetics of birth traits, a genome wide association study (GWAS) w...
Migault, Vincent; Pallas, Benoît; Costes, Evelyne
2016-01-01
In crops, optimizing target traits in breeding programs can be fostered by selecting appropriate combinations of architectural traits which determine light interception and carbon acquisition. In apple trees, architectural traits have been observed to be under genetic control. However, architectural traits also result from many organogenetic and morphological processes interacting with the environment. The present study aimed at combining a functional-structural plant model (FSPM) built for apple tree, MAppleT, with the genetic determinants of architectural traits previously described in a bi-parental population. We focused on parameters related to organogenesis (phyllochron and immediate branching) and morphogenesis processes (internode length and leaf area) during the first year of tree growth. Two independent datasets collected in 2004 and 2007 on 116 genotypes, derived from a 'Starkrimson' × 'Granny Smith' cross, were used. The phyllochron was estimated as a function of thermal time and sylleptic branching was modeled subsequently depending on phyllochron. From a genetic map built with SNPs, marker effects were estimated on four MAppleT parameters with rrBLUP, using 2007 data. These effects were then considered in MAppleT to simulate tree development in the two climatic conditions. The genome-wide prediction model gave consistent estimations of parameter values, with correlation coefficients between observed values and values estimated from SNP markers ranging from 0.79 to 0.96. However, the accuracy of the prediction model following cross-validation schemes was lower. Three integrative traits (the number of leaves, trunk length, and number of sylleptic laterals) were considered for validating MAppleT simulations. In 2007 climatic conditions, simulated values were close to observations, highlighting the correct simulation of genetic variability. However, in 2004 conditions, which were not used for model calibration, the simulations differed from observations.
This study demonstrates the possibility of integrating genome-based information in an FSPM for a perennial fruit tree. It also shows that further improvements are required to increase prediction ability. In particular, the modeling of temperature effects should be extended and other factors taken into account for modeling GxE interactions. Improvements could also be expected by considering larger populations and by testing other genome-wide prediction models. Despite these limitations, this study opens new possibilities for supporting plant breeding by in silico evaluations of the impact of genotypic polymorphisms on plant integrative phenotypes.
A genome-wide SNP scan accelerates trait-regulatory genomic loci identification in chickpea
Kujur, Alice; Bajaj, Deepak; Upadhyaya, Hari D.; Das, Shouvik; Ranjan, Rajeev; Shree, Tanima; Saxena, Maneesha S.; Badoni, Saurabh; Kumar, Vinod; Tripathi, Shailesh; Gowda, C.L.L.; Sharma, Shivali; Singh, Sube; Tyagi, Akhilesh K.; Parida, Swarup K.
2015-01-01
We identified 44844 high-quality SNPs by sequencing 92 diverse chickpea accessions belonging to a seed and pod trait-specific association panel using reference genome- and de novo-based GBS (genotyping-by-sequencing) assays. A GWAS (genome-wide association study) in an association panel of 211, including the 92 sequenced accessions, identified 22 major genomic loci showing significant association (explaining 23–47% phenotypic variation) with pod and seed number/plant and 100-seed weight. Eighteen trait-regulatory major genomic loci underlying 13 robust QTLs were validated and mapped on an intra-specific genetic linkage map by QTL mapping. A combinatorial approach of GWAS, QTL mapping and gene haplotype-specific LD mapping and transcript profiling uncovered one superior haplotype and favourable natural allelic variants in the upstream regulatory region of a CesA-type cellulose synthase (Ca_Kabuli_CesA3) gene regulating high pod and seed number/plant (explaining 47% phenotypic variation) in chickpea. The up-regulation of this superior gene haplotype correlated with increased transcript expression of Ca_Kabuli_CesA3 gene in the pollen and pod of high pod/seed number accession, resulting in higher cellulose accumulation for normal pollen and pollen tube growth. A rapid combinatorial genome-wide SNP genotyping-based approach has potential to dissect complex quantitative agronomic traits and delineate trait-regulatory genomic loci (candidate genes) for genetic enhancement in crop plants, including chickpea. PMID:26058368
The function and evolution of the Aspergillus genome
Gibbons, John G.; Rokas, Antonis
2012-01-01
Species in the filamentous fungal genus Aspergillus display a wide diversity of lifestyles and are of great importance to humans. The decoding of genome sequences from a dozen species that vary widely in their degree of evolutionary affinity has galvanized studies of the function and evolution of the Aspergillus genome in clinical, industrial, and agricultural environments. Here, we synthesize recent key findings that shed light on the architecture of the Aspergillus genome, on the molecular foundations of the genus’ astounding dexterity and diversity in secondary metabolism, and on the genetic underpinnings of virulence in Aspergillus fumigatus, one of the most lethal fungal pathogens. Many of these insights dramatically expand our knowledge of fungal and microbial eukaryote genome evolution and function and argue that Aspergillus constitutes a superb model clade for the study of functional and comparative genomics. PMID:23084572
Global Implementation of Genomic Medicine: We Are Not Alone
Manolio, Teri A.; Abramowicz, Marc; Al-Mulla, Fahd; Anderson, Warwick; Balling, Rudi; Berger, Adam C.; Bleyl, Steven; Chakravarti, Aravinda; Chantratita, Wasun; Chisholm, Rex L.; Dissanayake, Vajira H. W.; Dunn, Michael; Dzau, Victor J.; Han, Bok-Ghee; Hubbard, Tim; Kolbe, Anne; Korf, Bruce; Kubo, Michiaki; Lasko, Paul; Leego, Erkki; Mahasirimongkol, Surakameth; Majumdar, Partha P.; Matthijs, Gert; McLeod, Howard L.; Metspalu, Andres; Meulien, Pierre; Miyano, Satoru; Naparstek, Yaakov; O’Rourke, P. Pearl; Patrinos, George P.; Rehm, Heidi L.; Relling, Mary V.; Rennert, Gad; Rodriguez, Laura Lyman; Roden, Dan M.; Shuldiner, Alan R.; Sinha, Sukdev; Tan, Patrick; Ulfendahl, Mats; Ward, Robyn; Williams, Marc S.; Wong, John E.L.; Green, Eric D.; Ginsburg, Geoffrey S.
2016-01-01
Advances in high-throughput genomic technologies coupled with a growing number of genomic results potentially useful in clinical care have led to ground-breaking genomic medicine implementation programs in various nations. Many of these innovative programs capitalize on unique local capabilities arising from the structure of their health care systems or their cultural or political milieu, as well as from unusual burdens of disease or risk alleles. Many such programs are being conducted in relative isolation and might benefit from sharing of approaches and lessons learned in other nations. The National Human Genome Research Institute recently brought together 25 of these groups from around the world to describe and compare projects, examine the current state of implementation and desired near-term capabilities, and identify opportunities for collaboration to promote the responsible implementation of genomic medicine. The wide variety of nascent programs in diverse settings demonstrates that implementation of genomic medicine is expanding globally in varied and highly innovative ways. Opportunities for collaboration abound in the areas of evidence generation, health information technology, education, workforce development, pharmacogenomics, and policy and regulatory issues. Several international organizations that are already facilitating effective research collaborations should engage to ensure implementation proceeds collaboratively without potentially wasteful duplication. Efforts to coalesce these groups around concrete but compelling signature projects, such as global eradication of genetically-mediated drug reactions or developing a truly global genomic variant data resource across a wide number of ethnicities, would accelerate appropriate implementation of genomics to improve clinical care world-wide. PMID:26041702
Elkins, James G; Hamilton-Brehm, Scott D; Lucas, Susan; Han, James; Lapidus, Alla; Cheng, Jan-Fang; Goodwin, Lynne A; Pitluck, Sam; Peters, Lin; Mikhailova, Natalia; Davenport, Karen W; Detter, John C; Han, Cliff S; Tapia, Roxanne; Land, Miriam L; Hauser, Loren; Kyrpides, Nikos C; Ivanova, Natalia N; Pagani, Ioanna; Bruce, David; Woyke, Tanja; Cottingham, Robert W
2013-04-11
Thermodesulfobacterium geofontis OPF15(T) (ATCC BAA-2454, JCM 18567) was isolated from Obsidian Pool, Yellowstone National Park, and grows optimally at 83°C. The 1.6-Mb genome sequence was finished at the Joint Genome Institute and has been deposited for future genomic studies pertaining to microbial processes and nutrient cycles in high-temperature environments.