defining genomic differences: Topics by Science.gov

Sample records for defining genomic differences

Initiation of a pan-genomic research project for Xylella fastidiosa

USDA-ARS's Scientific Manuscript database

Differences in genomic structure and nucleotide polymorphism among strains form the genetic basis for adaptability of a bacterial species. This can be described by a bacterial pan-genome, which is defined as the full complement of genes in all strains of a species. The pan-genome is composed of a "c...
Visualization of IAV Genomes at the Single-Cell Level.

PubMed

Wang, Dan; Ma, Wenjun

2017-10-01

Different influenza A viruses (IAVs) infect the same cell in a host, and can subsequently produce new viruses through genome reassortment. By combining padlock probe RNA labeling with a single-cell analysis, a new approach effectively captures IAV genome trafficking and defines a time window for genome reassortment from same-cell coinfections. Copyright © 2017 Elsevier Ltd. All rights reserved.
A diversity study of Saccharomycopsis fibuligera in rice wine starter nuruk, reveals the evolutionary process associated with its interspecies hybrid.

PubMed

Farh, Mohamed El-Agamy; Cho, Yunjoo; Lim, Jae Yun; Seo, Jeong-Ah

2017-05-01

The amylolytic yeast Saccharomycopsis fibuligera is the predominant yeast in the starter product, nuruk, which is utilized for rice wine production in South Korea. Latest molecular studies explore a recently developed interspecific hybridization among strains of S. fibuligera with a unique genetic feature. However, the origin of the natural hybridization occurrence is still unclear. Thus, to respectively distinguish parental and hybrid strains, specific primer sets were applied on 141 yeast strains isolated from different nuruk samples fermented in different provinces. Sixty-seven strains were defined accordingly as parental species with genome A while 8 strains were defined as hybrid strains. Unexpectedly, another parental species with genome B could not be found among the strain pools yet. Furthermore, it was observed that hybrid strains are phenotypically different from A genome strains; asci containing tetrad ascospores were observed in A genome strains more frequently than in hybrid strains. Nevertheless, hybrid strains were slightly more thermotolerant than A genome strains. Interestingly, all hybrid strains were located only in Jeju province. Based on these sets of data, we speculated that the unique climate of Jeju province might play an evolutionary role in the interspecific hybridization between A genome strains, as well as the unculturable allopatric B genome strains.
GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

PubMed

Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

2013-01-01

No attention has been paid to comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.
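The quantitative comparison step in the record above can be illustrated with a minimal Python sketch. It assumes that each genome fingerprint map is already available as a list of (x, y, z) points; the abstract does not say how GenomeFingerprinter derives those coordinates, so the points below are hypothetical, and the function names `geometric_center` and `euclidean_distance` are illustrative rather than taken from the tool. The differentiate rate and weighted differentiate rate are not defined in the abstract and are not attempted here.

```python
import math

def geometric_center(points):
    """Return the coordinate-wise mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical fingerprint coordinates for two genomes (not real data).
fp_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
fp_b = [(0.0, 0.0, 4.0), (2.0, 0.0, 4.0), (1.0, 3.0, 4.0)]

center_a = geometric_center(fp_a)
center_b = geometric_center(fp_b)
distance = euclidean_distance(center_a, center_b)
```

In this toy case the two maps differ only by a shift along the third axis, so the distance between their centers equals that shift.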
Comparative genomics of Lactobacillus

PubMed Central

Kant, Ravi; Blom, Jochen; Palva, Airi; Siezen, Roland J.; de Vos, Willem M.

2011-01-01

Summary: The genus Lactobacillus includes a diverse group of bacteria consisting of many species that are associated with fermentations of plants, meat or milk. In addition, various lactobacilli are natural inhabitants of the intestinal tract of humans and other animals. Finally, several Lactobacillus strains are marketed as probiotics as their consumption can confer a health benefit to the host. Presently, 154 Lactobacillus species are known and a growing fraction of these are subject to draft genome sequencing. However, complete genome sequences are needed to provide a platform for detailed genomic comparisons. Therefore, we selected a total of 20 genomes of various Lactobacillus strains for which complete genomic sequences have been reported. These genomes had sizes varying from 1.8 to 3.3 Mb and other characteristic features, such as G+C content that ranged from 33% to 51%. The Lactobacillus pan genome was found to consist of approximately 14 000 protein‐encoding genes while all 20 genomes shared a total of 383 sets of orthologous genes that defined the Lactobacillus core genome (LCG). Based on advanced phylogeny of the proteins encoded by this LCG, we grouped the 20 strains into three main groups and defined core group genes present in all genomes of a single group, signature group genes shared in all genomes of one group but absent in all other Lactobacillus genomes, and Group‐specific ORFans present in core group genes of one group and absent in all other complete genomes. The latter are of specific value in defining the different groups of genomes. The study provides a platform for present individual comparisons as well as future analysis of new Lactobacillus genomes. PMID:21375712
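The pan genome, core genome and signature group genes described in the record above reduce to set operations once every gene has been assigned to an orthologous group. The Python sketch below assumes that assignment has already been made; the strain names and ortholog identifiers are hypothetical, and the functions `pan_and_core` and `signature_genes` are illustrative names, not part of the study's pipeline.

```python
def pan_and_core(genomes):
    """Return (pan genome, core genome) for a dict of strain -> set of ortholog IDs."""
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # genes found in at least one strain
    core = set.intersection(*gene_sets)  # genes found in every strain
    return pan, core

def signature_genes(genomes, group):
    """Genes shared by every strain in `group` and absent from all other strains."""
    inside = [genes for name, genes in genomes.items() if name in group]
    outside = [genes for name, genes in genomes.items() if name not in group]
    shared = set.intersection(*inside)
    elsewhere = set().union(*outside)
    return shared - elsewhere

# Hypothetical ortholog assignments for three strains (not real data).
genomes = {
    "strain1": {"og1", "og2", "og3"},
    "strain2": {"og1", "og2", "og4"},
    "strain3": {"og1", "og5"},
}

pan, core = pan_and_core(genomes)
sig = signature_genes(genomes, {"strain1", "strain2"})
```

Here `og1` is the only ortholog shared by all three strains, and `og2` is the only one shared by strain1 and strain2 while missing from strain3.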
International network of cancer genome projects

PubMed Central

2010-01-01

The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumors from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of over 25,000 cancer genomes at the genomic, epigenomic, and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically-relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies. PMID:20393554
CRISPR/Cas9-Mediated Mutagenesis of Human Pluripotent Stem Cells in Defined Xeno-Free E8 Medium.

PubMed

Soh, Chew-Li; Huangfu, Danwei

2017-01-01

The recent advent of engineered nucleases including the CRISPR/Cas9 system has greatly facilitated genome manipulation in human pluripotent stem cells (hPSCs). In addition to facilitating hPSC-based disease studies, the application of genome engineering in hPSCs has also opened up new avenues for cell replacement therapy. To improve consistency and reproducibility of hPSC-based studies, and to meet the safety and regulatory requirements for clinical translation, it is necessary to use a defined, xeno-free cell culture system. This chapter describes protocols for CRISPR/Cas9 genome editing in an inducible Cas9 hPSC-based system, using cells cultured in chemically defined, xeno-free E8 Medium on a recombinant human vitronectin substrate. We detail procedures for the design and transfection of CRISPR guide RNAs, colony selection, and the expansion and validation of clonal mutant lines, all within this fully defined culture condition. These methods may be applied to a wide range of genome-engineering applications in hPSCs, including those that utilize different types of site-specific nucleases such as zinc finger nucleases (ZFNs) and TALENs, and form a closer step towards clinical utility of these cells.
Data compression and genomes: a two-dimensional life domain map.

PubMed

Menconi, Giulia; Benci, Vieri; Buiatti, Marcello

2008-07-21

We define the complexity of DNA sequences as the information content per nucleotide, calculated by means of some Lempel-Ziv data compression algorithm. It is possible to use the statistics of the complexity values of the functional regions of different complete genomes to distinguish among genomes of different domains of life (Archaea, Bacteria and Eukarya). We shall focus on the distribution function of the complexity of non-coding regions. We show that the three domains may be plotted in separate regions within the two-dimensional space where the axes are the skewness coefficient and the kurtosis coefficient of the aforementioned distribution. Preliminary results on 15 genomes are introduced.
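A rough Python sketch of the two quantities in the record above follows. It assumes that compressed size in bits per nucleotide is an acceptable stand-in for the complexity measure, and it uses `zlib` (DEFLATE, which combines LZ77 with Huffman coding) because the abstract does not name the specific Lempel-Ziv variant. The skewness and kurtosis are the plain moment-based coefficients; the test sequences are synthetic.

```python
import random
import zlib

def complexity_bits_per_nt(seq):
    """Compressed size in bits divided by sequence length (a proxy for complexity)."""
    data = seq.upper().encode("ascii")
    compressed = zlib.compress(data, 9)
    return 8 * len(compressed) / len(data)

def skewness_and_kurtosis(values):
    """Moment-based skewness (m3 / m2^1.5) and kurtosis (m4 / m2^2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Synthetic sequences: a homopolymer and a seeded pseudo-random sequence.
rng = random.Random(0)
repetitive = "A" * 1000
mixed = "".join(rng.choice("ACGT") for _ in range(1000))

low = complexity_bits_per_nt(repetitive)
high = complexity_bits_per_nt(mixed)
skew, kurt = skewness_and_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
```

In practice the complexity would be computed for each non-coding region of a genome, and the skewness and kurtosis of that set of values would give one point on the two-dimensional map.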
Natural gene expression variation studies in yeast.

PubMed

Thompson, Dawn A; Cubillos, Francisco A

2017-01-01

The rise of sequence information across different yeast species and strains is driving an increasing number of studies in the emerging field of genomics to associate polymorphic variants, mRNA abundance and phenotypic differences between individuals. Here, we gathered evidence from recent studies covering several layers that define the genotype-phenotype gap, such as mRNA abundance, allele-specific expression and translation efficiency to demonstrate how genetic variants co-evolve and define an individual's genome. Moreover, we exposed several antecedents where inter- and intra-specific studies led to opposite conclusions, probably owing to genetic divergence. Future studies in this area will benefit from the access to a massive array of well-annotated genomes and new sequencing technologies, which will allow the fine breakdown of the complex layers that delineate the genotype-phenotype map. Copyright © 2016 John Wiley & Sons, Ltd.
Defining functional DNA elements in the human genome

PubMed Central

Kellis, Manolis; Wold, Barbara; Snyder, Michael P.; Bernstein, Bradley E.; Kundaje, Anshul; Marinov, Georgi K.; Ward, Lucas D.; Birney, Ewan; Crawford, Gregory E.; Dekker, Job; Dunham, Ian; Elnitski, Laura L.; Farnham, Peggy J.; Feingold, Elise A.; Gerstein, Mark; Giddings, Morgan C.; Gilbert, David M.; Gingeras, Thomas R.; Green, Eric D.; Guigo, Roderic; Hubbard, Tim; Kent, Jim; Lieb, Jason D.; Myers, Richard M.; Pazin, Michael J.; Ren, Bing; Stamatoyannopoulos, John A.; Weng, Zhiping; White, Kevin P.; Hardison, Ross C.

2014-01-01

With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease. PMID:24753594
Global mapping of transposon location.

PubMed

Gabriel, Abram; Dapprich, Johannes; Kunkel, Mark; Gresham, David; Pratt, Stephen C; Dunham, Maitreya J

2006-12-15

Transposable genetic elements are ubiquitous, yet their presence or absence at any given position within a genome can vary between individual cells, tissues, or strains. Transposable elements have profound impacts on host genomes by altering gene expression, assisting in genomic rearrangements, causing insertional mutations, and serving as sources of phenotypic variation. Characterizing a genome's full complement of transposons requires whole genome sequencing, precluding simple studies of the impact of transposition on interindividual variation. Here, we describe a global mapping approach for identifying transposon locations in any genome, using a combination of transposon-specific DNA extraction and microarray-based comparative hybridization analysis. We use this approach to map the repertoire of endogenous transposons in different laboratory strains of Saccharomyces cerevisiae and demonstrate that transposons are a source of extensive genomic variation. We also apply this method to mapping bacterial transposon insertion sites in a yeast genomic library. This unique whole genome view of transposon location will facilitate our exploration of transposon dynamics, as well as defining bases for individual differences and adaptive potential.
The Search for Therapeutic Bacteriophages Uncovers One New Subfamily and Two New Genera of Pseudomonas-Infecting Myoviridae

PubMed Central

Henry, Marine; Bobay, Louis-Marie; Chevallereau, Anne; Saussereau, Emilie; Ceyssens, Pieter-Jan; Debarbieux, Laurent

2015-01-01

In a previous study, six virulent bacteriophages PAK_P1, PAK_P2, PAK_P3, PAK_P4, PAK_P5 and CHA_P1 were evaluated for their in vivo efficacy in treating Pseudomonas aeruginosa infections using a mouse model of lung infection. Here, we show that their genomes are closely related to five other Pseudomonas phages and allow a subdivision into two clades, PAK_P1-like and KPP10-like viruses, based on differences in genome size, %GC and genomic contents, as well as number of tRNAs. These two clades are well delineated, with a mean of 86% and 92% of proteins considered homologous within individual clades, and 25% proteins considered homologous between the two clades. By ESI-MS/MS analysis we determined that their virions are composed of at least 25 different proteins and electron microscopy revealed a morphology identical to the hallmark Salmonella phage Felix O1. A search for additional bacteriophage homologs, using profiles of protein families defined from the analysis of the 11 genomes, identified 10 additional candidates infecting hosts from different species. By carrying out a phylogenetic analysis using these 21 genomes we were able to define a new subfamily of viruses, the Felixounavirinae within the Myoviridae family. The new Felixounavirinae subfamily includes three genera: Felixounalikevirus, PAK_P1likevirus and KPP10likevirus. Sequencing genomes of bacteriophages with therapeutic potential increases the quantity of genomic data on closely related bacteriophages, leading to establishment of new taxonomic clades and the development of strategies for analyzing viral genomes as presented in this article. PMID:25629728
A clade of Listeria monocytogenes serotype 4b variant strains linked to recent listeriosis outbreaks associated with produce from a defined geographic region in the US.

PubMed

Burall, Laurel S; Grim, Christopher J; Datta, Atin R

2017-01-01

Four listeriosis incidences/outbreaks, spanning 19 months, have been linked to Listeria monocytogenes serotype 4b variant (4bV) strains. Three of these incidents can be linked to a defined geographical region, while the fourth is likely to be linked. In this study, whole genome sequencing (WGS) of strains from these incidents was used for genomic comparisons using two approaches. The first was JSpecies tetramer, which analyzed tetranucleotide frequency to assess relatedness. The second, the CFSAN SNP Pipeline, was used to perform WGS SNP analyses against three different reference genomes to evaluate relatedness by SNP distances. In each case, unrelated strains were included as controls. The analyses showed that strains from these incidents form a highly related clade with SNP differences of ≤101 within the clade and >9000 against other strains. Multi-Virulence-Locus Sequence Typing, a third standardized approach for evaluating relatedness, was used to assess the genetic drift in six conserved, known virulence loci and showed a different clustering pattern indicating possible differences in selection pressure experienced by these genes. These data suggest a high degree of relatedness among these 4bV strains linked to a defined geographic region and also highlight the possibility of alterations related to adaptation and virulence.
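The two relatedness measures named in the record above can be illustrated with a small Python sketch. It is not a reimplementation of JSpecies or the CFSAN SNP Pipeline: it only counts overlapping tetranucleotides in a sequence and counts differing positions between sequences that are assumed to be already aligned to a common reference. The strain names and sequences are hypothetical, and all function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def tetranucleotide_frequencies(seq):
    """Relative frequency of every overlapping 4-mer in a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}

def snp_distance(a, b):
    """Number of aligned positions where both sequences have an unambiguous, different base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x in "ACGT" and y in "ACGT")

def pairwise_snp_distances(aligned):
    """SNP distance for every pair of strains, keyed by sorted name pair."""
    return {(m, n): snp_distance(aligned[m], aligned[n])
            for m, n in combinations(sorted(aligned), 2)}

# Hypothetical aligned sequences (not real data).
aligned = {
    "outbreak1": "ACGTACGTAC",
    "outbreak2": "ACGTACGTAT",
    "control": "TCGAACCTGC",
}

distances = pairwise_snp_distances(aligned)
freqs = tetranucleotide_frequencies("ACGTACGT")
```

The toy output mirrors the pattern reported in the abstract: the two outbreak sequences differ at one position, while the control differs from each of them at several.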
Standards for Clinical Grade Genomic Databases.

PubMed

Yohe, Sophia L; Carter, Alexis B; Pfeifer, John D; Crawford, James M; Cushman-Vokoun, Allison; Caughron, Samuel; Leonard, Debra G B

2015-11-01

Next-generation sequencing performed in a clinical environment must meet clinical standards, which requires reproducibility of all aspects of the testing. Clinical-grade genomic databases (CGGDs) are required to classify a variant and to assist in the professional interpretation of clinical next-generation sequencing. Applying quality laboratory standards to the reference databases used for sequence-variant interpretation presents a new challenge for validation and curation. The objective was to define CGGD and the categories of information contained in CGGDs and to frame recommendations for the structure and use of these databases in clinical patient care. Members of the College of American Pathologists Personalized Health Care Committee reviewed the literature and existing state of genomic databases and developed a framework for guiding CGGD development in the future. Clinical-grade genomic databases may provide different types of information. This work group defined 3 layers of information in CGGDs: clinical genomic variant repositories, genomic medical data repositories, and genomic medicine evidence databases. The layers are differentiated by the types of genomic and medical information contained and the utility in assisting with clinical interpretation of genomic variants. Clinical-grade genomic databases must meet specific standards regarding submission, curation, and retrieval of data, as well as the maintenance of privacy and security. These organizing principles for CGGDs should serve as a foundation for future development of specific standards that support the use of such databases for patient care.
pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

PubMed

Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

2013-08-01

With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.
DEFINING THE CHEMICAL SPACE OF PUBLIC GENOMIC DATA (S)

EPA Science Inventory

The current project aims to chemically index the genomics content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information. By defining the chemical space of public genomic data, it is possibl...
Decoherence in yeast cell populations and its implications for genome-wide expression noise.

PubMed

Briones, M R S; Bosco, F

2009-01-20

Gene expression "noise" is commonly defined as the stochastic variation of gene expression levels in different cells of the same population under identical growth conditions. Here, we tested whether this "noise" is amplified with time, as a consequence of decoherence in global gene expression profiles (genome-wide microarrays) of synchronized cells. The stochastic component of transcription causes fluctuations that tend to be amplified as time progresses, leading to a decay of correlations of expression profiles, in perfect analogy with elementary relaxation processes. Measuring decoherence, defined here as a decay in the auto-correlation function of yeast genome-wide expression profiles, we found a slowdown in the decay of correlations, opposite to what would be expected if, as in mixing systems, correlations decay exponentially as the equilibrium state is reached. Our results indicate that the populational variation in gene expression (noise) is a consequence of temporal decoherence, in which the slow decay of correlations is a signature of strong interdependence of the transcription dynamics of different genes.
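The decay of correlations described in the record above can be sketched in Python as follows. The abstract does not give the exact estimator of the auto-correlation function, so this sketch assumes the simplest reading: the Pearson correlation between the genome-wide expression profile at the first time point and the profile at each later time point. The profiles are toy values, and `pearson` and `correlation_decay` are illustrative names.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_decay(profiles):
    """Correlation of the first profile with the profile at every time point."""
    return [pearson(profiles[0], p) for p in profiles]

# Toy expression profiles (one list of gene levels per time point, not real data).
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 4.0, 3.0],
    [4.0, 3.0, 2.0, 1.0],
]

decay = correlation_decay(profiles)
```

Plotting such a series against time and comparing it with an exponential curve is one way to see whether correlations decay more slowly than a mixing system would predict.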
Genomic control of patterning

PubMed Central

Peter, Isabelle S.; Davidson, Eric H.

2014-01-01

The development of multicellular organisms involves the partitioning of the organism into territories of cells of specific structure and function. The information for spatial patterning processes is directly encoded in the genome. The genome determines its own usage depending on stage and position, by means of interactions that constitute gene regulatory networks (GRNs). The GRN driving endomesoderm development in sea urchin embryos illustrates different regulatory strategies by which developmental programs are initiated, orchestrated, stabilized or excluded to define the pattern of specified territories in the developing embryo. PMID:19378258
Genetic heterogeneity of L-Zagreb mumps virus vaccine strain.

PubMed

Kosutic-Gulija, Tanja; Forcic, Dubravko; Santak, Maja; Ramljak, Ana; Mateljak-Lukacevic, Sanja; Mazuran, Renata

2008-07-10

The most often used mumps vaccine strains Jeryl Lynn (JL), RIT4385, Urabe-AM9, L-Zagreb and L-3 differ in immunogenicity and reactogenicity. Previous analyses showed that JL, Urabe-AM9 and L-3 are genetically heterogeneous. We identified the heterogeneity of L-Zagreb throughout the entire genome. Two major variants were defined: variant A being identical to the consensus sequence of viral seeds and vaccine(s) and variant B which differs from variant A in three nucleotide positions. The difference between viral variants in L-Zagreb strain is insufficient for distinct viral strains to be defined. We demonstrated that proportion of variants in L-Zagreb viral population depends on cell substrate used for viral replication in vitro and in vivo. L-Zagreb strain should be considered as a single strain composed of at least two variant viral genomes.
By their genes ye shall know them: genomic signatures of predatory bacteria

PubMed Central

Pasternak, Zohar; Pietrokovski, Shmuel; Rotem, Or; Gophna, Uri; Lurie-Weinberger, Mor N; Jurkevitch, Edouard

2013-01-01

Predatory bacteria are taxonomically disparate, exhibit diverse predatory strategies and are widely distributed in varied environments. To date, their predatory phenotypes cannot be discerned in genome sequence data thereby limiting our understanding of bacterial predation, and of its impact in nature. Here, we define the ‘predatome,’ that is, sets of protein families that reflect the phenotypes of predatory bacteria. The proteomes of all 11 sequenced predatory bacteria, including two de novo sequenced genomes, and 19 non-predatory bacteria from across the phylogenetic and ecological landscapes were compared. Protein families discriminating between the two groups were identified and quantified, demonstrating that differences in the proteomes of predatory and non-predatory bacteria are large and significant. This analysis allows predictions to be made, as we show by confirming from genome data an overlooked bacterial predator. The predatome exhibits deficiencies in riboflavin and amino acids biosynthesis, suggesting that predators obtain them from their prey. In contrast, these genomes are highly enriched in adhesins, proteases and particular metabolic proteins, used for binding to, processing and consuming prey, respectively. Strikingly, predators and non-predators differ in isoprenoid biosynthesis: predators use the mevalonate pathway, whereas non-predators, like almost all bacteria, use the DOXP pathway. By defining predatory signatures in bacterial genomes, the predatory potential they encode can be uncovered, filling an essential gap for measuring bacterial predation in nature. Moreover, we suggest that full-genome proteomic comparisons are applicable to other ecological interactions between microbes, and provide a convenient and rational tool for the functional classification of bacteria. PMID:23190728

Genomic diversity of the human intestinal parasite Entamoeba histolytica

PubMed Central

2012-01-01

Background: Entamoeba histolytica is a significant cause of disease worldwide. However, little is known about the genetic diversity of the parasite. We re-sequenced the genomes of ten laboratory cultured lines of the eukaryotic pathogen Entamoeba histolytica in order to develop a picture of genetic diversity across the genome. Results: The extreme nucleotide composition bias and repetitiveness of the E. histolytica genome provide a challenge for short-read mapping, yet we were able to define putative single nucleotide polymorphisms in a large portion of the genome. The results suggest a rather low level of single nucleotide diversity, although genes and gene families with putative roles in virulence are among the more polymorphic genes. We did observe large differences in coverage depth among genes, indicating differences in gene copy number between genomes. We found evidence indicating that recombination has occurred in the history of the sequenced genomes, suggesting that E. histolytica may reproduce sexually. Conclusions: E. histolytica displays a relatively low level of nucleotide diversity across its genome. However, large differences in gene family content and gene copy number are seen among the sequenced genomes. The pattern of polymorphism indicates that E. histolytica reproduces sexually, or has done so in the past, which has previously been suggested but not proven. PMID:22630046
Genomics and functional genomics in Chlamydomonas reinhardtii

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blaby, Ian K.; Blaby-Haas, Crysten E.

The availability of the Chlamydomonas reinhardtii nuclear genome sequence continues to enable researchers to address biological questions relevant to algae, land plants and animals in unprecedented ways. As we continue to characterize and understand biological processes in C. reinhardtii and translate that knowledge to other systems, we are faced with the realization that many genes encode proteins without a defined function. The field of functional genomics aims to close this gap between genome sequence and protein function. Transcriptomes, proteomes and phenomes can each provide layers of gene-specific functional data while supplying a global snapshot of cellular behavior under different conditions. Herein we present a brief history of functional genomics, the present status of the C. reinhardtii genome, how genome-wide experiments can aid in supplying protein function inferences, and provide an outlook for functional genomics in C. reinhardtii.
Genomics and functional genomics in Chlamydomonas reinhardtii

DOE PAGES

Blaby, Ian K.; Blaby-Haas, Crysten E.

2017-03-21

The availability of the Chlamydomonas reinhardtii nuclear genome sequence continues to enable researchers to address biological questions relevant to algae, land plants and animals in unprecedented ways. As we continue to characterize and understand biological processes in C. reinhardtii and translate that knowledge to other systems, we are faced with the realization that many genes encode proteins without a defined function. The field of functional genomics aims to close this gap between genome sequence and protein function. Transcriptomes, proteomes and phenomes can each provide layers of gene-specific functional data while supplying a global snapshot of cellular behavior under different conditions. Herein we present a brief history of functional genomics, the present status of the C. reinhardtii genome, how genome-wide experiments can aid in supplying protein function inferences, and provide an outlook for functional genomics in C. reinhardtii.
Genomic Diversity in the Endosymbiotic Bacterium Rhizobium leguminosarum.

PubMed

Sánchez-Cañizares, Carmen; Jorrín, Beatriz; Durán, David; Nadendla, Suvarna; Albareda, Marta; Rubio-Sanz, Laura; Lanza, Mónica; González-Guerrero, Manuel; Prieto, Rosa Isabel; Brito, Belén; Giglio, Michelle G; Rey, Luis; Ruiz-Argüeso, Tomás; Palacios, José M; Imperial, Juan

2018-01-24

Rhizobium leguminosarum bv. viciae is a soil α-proteobacterium that establishes a diazotrophic symbiosis with different legumes of the Fabeae tribe. The number of genome sequences from rhizobial strains available in public databases is constantly increasing, although complete, fully annotated genome structures from rhizobial genomes are scarce. In this work, we report and analyse the complete genome of R. leguminosarum bv. viciae UPM791. Whole genome sequencing can provide new insights into the genetic features contributing to symbiotically relevant processes such as bacterial adaptation to the rhizosphere, mechanisms for efficient competition with other bacteria, and the ability to establish a complex signalling dialogue with legumes, to enter the root without triggering plant defenses, and, ultimately, to fix nitrogen within the host. Comparison of the complete genome sequences of two strains of R. leguminosarum bv. viciae, 3841 and UPM791, highlights the existence of different symbiotic plasmids and a common core chromosome. Specific genomic traits, such as plasmid content or a distinctive regulation, define differential physiological capabilities of these endosymbionts. Among them, strain UPM791 presents unique adaptations for recycling the hydrogen generated in the nitrogen fixation process.
Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

PubMed Central

Robins-Browne, Roy M.; Holt, Kathryn E.; Ingle, Danielle J.; Hocking, Dianna M.; Yang, Ji; Tauschek, Marija

2016-01-01

The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods. PMID:27917373
Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing?

PubMed

Robins-Browne, Roy M; Holt, Kathryn E; Ingle, Danielle J; Hocking, Dianna M; Yang, Ji; Tauschek, Marija

2016-01-01

The empirical and pragmatic nature of diagnostic microbiology has given rise to several different schemes to subtype E. coli, including biotyping, serotyping, and pathotyping. These schemes have proved invaluable in identifying and tracking outbreaks, and for prognostication in individual cases of infection, but they are imprecise and potentially misleading due to the malleability and continuous evolution of E. coli. Whole genome sequencing can be used to accurately determine E. coli subtypes that are based on allelic variation or differences in gene content, such as serotyping and pathotyping. Whole genome sequencing also provides information about single nucleotide polymorphisms in the core genome of E. coli, which form the basis of sequence typing, and is more reliable than other systems for tracking the evolution and spread of individual strains. A typing scheme for E. coli based on genome sequences that includes elements of both the core and accessory genomes should reduce typing anomalies and promote understanding of how different varieties of E. coli spread and cause disease. Such a scheme could also define pathotypes more precisely than current methods.
The lifestyle of prokaryotic organisms influences the repertoire of promiscuous enzymes.

PubMed

Martínez-Núñez, Mario Alberto; Rodríguez-Vázquez, Katya; Pérez-Rueda, Ernesto

2015-09-01

The metabolism of microbial organisms and its diversity are partly the result of an adaptation process to the characteristics of the environments that they inhabit. In this work, we analyze the influence of lifestyle on the content of promiscuous enzymes in 761 nonredundant bacterial and archaeal genomes. Promiscuous enzymes were defined as those proteins whose catalytic activities are described by two or more different Enzyme Commission (E.C.) numbers. The genomes analyzed were categorized into four lifestyles for exhaustive comparison: free-living, extremophiles, pathogens, and intracellular. From these analyses we found that free-living organisms have larger genomes and an enrichment of promiscuous enzymes. In contrast, intracellular organisms showed smaller genomes and a smaller proportion of promiscuous enzymes. On the basis of our data, we show that the proportion of promiscuous enzymes in an organism is mainly influenced by lifestyle, with fluctuating environments promoting their emergence. Finally, we found evidence that duplication processes occur preferentially in the metabolism of free-living and extremophile species. © 2015 Wiley Periodicals, Inc.
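The definition used above (a protein counts as promiscuous when it carries two or more distinct E.C. numbers) can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the input layout (pairs of protein identifier and E.C. number), the function names, and the toy data are all assumptions.

```python
from collections import defaultdict


def promiscuous_proteins(annotations):
    """Return IDs of proteins annotated with two or more distinct E.C. numbers.

    `annotations` is an iterable of (protein_id, ec_number) pairs; this
    input layout is a hypothetical one chosen for the example.
    """
    ec_by_protein = defaultdict(set)
    for protein_id, ec_number in annotations:
        ec_by_protein[protein_id].add(ec_number)  # a set ignores repeated annotations
    return {pid for pid, ecs in ec_by_protein.items() if len(ecs) >= 2}


def promiscuous_fraction(annotations):
    """Fraction of annotated proteins in a genome that are promiscuous."""
    annotations = list(annotations)
    annotated = {pid for pid, _ in annotations}
    if not annotated:
        return 0.0
    return len(promiscuous_proteins(annotations)) / len(annotated)


# Toy annotations with made-up protein IDs.
example = [
    ("protA", "1.1.1.1"),
    ("protA", "2.7.1.1"),   # second, different activity
    ("protB", "3.4.21.4"),
    ("protB", "3.4.21.4"),  # duplicate of the same activity
    ("protC", "4.2.1.1"),
]
print(promiscuous_proteins(example))
print(promiscuous_fraction(example))
```

Computing the fraction per genome and then grouping genomes by lifestyle would give the kind of comparison the abstract describes, under the assumptions stated above.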
Public consultation in ethics: an experiment in representative ethics.

PubMed

Burgess, Michael M

2004-01-01

Genome Canada has funded a research project to evaluate the usefulness of different forms of ethical analysis for assessing the moral weight of public opinion in the governance of genomics. This paper will describe a role of public consultation for ethical analysis and a contribution of ethical analysis to public consultation and the governance of genomics/biotechnology. Public consultation increases the robustness of ethical analysis with a more diverse set of moral experiences. Consultation must be carefully and respectfully designed to generate sufficiently diverse and rich accounts of moral experiences. Since dominant groups tend to define ethical or policy issues in a manner that excludes some interests or perspectives, it is important to identify the range of interests that diverse publics hold before defining the issue and scope of the discussion and the premature foreclosure of ethical dialogue. Consequently, a significant contribution of ethical dialogue strengthened by social analysis is to consider the context and non-policy use of power to govern genomics and to sustain social debate on enduring ethical issues.
Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA, IDH1, EGFR, and NF1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Verhaak, Roel GW; Hoadley, Katherine A; Purdom, Elizabeth

The Cancer Genome Atlas Network recently cataloged recurrent genomic abnormalities in glioblastoma multiforme (GBM). We describe a robust gene expression-based molecular classification of GBM into Proneural, Neural, Classical, and Mesenchymal subtypes and integrate multidimensional genomic data to establish patterns of somatic mutations and DNA copy number. Aberrations and gene expression of EGFR, NF1, and PDGFRA/IDH1 each define the Classical, Mesenchymal, and Proneural subtypes, respectively. Gene signatures of normal brain cell types show a strong relationship between subtypes and different neural lineages. Additionally, response to aggressive therapy differs by subtype, with the greatest benefit in the Classical subtype and no benefit in the Proneural subtype. We provide a framework that unifies transcriptomic and genomic dimensions for GBM molecular stratification with important implications for future studies.
Evolution of Genome Size and Complexity in Pinus

PubMed Central

Morse, Alison M.; Peterson, Daniel G.; Islam-Faridi, M. Nurul; Smith, Katherine E.; Magbanua, Zenaida; Garcia, Saul A.; Kubisiak, Thomas L.; Amerson, Henry V.; Carlson, John E.; Nelson, C. Dana; Davis, John M.

2009-01-01

Background: Genome evolution in the gymnosperm lineage of seed plants has given rise to many of the most complex and largest plant genomes; however, the elements involved are poorly understood. Methodology/Principal Findings: Gymny is a previously undescribed retrotransposon family in Pinus that is related to Athila elements in Arabidopsis. Gymny elements are dispersed throughout the modern Pinus genome and occupy a physical space at least the size of the Arabidopsis thaliana genome. In contrast to previously described retroelements in Pinus, the Gymny family was amplified or introduced after the divergence of pine and spruce (Picea). If retrotransposon expansions are responsible for genome size differences within the Pinaceae, as they are in angiosperms, then they have yet to be identified. In contrast, molecular divergence of Gymny retrotransposons together with other families of retrotransposons can account for the large genome complexity of pines along with protein-coding genic DNA, as revealed by massively parallel DNA sequence analysis of Cot fractionated genomic DNA. Conclusions/Significance: Most of the enormous genome complexity of pines can be explained by divergence of retrotransposons; however, the elements responsible for genome size variation are yet to be identified. Genomic resources for Pinus including those reported here should assist in further defining whether and how the roles of retrotransposons differ in the evolution of angiosperm and gymnosperm genomes. PMID:19194510
Genetic heterogeneity of L-Zagreb mumps virus vaccine strain

PubMed Central

Kosutic-Gulija, Tanja; Forcic, Dubravko; Šantak, Maja; Ramljak, Ana; Mateljak-Lukacevic, Sanja; Mazuran, Renata

2008-01-01

Background: The most often used mumps vaccine strains Jeryl Lynn (JL), RIT4385, Urabe-AM9, L-Zagreb and L-3 differ in immunogenicity and reactogenicity. Previous analyses showed that JL, Urabe-AM9 and L-3 are genetically heterogeneous. Results: We identified the heterogeneity of L-Zagreb throughout the entire genome. Two major variants were defined: variant A, which is identical to the consensus sequence of viral seeds and vaccine(s), and variant B, which differs from variant A in three nucleotide positions. The difference between viral variants in the L-Zagreb strain is insufficient for distinct viral strains to be defined. We demonstrated that the proportion of variants in the L-Zagreb viral population depends on the cell substrate used for viral replication in vitro and in vivo. Conclusion: The L-Zagreb strain should be considered a single strain composed of at least two variant viral genomes. PMID:18616793
MAGNAMWAR: an R package for genome-wide association studies of bacterial orthologs.

PubMed

Sexton, Corinne E; Smith, Hayden Z; Newell, Peter D; Douglas, Angela E; Chaston, John M

2018-06-01

Here we report on an R package for genome-wide association studies of orthologous genes in bacteria. Before using the software, orthologs from bacterial genomes or metagenomes are defined using local or online implementations of OrthoMCL. These presence-absence patterns are statistically associated with variation in user-collected phenotypes using the Mono-Associated GNotobiotic Animals Metagenome-Wide Association R package (MAGNAMWAR). Genotype-phenotype associations can be performed with several different statistical tests based on the type and distribution of the data. MAGNAMWAR is available on CRAN. Contact: john_chaston@byu.edu.
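As a rough illustration of the association step described above, the sketch below tests whether taxa that carry an ortholog group differ in a phenotype from taxa that lack it. It is written in Python rather than R, does not use or reproduce MAGNAMWAR's functions, and swaps in a simple exact permutation test on the difference in group means as a stand-in for the package's statistical tests. All names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(present, absent):
    """Two-sided exact permutation p-value for the difference in group means.

    Enumerates every way of relabelling the pooled values, so it is only
    practical for small numbers of taxa.
    """
    pooled = list(present) + list(absent)
    n_present = len(present)
    observed = abs(mean(present) - mean(absent))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_present):
        chosen_set = set(chosen)
        group1 = [pooled[i] for i in chosen]
        group2 = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if abs(mean(group1) - mean(group2)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def associate(presence, phenotypes):
    """Map each ortholog group to a p-value.

    `presence`: ortholog group -> set of taxa carrying it (assumed layout).
    `phenotypes`: taxon -> measured phenotype value (assumed layout).
    Groups present in all taxa or in none are skipped.
    """
    results = {}
    for group, carriers in presence.items():
        present = [v for t, v in sorted(phenotypes.items()) if t in carriers]
        absent = [v for t, v in sorted(phenotypes.items()) if t not in carriers]
        if present and absent:
            results[group] = exact_permutation_pvalue(present, absent)
    return results


# Toy data: six taxa, two ortholog groups.
phenotypes = {"t1": 10.0, "t2": 11.0, "t3": 12.0, "t4": 1.0, "t5": 2.0, "t6": 3.0}
presence = {
    "OG1": {"t1", "t2", "t3"},                     # variable gene
    "OG2": {"t1", "t2", "t3", "t4", "t5", "t6"},   # core gene, not testable
}
pvalues = associate(presence, phenotypes)
print(pvalues)
```

In a real analysis the p-values across many ortholog groups would also need a multiple-testing correction, which this sketch leaves out.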
Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

PubMed Central

Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinIzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

2013-01-01

The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP–encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence. PMID:23357949
Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

DOE Office of Scientific and Technical Information (OSTI.GOV)

Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang

The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP–encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.
Whole genome analysis reveals the diversity and evolutionary relationships between necrotic enteritis-causing strains of Clostridium perfringens.

PubMed

Lacey, Jake A; Allnutt, Theodore R; Vezina, Ben; Van, Thi Thu Hao; Stent, Thomas; Han, Xiaoyan; Rood, Julian I; Wade, Ben; Keyburn, Anthony L; Seemann, Torsten; Chen, Honglei; Haring, Volker; Johanesen, Priscilla A; Lyras, Dena; Moore, Robert J

2018-05-22

Clostridium perfringens causes a range of diseases in animals and humans including necrotic enteritis in chickens and food poisoning and gas gangrene in humans. Necrotic enteritis is of concern in commercial chicken production due to the cost of the implementation of infection control measures and to productivity losses. This study has focused on the genomic analysis of a range of chicken-derived C. perfringens isolates, from around the world and from different years. The genomes were sequenced and compared with 20 genomes available from public databases, which were from a diverse collection of isolates from chickens, other animals, and humans. We used a distance-based phylogeny that was constructed based on gene content rather than sequence identity. Similarity between strains was defined as the number of genes that they have in common divided by their total number of genes. In this type of phylogenetic analysis, evolutionary distance can be interpreted in terms of evolutionary events such as acquisition and loss of genes, whereas the underlying properties (the gene content) can be interpreted in terms of function. We also compared these methods to the sequence-based phylogeny of the core genome. Distinct pathogenic clades of necrotic enteritis-causing C. perfringens were identified. They were characterised by variable regions encoded on the chromosome, with predicted roles in capsule production, adhesion, inhibition of related strains, phage integration, and metabolism. Some strains have almost identical genomes, even though they were isolated from different geographic regions at various times, while other highly distant genomes appear to result in similar outcomes with regard to virulence and pathogenesis. The high level of diversity in chicken isolates suggests there is no reliable factor that defines a chicken strain of C. perfringens; however, disease-causing strains can be defined by the presence of netB-encoding plasmids. This study reveals that horizontal gene transfer appears to play a significant role in genetic variation of the C. perfringens chromosome as well as the plasmid content within strains.
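The gene-content similarity described above (genes in common divided by the total number of genes) can be sketched as follows. The abstract does not say exactly how the total is counted; this example assumes it means the number of distinct genes found in either strain, which makes the measure a Jaccard index. The function names, strain names, and gene names other than netB are placeholders, and this is not the authors' code.

```python
def gene_content_similarity(genes_a, genes_b):
    """Shared genes divided by the number of distinct genes in both strains.

    Assumption: "total number of genes" is read as the union of the two
    gene sets (Jaccard index).
    """
    a, b = set(genes_a), set(genes_b)
    total = a | b
    if not total:
        raise ValueError("both gene sets are empty")
    return len(a & b) / len(total)


def distance_matrix(strains):
    """Pairwise distances (1 - similarity) for a dict of strain name -> gene set."""
    names = sorted(strains)
    return {
        (x, y): 1.0 - gene_content_similarity(strains[x], strains[y])
        for x in names
        for y in names
    }


# Toy gene sets for three hypothetical strains.
strains = {
    "strain1": {"netB", "g1", "g2"},
    "strain2": {"g1", "g2", "g3"},
    "strain3": {"netB", "g1", "g2"},
}
d = distance_matrix(strains)
print(d[("strain1", "strain2")])  # 2 shared genes out of 4 distinct genes
print(d[("strain1", "strain3")])  # identical gene content
```

A matrix like `d` is the sort of input a distance-based tree-building method would take; which method the study used is not stated in the abstract.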
Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds.

PubMed

Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

2017-08-10

A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction model (GBLUP) to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model using a single random effect assumes that all genomic variants contribute to the genomic relationship equally, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than milk production, and several biologically meaningful GO terms improved the prediction accuracy with GFBLUP for the four traits, as compared with GBLUP. The improvement of the genomic prediction between breeds (the average increase across the four traits was 0.161) was more apparent than that within HOL (the average increase across the four traits was 0.020). Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of independent biological knowledge.
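The contrast between the two models described above can be written out in commonly used mixed-model notation. This is a sketch only: the abstract does not give the equations, so the symbols below are assumptions chosen for illustration.

```latex
% GBLUP: a single random genomic effect built from all variants
\mathbf{y} = \mathbf{1}\mu + \mathbf{g} + \mathbf{e}, \qquad
\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma_g^2), \quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)

% GFBLUP: one effect for variants inside the genomic feature (f),
% one for the remaining variants (r)
\mathbf{y} = \mathbf{1}\mu + \mathbf{f} + \mathbf{r} + \mathbf{e}, \qquad
\mathbf{f} \sim N(\mathbf{0}, \mathbf{G}_f\sigma_f^2), \quad
\mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\sigma_r^2), \quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)
```

Here y is the vector of phenotypes, μ the overall mean, G the genomic relationship matrix from all variants, and G_f and G_r the relationship matrices built from variants inside and outside the genomic feature (for example, the genes of one GO term). Estimating the two variances separately is what lets the feature carry a different weight from the rest of the genome.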
Three chromosomal rearrangements promote genomic divergence between migratory and stationary ecotypes of Atlantic cod.

PubMed

Berg, Paul R; Star, Bastiaan; Pampoulie, Christophe; Sodeland, Marte; Barth, Julia M I; Knutsen, Halvor; Jakobsen, Kjetill S; Jentoft, Sissel

2016-03-17

Identification of genome-wide patterns of divergence provides insight into how genomes are influenced by selection and can reveal the potential for local adaptation in spatially structured populations. In Atlantic cod, historically a major marine resource, Northeast-Arctic and Norwegian coastal cod are recognized by fundamental differences in migratory and non-migratory behavior, respectively. However, the genomic architecture underlying such behavioral ecotypes is unclear. Here, we have analyzed more than 8,000 polymorphic SNPs distributed throughout all 23 linkage groups and show that loci putatively under selection are localized within three distinct genomic regions, each several megabases long, covering approximately 4% of the Atlantic cod genome. These regions likely represent genomic inversions. The frequencies of these distinct regions differ markedly between the ecotypes, which spawn in the vicinity of each other; this contrasts with the low level of divergence in the rest of the genome. The observed patterns strongly suggest that these chromosomal rearrangements are instrumental in local adaptation and separation of Atlantic cod populations, leaving footprints of large genomic regions under selection. Our findings demonstrate the power of using genomic information in further understanding the population dynamics and defining management units in one of the world's most economically important marine resources.
Comparative genomics of Burkholderia multivorans, a ubiquitous pathogen with a highly conserved genomic structure

PubMed Central

Cooper, Vaughn S.; Hatcher, Philip J.; Verheyde, Bart; Carlier, Aurélien; Vandamme, Peter

2017-01-01

The natural environment serves as a reservoir of opportunistic pathogens. A well-established method for studying the epidemiology of such opportunists is multilocus sequence typing, which in many cases has defined strains predisposed to causing infection. Burkholderia multivorans is an important pathogen in people with cystic fibrosis (CF) and its epidemiology suggests that strains are acquired from non-human sources such as the natural environment. This raises the central question of whether the isolation source (CF or environment) or the multilocus sequence type (ST) of B. multivorans better predicts their genomic content and functionality. We identified four pairs of B. multivorans isolates, representing distinct STs and consisting of one CF and one environmental isolate each. All genomes were sequenced using the PacBio SMRT sequencing technology, which resulted in eight high-quality B. multivorans genome assemblies. The present study demonstrated that the genomic structure of the examined B. multivorans STs is highly conserved and that the B. multivorans genomic lineages are defined by their ST. Orthologous protein families were not uniformly distributed among chromosomes, with core orthologs being enriched on the primary chromosome and ST-specific orthologs being enriched on the second and third chromosome. The ST-specific orthologs were enriched in genes involved in defense mechanisms and secondary metabolism, corroborating the strain-specificity of these virulence characteristics. Finally, the same B. multivorans genomic lineages occur in both CF and environmental samples and on different continents, demonstrating their ubiquity and evolutionary persistence. PMID:28430818
Genome assortment, not serogroup, defines Vibrio cholerae pandemic strains

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brettin, Thomas S; Bruce, David C; Challacombe, Jean F

2009-01-01

Vibrio cholerae, the causative agent of cholera, is a bacterium autochthonous to the aquatic environment, and a serious public health threat. V. cholerae serogroup O1 is responsible for the previous two cholera pandemics, in which classical and El Tor biotypes were dominant in the 6th and the current 7th pandemics, respectively. Cholera researchers continually face newly emerging and re-emerging pathogenic clones carrying combinations of new serogroups as well as of phenotypic and genotypic properties. These genotype and phenotype changes have hampered control of the disease. Here we compare the complete genome sequences of 23 strains of V. cholerae isolated from a variety of sources and geographical locations over the past 98 years in an effort to elucidate the evolutionary mechanisms governing genetic diversity and genesis of new pathogenic clones. The genome-based phylogeny revealed 12 distinct V. cholerae phyletic lineages, of which one, designated the V. cholerae core genome (CG), comprises both O1 classical and El Tor biotypes. All 7th pandemic clones share nearly identical gene content, i.e., the same genome backbone. The transition from 6th to 7th pandemic strains is defined here as a 'shift' between pathogenic clones belonging to the same O1 serogroup, but from significantly different phyletic lineages within the CG clade. In contrast, transition among clones during the present 7th pandemic period can be characterized as a 'drift' between clones, differentiated mainly by varying composition of laterally transferred genomic islands, resulting in emergence of variants, exemplified by V. cholerae serogroup O139 and V. cholerae O1 El Tor hybrid clones that produce cholera toxin of classical biotype. Based on the comprehensive comparative genomics presented in this study it is concluded that V. cholerae undergoes extensive genetic recombination via lateral gene transfer, and, therefore, genome assortment, not serogroup, should be used to define pathogenic V. cholerae clones.
Mycobacterium leprae: genes, pseudogenes and genetic diversity

PubMed Central

Singh, Pushpendra; Cole, Stewart T

2011-01-01

Leprosy, which has afflicted human populations for millennia, results from infection with Mycobacterium leprae, an unculturable pathogen with an exceptionally long generation time. Considerable insight into the biology and drug resistance of the leprosy bacillus has been obtained from genomics. M. leprae has undergone reductive evolution and pseudogenes now occupy half of its genome. Comparative genomics of four different strains revealed remarkable conservation of the genome (99.995% identity) yet uncovered 215 polymorphic sites, mainly single nucleotide polymorphisms, and a handful of new pseudogenes. Mapping these polymorphisms in a large panel of strains defined 16 single nucleotide polymorphism-subtypes that showed strong geographical associations and helped retrace the evolution of M. leprae. PMID:21162636

Test Pricing and Reimbursement in Genomic Medicine: Towards a General Strategy.

PubMed

Vozikis, Athanassios; Cooper, David N; Mitropoulou, Christina; Kambouris, Manousos E; Brand, Angela; Dolzan, Vita; Fortina, Paolo; Innocenti, Federico; Lee, Ming Ta Michael; Leyens, Lada; Macek, Milan; Al-Mulla, Fahd; Prainsack, Barbara; Squassina, Alessio; Taruscio, Domenica; van Schaik, Ron H; Vayena, Effy; Williams, Marc S; Patrinos, George P

2016-01-01

This paper aims to provide an overview of the rationale and basic principles guiding the governance of genomic testing services, to clarify their objectives, and allocate and define responsibilities among stakeholders in a health-care system, with a special focus on the EU countries. Particular attention is paid to issues pertaining to pricing and reimbursement policies, the availability of essential genomic tests which differs between various countries owing to differences in disease prevalence and public health relevance, the prescribing and use of genomic testing services according to existing or new guidelines, budgetary and fiscal control, the balance between price and access to innovative testing, monitoring and evaluation for cost-effectiveness and safety, and the development of research capacity. We conclude that addressing the specific items put forward in this article will help to create a robust policy in relation to pricing and reimbursement in genomic medicine. This will contribute to an effective and sustainable health-care system and will prove beneficial to the economy at large. © 2016 S. Karger AG, Basel.
Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks.

PubMed

Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun

2014-11-25

The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. It is important to know whether there is a set of genes that is not only conserved in position among isolates but also functionally essential for a given species, and to further evaluate the stability or flexibility of such genome structures across lineages. Based on a large number of multi-isolate pangenomic data, our analysis reveals that a subset of core genes is organized into a core-gene-defined genome organizational framework, or cGOF. Furthermore, the lineage-associated cGOFs among Gram-positive and Gram-negative bacteria behave differently: the former, composed of 2 to 4 segments, have their fragments symmetrically rearranged around the origin-terminus axis, whereas the latter show more complex segmentation and are partitioned asymmetrically into chromosomal structures. The definition of cGOFs provides new insights into prokaryotic genome organization and efficient guidance for genome assembly and analysis. Copyright © 2014 Kang et al.
Genome sequencing of ovine isolates of Mycobacterium avium subspecies paratuberculosis offers insights into host association

PubMed Central

2012-01-01

Background: The genome of Mycobacterium avium subspecies paratuberculosis (MAP) is remarkably homogeneous among the genomes of bovine, human and wildlife isolates. However, previous work in our laboratories with the bovine K-10 strain has revealed substantial differences compared to sheep isolates. To systematically characterize all genomic differences that may be associated with the specific hosts, we sequenced the genomes of three U.S. sheep isolates and also obtained an optical map. Results: Our analysis of one of the isolates, MAP S397, revealed a genome 4.8 Mb in size with 4,700 open reading frames (ORFs). Comparative analysis of the MAP S397 isolate showed it acquired approximately 10 large sequence regions that are shared with the human M. avium subsp. hominissuis strain 104 and lost 2 large regions that are present in the bovine strain. In addition, optical mapping defined the presence of 7 large inversions between the bovine and ovine genomes (~2.36 Mb). Whole-genome sequencing of 2 additional sheep strains of MAP (JTC1074 and JTC7565) further confirmed genomic homogeneity of the sheep isolates despite the presence of polymorphisms on the nucleotide level. Conclusions: Comparative sequence analysis employed here provided a better understanding of the host association and evolution of members of the M. avium complex, and could help in deciphering the phenotypic differences observed among sheep and cattle strains of MAP. A similar approach based on whole-genome sequencing combined with optical mapping could be employed to examine closely related pathogens. We propose an evolutionary scenario for M. avium complex strains based on these genome sequences. PMID:22409516
Genome-editing Technologies for Gene and Cell Therapy.

PubMed

Maeder, Morgan L; Gersbach, Charles A

2016-03-01

Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed.
Genome-editing Technologies for Gene and Cell Therapy

PubMed Central

Maeder, Morgan L; Gersbach, Charles A

2016-01-01

Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333
Sex in the brain: hormones and sex differences.

PubMed

Marrocco, Jordan; McEwen, Bruce S

2016-12-01

Contrary to popular belief, sex hormones act throughout the entire brain of both males and females via both genomic and nongenomic receptors. Many neural and behavioral functions are affected by estrogens, including mood, cognitive function, blood pressure regulation, motor coordination, pain, and opioid sensitivity. Subtle sex differences exist for many of these functions that are developmentally programmed by hormones and by not yet precisely defined genetic factors, including the mitochondrial genome. These sex differences, and responses to sex hormones in brain regions and upon functions not previously regarded as subject to such differences, indicate that we are entering a new era in our ability to understand and appreciate the diversity of gender-related behaviors and brain functions.
Genomic taxonomy of vibrios

PubMed Central

Thompson, Cristiane C; Vicente, Ana Carolina P; Souza, Rangel C; Vasconcelos, Ana Tereza R; Vesth, Tammi; Alves, Nelson; Ussery, David W; Iida, Tetsuya; Thompson, Fabiano L

2009-01-01

Background Vibrio taxonomy has been based on a polyphasic approach. In this study, we retrieve useful taxonomic information (i.e. data that can be used to distinguish different taxonomic levels, such as species and genera) from 32 genome sequences of different vibrio species. We use a variety of tools to explore the taxonomic relationship between the sequenced genomes, including Multilocus Sequence Analysis (MLSA), supertrees, Average Amino Acid Identity (AAI), genomic signatures, and Genome BLAST atlases. Our aim is to analyse the usefulness of these tools for species identification in vibrios. Results We have generated four new genome sequences of three Vibrio species, i.e., V. alginolyticus 40B, V. harveyi-like 1DA3, and V. mimicus strains VM573 and VM603, and present a broad analysis of these genomes along with other sequenced Vibrio species. The genome atlas and pangenome plots provide a tantalizing image of the genomic differences that occur between closely related sister species, e.g. V. cholerae and V. mimicus. The vibrio pangenome contains around 26504 genes. The V. cholerae core genome and pangenome consist of 1520 and 6923 genes, respectively. Pangenomes might allow different strains of V. cholerae to occupy different niches. MLSA and supertree analyses resulted in a similar phylogenetic picture, with a clear distinction of four groups (Vibrio core group, V. cholerae-V. mimicus, Aliivibrio spp., and Photobacterium spp.). A Vibrio species is defined as a group of strains that share > 95% DNA identity in MLSA and supertree analysis, > 96% AAI, ≤ 10 genome signature dissimilarity, and > 61% proteome identity. Strains of the same species and species of the same genus will form monophyletic groups on the basis of MLSA and supertree. Conclusion The combination of different analytical and bioinformatics tools will enable the most accurate species identification through genomic computational analysis. This endeavour will culminate in the birth of the online genomic taxonomy whereby researchers and end-users of taxonomy will be able to identify their isolates through a web-based server. This novel approach to microbial systematics will result in a tremendous advance concerning biodiversity discovery, description, and understanding. PMID:19860885
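The species definition quoted in the abstract above combines four numeric thresholds. As a minimal sketch, it could be written as a single predicate over pairwise metrics. The class name, field names, and example values below are hypothetical and chosen only for illustration; the abstract does not say how the authors computed or stored these metrics.

```python
from dataclasses import dataclass


@dataclass
class PairwiseMetrics:
    """Hypothetical container for pairwise genome comparison results."""
    mlsa_identity: float            # % DNA identity in MLSA / supertree analysis
    aai: float                      # % average amino acid identity
    signature_dissimilarity: float  # genome signature dissimilarity
    proteome_identity: float        # % proteome identity


def same_vibrio_species(m: PairwiseMetrics) -> bool:
    """Apply the four thresholds stated in the abstract (all must hold)."""
    return (
        m.mlsa_identity > 95.0
        and m.aai > 96.0
        and m.signature_dissimilarity <= 10.0
        and m.proteome_identity > 61.0
    )


# Illustrative, made-up values; not taken from the study.
close_pair = PairwiseMetrics(98.0, 97.5, 4.0, 80.0)
distant_pair = PairwiseMetrics(94.0, 97.5, 4.0, 80.0)
print(same_vibrio_species(close_pair), same_vibrio_species(distant_pair))
```

Requiring every criterion at once mirrors the wording of the definition; whether the authors allowed exceptions for borderline pairs is not stated in the abstract.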
Comprehensive red blood cell and platelet antigen prediction from whole genome sequencing: proof of principle

PubMed Central

Westhoff, Connie M.; Uy, Jon Michael; Aguad, Maria; Smeland‐Wagman, Robin; Kaufman, Richard M.; Rehm, Heidi L.; Green, Robert C.; Silberstein, Leslie E.

2015-01-01

BACKGROUND There are 346 serologically defined red blood cell (RBC) antigens and 33 serologically defined platelet (PLT) antigens, most of which have known genetic changes in 45 RBC or six PLT genes that correlate with antigen expression. Polymorphic sites associated with antigen expression in the primary literature and reference databases are annotated according to nucleotide positions in cDNA. This makes antigen prediction from next‐generation sequencing data challenging, since it uses genomic coordinates. STUDY DESIGN AND METHODS The conventional cDNA reference sequences for all known RBC and PLT genes that correlate with antigen expression were aligned to the human reference genome. The alignments allowed conversion of conventional cDNA nucleotide positions to the corresponding genomic coordinates. RBC and PLT antigen prediction was then performed using the human reference genome and whole genome sequencing (WGS) data with serologic confirmation. RESULTS Some major differences and alignment issues were found when attempting to convert the conventional cDNA to human reference genome sequences for the following genes: ABO, A4GALT, RHD, RHCE, FUT3, ACKR1 (previously DARC), ACHE, FUT2, CR1, GCNT2, and RHAG. However, it was possible to create usable alignments, which facilitated the prediction of all RBC and PLT antigens with a known molecular basis from WGS data. Traditional serologic typing for 18 RBC antigens was in agreement with the WGS‐based antigen predictions, providing proof of principle for this approach. CONCLUSION Detailed mapping of conventional cDNA annotated RBC and PLT alleles can enable accurate prediction of RBC and PLT antigens from whole genomic sequencing data. PMID:26634332
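The conversion step described in the abstract above, from cDNA nucleotide positions to genomic coordinates, can be illustrated with a small sketch. It assumes a gap-free alignment given as a list of exon blocks with 1-based inclusive genomic coordinates, and it counts cDNA positions from the first base of the first aligned block. Both are simplifying assumptions: the function name and exon coordinates are hypothetical, and the study itself reports alignment issues for several genes that a sketch like this does not handle.

```python
def cdna_to_genomic(pos, exons, strand="+"):
    """Map a 1-based cDNA position to a 1-based genomic coordinate.

    exons: list of (genomic_start, genomic_end) blocks, inclusive,
           with start <= end regardless of strand.
    strand: "+" or "-"; on "-" the transcript runs from high to low
            genomic coordinates.
    """
    if pos < 1:
        raise ValueError("cDNA positions are 1-based")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")

    # Walk the blocks in transcript order.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("position lies beyond the aligned transcript")


# Made-up exon blocks for illustration only.
exons = [(100, 109), (200, 204)]
print(cdna_to_genomic(11, exons))        # first base of the second block
print(cdna_to_genomic(1, exons, "-"))    # transcript start on the minus strand
```

In practice, positions in the published allele tables are usually numbered relative to the start codon, so an offset for the 5' untranslated region would have to be added before calling a function like this.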
Molecular signatures of plastic phenotypes in two eusocial insect species with simple societies.

PubMed

Patalano, Solenn; Vlasova, Anna; Wyatt, Chris; Ewels, Philip; Camara, Francisco; Ferreira, Pedro G; Asher, Claire L; Jurkowski, Tomasz P; Segonds-Pichon, Anne; Bachman, Martin; González-Navarrete, Irene; Minoche, André E; Krueger, Felix; Lowy, Ernesto; Marcet-Houben, Marina; Rodriguez-Ales, Jose Luis; Nascimento, Fabio S; Balasubramanian, Shankar; Gabaldon, Toni; Tarver, James E; Andrews, Simon; Himmelbauer, Heinz; Hughes, William O H; Guigó, Roderic; Reik, Wolf; Sumner, Seirian

2015-11-10

Phenotypic plasticity is important in adaptation and shapes the evolution of organisms. However, we understand little about what aspects of the genome are important in facilitating plasticity. Eusocial insect societies produce plastic phenotypes from the same genome, as reproductives (queens) and nonreproductives (workers). The greatest plasticity is found in the simple eusocial insect societies in which individuals retain the ability to switch between reproductive and nonreproductive phenotypes as adults. We lack comprehensive data on the molecular basis of plastic phenotypes. Here, we sequenced genomes, microRNAs (miRNAs), and multiple transcriptomes and methylomes from individual brains in a wasp (Polistes canadensis) and an ant (Dinoponera quadriceps) that live in simple eusocial societies. In both species, we found few differences between phenotypes at the transcriptional level, with little functional specialization, and no evidence that phenotype-specific gene expression is driven by DNA methylation or miRNAs. Instead, phenotypic differentiation was defined more subtly by nonrandom transcriptional network organization, with roles in these networks for both conserved and taxon-restricted genes. The general lack of highly methylated regions or methylome patterning in both species may be an important mechanism for achieving plasticity among phenotypes during adulthood. These findings define previously unidentified hypotheses on the genomic processes that facilitate plasticity and suggest that the molecular hallmarks of social behavior are likely to differ with the level of social complexity.
Molecular signatures of plastic phenotypes in two eusocial insect species with simple societies

PubMed Central

Patalano, Solenn; Vlasova, Anna; Wyatt, Chris; Ewels, Philip; Camara, Francisco; Ferreira, Pedro G.; Asher, Claire L.; Jurkowski, Tomasz P.; Segonds-Pichon, Anne; Bachman, Martin; González-Navarrete, Irene; Minoche, André E.; Krueger, Felix; Lowy, Ernesto; Marcet-Houben, Marina; Rodriguez-Ales, Jose Luis; Nascimento, Fabio S.; Balasubramanian, Shankar; Gabaldon, Toni; Tarver, James E.; Andrews, Simon; Himmelbauer, Heinz; Hughes, William O. H.; Guigó, Roderic; Reik, Wolf; Sumner, Seirian

2015-01-01

Phenotypic plasticity is important in adaptation and shapes the evolution of organisms. However, we understand little about what aspects of the genome are important in facilitating plasticity. Eusocial insect societies produce plastic phenotypes from the same genome, as reproductives (queens) and nonreproductives (workers). The greatest plasticity is found in the simple eusocial insect societies in which individuals retain the ability to switch between reproductive and nonreproductive phenotypes as adults. We lack comprehensive data on the molecular basis of plastic phenotypes. Here, we sequenced genomes, microRNAs (miRNAs), and multiple transcriptomes and methylomes from individual brains in a wasp (Polistes canadensis) and an ant (Dinoponera quadriceps) that live in simple eusocial societies. In both species, we found few differences between phenotypes at the transcriptional level, with little functional specialization, and no evidence that phenotype-specific gene expression is driven by DNA methylation or miRNAs. Instead, phenotypic differentiation was defined more subtly by nonrandom transcriptional network organization, with roles in these networks for both conserved and taxon-restricted genes. The general lack of highly methylated regions or methylome patterning in both species may be an important mechanism for achieving plasticity among phenotypes during adulthood. These findings define previously unidentified hypotheses on the genomic processes that facilitate plasticity and suggest that the molecular hallmarks of social behavior are likely to differ with the level of social complexity. PMID:26483466
Genome-wide and locus-specific DNA hypomethylation in G9a deficient mouse embryonic stem cells.

PubMed

Ikegami, Kohta; Iwatani, Misa; Suzuki, Masako; Tachibana, Makoto; Shinkai, Yoichi; Tanaka, Satoshi; Greally, John M; Yagi, Shintaro; Hattori, Naka; Shiota, Kunio

2007-01-01

In the mammalian genome, numerous CpG-rich loci define tissue-dependent and differentially methylated regions (T-DMRs). Euchromatin from different cell types differs in terms of its tissue-specific DNA methylation profile as defined by these T-DMRs. G9a is a euchromatin-localized histone methyltransferase (HMT) and catalyzes methylation of histone H3 at lysines 9 and 27 (H3-K9 and -K27). To test whether HMT activity influences euchromatic cytosine methylation, we analyzed the DNA methylation status of approximately 2000 CpG-rich loci, which are predicted in silico, in G9a(-/-) embryonic stem cells by restriction landmark genomic scanning (RLGS). While the RLGS profile of wild-type cells contained about 1300 spots, 32 new spots indicating DNA demethylation were seen in the profile of G9a(-/-) cells. Virtual-image RLGS (Vi-RLGS) allowed us to identify the genomic source of ten of these spots. These were confirmed to be cytosine demethylated, not just at the Not I site detected by the RLGS but extending over several kilobase pairs in cis. Chromatin immunoprecipitation (ChIP) confirmed these loci to be targets of G9a, with decreased H3-K9 and/or -K27 dimethylation in the G9a(-/-) cells. These data indicate that G9a site-selectively contributes to DNA methylation.
Driving Apart and Segregating Genomes in Archaea.

PubMed

Barillà, Daniela

2016-12-01

Genome segregation is a fundamental biological process in organisms from all domains of life. How this stage of the cell cycle unfolds in Eukarya has been clearly defined and considerable progress has been made to unravel chromosome partition in Bacteria. The picture is still elusive in Archaea. The lineages of this domain exhibit different cell-cycle lifestyles and wide-ranging chromosome copy numbers, fluctuating from 1 up to 55. This plurality of patterns suggests that a variety of mechanisms might underpin disentangling and delivery of DNA molecules to daughter cells. Here I describe recent developments in archaeal genome maintenance, including investigations of novel genome segregation machines that point to unforeseen bacterial and eukaryotic connections. Crown Copyright © 2016. Published by Elsevier Ltd. All rights reserved.
Molecular and Genomic Alterations in Glioblastoma Multiforme.

PubMed

Crespo, Ines; Vital, Ana Louisa; Gonzalez-Tablas, María; Patino, María del Carmen; Otero, Alvaro; Lopes, María Celeste; de Oliveira, Catarina; Domingues, Patricia; Orfao, Alberto; Tabernero, Maria Dolores

2015-07-01

In recent years, important advances have been achieved in the understanding of the molecular biology of glioblastoma multiforme (GBM); thus, complex genetic alterations and genomic profiles, which recurrently involve multiple signaling pathways, have been defined, leading to the first molecular/genetic classification of the disease. In this regard, different genetic alterations and genetic pathways appear to distinguish primary (eg, EGFR amplification) versus secondary (eg, IDH1/2 or TP53 mutation) GBM. Such genetic alterations target distinct combinations of the growth factor receptor-ras signaling pathways, as well as the phosphatidylinositol 3-kinase/phosphatase and tensin homolog/AKT, retinoblastoma/cyclin-dependent kinase (CDK) N2A-p16(INK4A), and TP53/mouse double minute (MDM) 2/MDM4/CDKN2A-p14(ARF) pathways, in cells that present features associated with key stages of normal neurogenesis and (normal) central nervous system cell types. This translates into well-defined genomic profiles that have been recently classified by The Cancer Genome Atlas Consortium into four subtypes: classic, mesenchymal, proneural, and neural GBM. Herein, we review the most relevant genetic alterations of primary versus secondary GBM, the specific signaling pathways involved, and the overall genomic profile of this genetically heterogeneous group of malignant tumors. Copyright © 2015 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.
Universal and idiosyncratic characteristic lengths in bacterial genomes

NASA Astrophysics Data System (ADS)

Junier, Ivan; Frémont, Paul; Rivoire, Olivier

2018-05-01

In condensed matter physics, simplified descriptions are obtained by coarse-graining the features of a system at a certain characteristic length, defined as the typical length beyond which some properties are no longer correlated. From a physics standpoint, in vitro DNA has thus a characteristic length of 300 base pairs (bp), the Kuhn length of the molecule beyond which correlations in its orientations are typically lost. From a biology standpoint, in vivo DNA has a characteristic length of 1000 bp, the typical length of genes. Since bacteria live in very different physico-chemical conditions and since their genomes lack translational invariance, whether larger, universal characteristic lengths exist is a non-trivial question. Here, we examine this problem by leveraging the large number of fully sequenced genomes available in public databases. By analyzing GC content correlations and the evolutionary conservation of gene contexts (synteny) in hundreds of bacterial chromosomes, we conclude that a fundamental characteristic length around 10–20 kb can be defined. This characteristic length reflects elementary structures involved in the coordination of gene expression, which are present all along the genome of nearly all bacteria. Technically, reaching this conclusion required us to implement methods that are insensitive to the presence of large idiosyncratic genomic features, which may co-exist along these fundamental universal structures.
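The abstract above mentions analyzing GC content correlations along chromosomes. The following is a simplified sketch of that general idea, not the authors' method: it computes GC fractions in non-overlapping windows and a plain autocorrelation at a given lag. The function names and the window scheme are assumptions, and this naive estimator is sensitive to the large idiosyncratic features that the authors say their methods were designed to ignore.

```python
def gc_windows(seq, window):
    """GC fraction in consecutive non-overlapping windows of fixed size."""
    seq = seq.upper()
    fractions = []
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


def autocorrelation(values, lag):
    """Normalized autocorrelation of a series at the given lag."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        raise ValueError("series has zero variance")
    cov = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    ) / (n - lag)
    return cov / var


# Toy sequence alternating GC-rich and AT-rich blocks.
toy = "GGCC" + "AATT" + "GGCC" + "AATT"
gc = gc_windows(toy, 4)
print(gc, autocorrelation(gc, 1))
```

A characteristic length would then be read off as the lag (multiplied by the window size) beyond which the autocorrelation decays toward zero.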
Defining the biological bases of individual differences in musicality

PubMed Central

Gingras, Bruno; Honing, Henkjan; Peretz, Isabelle; Trainor, Laurel J.; Fisher, Simon E.

2015-01-01

Advances in molecular technologies make it possible to pinpoint genomic factors associated with complex human traits. For cognition and behaviour, identification of underlying genes provides new entry points for deciphering the key neurobiological pathways. In the past decade, the search for genetic correlates of musicality has gained traction. Reports have documented familial clustering for different extremes of ability, including amusia and absolute pitch (AP), with twin studies demonstrating high heritability for some music-related skills, such as pitch perception. Certain chromosomal regions have been linked to AP and musical aptitude, while individual candidate genes have been investigated in relation to aptitude and creativity. Most recently, researchers in this field started performing genome-wide association scans. Thus far, studies have been hampered by relatively small sample sizes and limitations in defining components of musicality, including an emphasis on skills that can only be assessed in trained musicians. With opportunities to administer standardized aptitude tests online, systematic large-scale assessment of musical abilities is now feasible, an important step towards high-powered genome-wide screens. Here, we offer a synthesis of existing literatures and outline concrete suggestions for the development of comprehensive operational tools for the analysis of musical phenotypes. PMID:25646515
Positional orthology: putting genomic evolutionary relationships into context.

PubMed

Dewey, Colin N

2011-09-01

Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of 'positional orthology' has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term 'toporthology', with respect to the evolutionary events experienced by a gene's ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology.
Positional orthology: putting genomic evolutionary relationships into context

PubMed Central

2011-01-01

Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology. PMID:21705766
From Genes to Environment: Using integrative genomics to build a “systems level” understanding of autism spectrum disorders

PubMed Central

Hu, Valerie W.

2012-01-01

Autism spectrum disorders (ASD) are pervasive neurodevelopmental disorders that affect an estimated 1 in 110 individuals. Although there is a strong genetic component associated with these disorders, this review focuses on the multi-factorial nature of ASD and how different genome-wide (genomic) approaches contribute to our understanding of autism. Emphasis is placed on the need to study defined ASD phenotypes as well as to integrate large-scale ‘omics’ data in order to develop a “systems level” perspective of ASD which, in turn, is necessary to allow predictions regarding responses to specific perturbations and interventions. PMID:22497667
Viral Determinants of Integration Site Preferences of Simian Immunodeficiency Virus-Based Vectors

PubMed Central

Monse, Hella; Laufs, Stephanie; Kuate, Seraphin; Zeller, W. Jens; Fruehauf, Stefan; Überla, Klaus

2006-01-01

Preferential integration into transcriptionally active regions of genomes has been observed for retroviral vectors based on gamma-retroviruses and lentiviruses. However, differences in the integration site preferences were detected, which might be explained by differences in viral components of the preintegration complexes. Viral determinants of integration site preferences have not been defined. Therefore, integration sites of simian immunodeficiency virus (SIV)-based vectors produced in the absence of accessory genes or lacking promoter and enhancer elements were compared. Similar integration patterns for the different SIV vectors indicate that vif, vpr, vpx, nef, env, and promoter or enhancer elements are not required for preferential integration of SIV into transcriptionally active regions of genomes. PMID:16873270
Potential contribution of genomics and biotechnology in animal production

USDA-ARS's Scientific Manuscript database

The overall objective of the book chapter is to define the potential contribution of genomics in livestock production in Latin American countries. A brief description on what is genomics, genome-wide association studies (GWAS), and genomic selection (GS) is provided. Genomics has been rapidly adopte...

Chromosomes in a genome-wise order: evidence for metaphase architecture.

PubMed

Weise, Anja; Bhatt, Samarth; Piaszinski, Katja; Kosyakova, Nadezda; Fan, Xiaobo; Altendorf-Hofmann, Annelore; Tanomtong, Alongklod; Chaveerach, Arunrat; de Cioffi, Marcelo Bello; de Oliveira, Edivaldo; Walther, Joachim-U; Liehr, Thomas; Chaudhuri, Jyoti P

2016-01-01

One fundamental finding of the last decade is that, besides the primary DNA sequence information, there are several epigenetic "information-layers" like DNA and histone modifications, chromatin packaging and, last but not least, the position of genes in the nucleus. We postulate that the functional genomic architecture is not restricted to the interphase of the cell cycle but can also be observed in the metaphase stage, when chromosomes are most condensed and microscopically visible. If so, it offers the unique opportunity to directly analyze the functional aspects of genomic architecture in different cells, species and diseases. Another aspect not directly accessible by molecular techniques is the genome merged from two different haploid parental genomes represented by the homologous chromosome sets. Our results show that there is not only a well-known and defined nuclear architecture in interphase but also in metaphase leading to a bilateral organization of the two haploid sets of chromosomes. Moreover, evidence is provided for the parental origin of the haploid grouping. From our findings we postulate an additional epigenetic information layer within the genome including the organization of homologous chromosomes and their parental origin which may now substantially change the landscape of genetics.
Defining Genome Project Standards in a New Era of Sequencing

ScienceCinema

Chain, Patrick

2018-01-16

Patrick Chain of the DOE Joint Genome Institute gives a talk on behalf of the International Genome Sequencing Standards Consortium on the need for intermediate genome classifications between "draft" and "finished".
A Genome-Wide Association Study for Regulators of Micronucleus Formation in Mice.

PubMed

McIntyre, Rebecca E; Nicod, Jérôme; Robles-Espinoza, Carla Daniela; Maciejowski, John; Cai, Na; Hill, Jennifer; Verstraten, Ruth; Iyer, Vivek; Rust, Alistair G; Balmus, Gabriel; Mott, Richard; Flint, Jonathan; Adams, David J

2016-08-09

In mammals the regulation of genomic instability plays a key role in tumor suppression and also controls genome plasticity, which is important for recombination during the processes of immunity and meiosis. Most studies to identify regulators of genomic instability have been performed in cells in culture or in systems that report on gross rearrangements of the genome, yet subtle differences in the level of genomic instability can contribute to whole organism phenotypes such as tumor predisposition. Here we performed a genome-wide association study in a population of 1379 outbred Crl:CFW(SW)-US_P08 mice to dissect the genetic landscape of micronucleus formation, a biomarker of chromosomal breaks, whole chromosome loss, and extranuclear DNA. Variation in micronucleus levels is a complex trait with a genome-wide heritability of 53.1%. We identify seven loci influencing micronucleus formation (false discovery rate <5%), and define candidate genes at each locus. Intriguingly at several loci we find evidence for sexual dimorphism in micronucleus formation, with a locus on chromosome 11 being specific to males. Copyright © 2016 McIntyre et al.
Multi-population Genomic Relationships for Estimating Current Genetic Variances Within and Genetic Correlations Between Populations.

PubMed

Wientjes, Yvonne C J; Bijma, Piter; Vandenplas, Jérémie; Calus, Mario P L

2017-10-01

Different methods are available to calculate multi-population genomic relationship matrices. Since those matrices differ in base population, it is anticipated that the method used to calculate genomic relationships affects the estimate of genetic variances, covariances, and correlations. The aim of this article is to define the multi-population genomic relationship matrix to estimate current genetic variances within and genetic correlations between populations. The genomic relationship matrix containing two populations consists of four blocks, one block for population 1, one block for population 2, and two blocks for relationships between the populations. It is known, based on literature, that by using current allele frequencies to calculate genomic relationships within a population, current genetic variances are estimated. In this article, we theoretically derived the properties of the genomic relationship matrix to estimate genetic correlations between populations and validated it using simulations. When the scaling factor of across-population genomic relationships is equal to the product of the square roots of the scaling factors for within-population genomic relationships, the genetic correlation is estimated unbiasedly even though estimated genetic variances do not necessarily refer to the current population. When this property is not met, the correlation based on estimated variances should be multiplied by a correction factor based on the scaling factors. In this study, we present a genomic relationship matrix which directly estimates current genetic variances as well as genetic correlations between populations. Copyright © 2017 by the Genetics Society of America.
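The scaling condition in the abstract above can be made concrete with a small sketch. The abstract does not give the correction factor explicitly, so the form used here is an assumption: it supposes that each estimated (co)variance component equals the true component multiplied by the scaling factor of the corresponding block of the relationship matrix. Under that assumption the naive correlation is off by k12 / sqrt(k1 * k2), and multiplying by the inverse recovers it. The symbols k1, k2, k12 and the function name are hypothetical.

```python
import math


def corrected_genetic_correlation(cov12, var1, var2, k1, k2, k12):
    """Correct an estimated between-population genetic correlation.

    cov12, var1, var2: estimated covariance and variances.
    k1, k2: scaling factors of the within-population blocks.
    k12: scaling factor of the across-population blocks.
    Assumes estimates equal true components times their scaling factor.
    """
    naive = cov12 / math.sqrt(var1 * var2)
    correction = math.sqrt(k1 * k2) / k12
    return naive * correction


# Made-up numbers: true variances 4 and 9, true covariance 3 (r = 0.5).
k1, k2 = 2.0, 8.0

# Case 1: k12 equals sqrt(k1) * sqrt(k2), so no correction is needed.
k12 = math.sqrt(k1) * math.sqrt(k2)
print(corrected_genetic_correlation(3 * k12, 4 * k1, 9 * k2, k1, k2, k12))

# Case 2: a different k12 biases the naive estimate; the factor repairs it.
k12 = 2.0
print(corrected_genetic_correlation(3 * k12, 4 * k1, 9 * k2, k1, k2, k12))
```

When k12 equals the product of the square roots of k1 and k2, the correction factor is 1, which is consistent with the abstract's statement that the correlation is then estimated without bias.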
Does genomic imprinting play a role in autoimmunity?

PubMed

Camprubí, Cristina; Monk, David

2011-01-01

In the 19th century Gregor Mendel defined the laws of genetic inheritance by crossing different types of peas. From these results arose his principle of equivalence: the gene will have the same behaviour whether it is inherited from the mother or the father. Today, several key exceptions to this principle are known, for example sex-linked traits and genes in the mitochondrial genome, whose inheritance patterns are referred to as 'non mendelian'. A third, important exception in mammals is that of genomic imprinting, where transcripts are expressed in a monoallelic fashion from only the maternal or the paternal chromosome. In this chapter, we discuss how parent-of-origin effects and genomic imprinting may play a role in autoimmunity and speculate how imprinted miRNAs may influence the expression of many target autoimmune associated genes.
Clinical and Biological Relevance of Genomic Heterogeneity in Chronic Lymphocytic Leukemia

PubMed Central

Friedman, Daphne R.; Lucas, Joseph E.; Weinberg, J. Brice

2013-01-01

Background Chronic lymphocytic leukemia (CLL) is typically regarded as an indolent B-cell malignancy. However, there is wide variability with regards to need for therapy, time to progressive disease, and treatment response. This clinical variability is due, in part, to biological heterogeneity between individual patients’ leukemias. While much has been learned about this biological variation using genomic approaches, it is unclear whether such efforts have sufficiently evaluated biological and clinical heterogeneity in CLL. Methods To study the extent of genomic variability in CLL and the biological and clinical attributes of genomic classification in CLL, we evaluated 893 unique CLL samples from fifteen publicly available gene expression profiling datasets. We used unsupervised approaches to divide the data into subgroups, evaluated the biological pathways and genetic aberrations that were associated with the subgroups, and compared prognostic and clinical outcome data between the subgroups. Results Using an unsupervised approach, we determined that approximately 600 CLL samples are needed to define the spectrum of diversity in CLL genomic expression. We identified seven genomically-defined CLL subgroups that have distinct biological properties, are associated with specific chromosomal deletions and amplifications, and have marked differences in molecular prognostic markers and clinical outcomes. Conclusions Our results indicate that investigations focusing on small numbers of patient samples likely provide a biased outlook on CLL biology. These findings may have important implications in identifying patients who should be treated with specific targeted therapies, which could have efficacy against CLL cells that rely on specific biological pathways. PMID:23468975
Clinical and biological relevance of genomic heterogeneity in chronic lymphocytic leukemia.

PubMed

Friedman, Daphne R; Lucas, Joseph E; Weinberg, J Brice

2013-01-01

Chronic lymphocytic leukemia (CLL) is typically regarded as an indolent B-cell malignancy. However, there is wide variability with regards to need for therapy, time to progressive disease, and treatment response. This clinical variability is due, in part, to biological heterogeneity between individual patients' leukemias. While much has been learned about this biological variation using genomic approaches, it is unclear whether such efforts have sufficiently evaluated biological and clinical heterogeneity in CLL. To study the extent of genomic variability in CLL and the biological and clinical attributes of genomic classification in CLL, we evaluated 893 unique CLL samples from fifteen publicly available gene expression profiling datasets. We used unsupervised approaches to divide the data into subgroups, evaluated the biological pathways and genetic aberrations that were associated with the subgroups, and compared prognostic and clinical outcome data between the subgroups. Using an unsupervised approach, we determined that approximately 600 CLL samples are needed to define the spectrum of diversity in CLL genomic expression. We identified seven genomically-defined CLL subgroups that have distinct biological properties, are associated with specific chromosomal deletions and amplifications, and have marked differences in molecular prognostic markers and clinical outcomes. Our results indicate that investigations focusing on small numbers of patient samples likely provide a biased outlook on CLL biology. These findings may have important implications in identifying patients who should be treated with specific targeted therapies, which could have efficacy against CLL cells that rely on specific biological pathways.
Genomic analysis of three Bifidobacterium species isolated from the calf gastrointestinal tract

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kelly, William J.; Cookson, Adrian L.; Altermann, Eric

Ruminant animals contribute significantly to the global value of agriculture and rely on a complex microbial community for efficient digestion. However, little is known of how this microbial-host relationship develops and is maintained. To begin to address this, we have determined the ability of three Bifidobacterium species isolated from the faeces of newborn calves to grow on carbohydrates typical of a newborn ruminant diet. Genome sequences have been determined for these bacteria with analysis of the genomes providing insights into the host association and identification of several genes that may mediate interactions with the ruminant gastrointestinal tract. The present study provides a starting point from which we can define the role of potential beneficial microbes in the nutrition of young ruminants and begin to influence the interactions between the microbiota and the host. The differences observed in genomic content hint at niche partitioning among the bifidobacterial species analysed and the different strategies they employ to successfully adapt to this habitat.
Genomic analysis of three Bifidobacterium species isolated from the calf gastrointestinal tract

DOE PAGES

Kelly, William J.; Cookson, Adrian L.; Altermann, Eric; ...

2016-07-29

Ruminant animals contribute significantly to the global value of agriculture and rely on a complex microbial community for efficient digestion. However, little is known of how this microbial-host relationship develops and is maintained. To begin to address this, we have determined the ability of three Bifidobacterium species isolated from the faeces of newborn calves to grow on carbohydrates typical of a newborn ruminant diet. Genome sequences have been determined for these bacteria with analysis of the genomes providing insights into the host association and identification of several genes that may mediate interactions with the ruminant gastrointestinal tract. The present study provides a starting point from which we can define the role of potential beneficial microbes in the nutrition of young ruminants and begin to influence the interactions between the microbiota and the host. The differences observed in genomic content hint at niche partitioning among the bifidobacterial species analysed and the different strategies they employ to successfully adapt to this habitat.
A dictionary based informational genome analysis

PubMed Central

2012-01-01

Background: In the post-genomic era, several methods of computational genomics are emerging to understand how information as a whole is structured within genomes. The literature of the last five years describes several alignment-free methods that have arisen as alternative metrics for the dissimilarity of biological sequences. Among these, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results: Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying the methodology outlined here carried out computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example to compute excessively represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions: We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many lines of investigation, namely exported to other contexts of computational genomics, as a basis for discrimination of genomic pathologies. PMID:22985068
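The idea of a genomic dictionary in the record above can be illustrated with a minimal sketch. It assumes that a dictionary is simply the multiset of length-k words in a sequence and that one possible informational index is the Shannon entropy of the k-mer frequencies. The function names and the toy sequence are hypothetical, and this is not the software prototype the abstract describes.

```python
from collections import Counter
from math import log2


def kmer_dictionary(genome: str, k: int) -> Counter:
    """Count every length-k word (k-mer) occurring in the sequence."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def empirical_entropy(dictionary: Counter) -> float:
    """Shannon entropy (in bits) of the empirical k-mer frequencies."""
    total = sum(dictionary.values())
    return -sum((n / total) * log2(n / total) for n in dictionary.values())


def repeated_words(dictionary: Counter, min_count: int = 2) -> dict:
    """Words occurring at least min_count times (candidate repeats)."""
    return {word: n for word, n in dictionary.items() if n >= min_count}


# Toy sequence, for illustration only (not real genomic data).
toy_genome = "ACGTACGTAC"
d2 = kmer_dictionary(toy_genome, 2)
print(dict(d2))
print(empirical_entropy(d2))
print(repeated_words(d2, min_count=3))
```

Because the counts depend only on word frequencies, two genomes can be compared through such indexes without aligning them, which is the sense in which these methods are alignment-free.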
Comparative genomics of the marine bacterial genus Glaciecola reveals the high degree of genomic diversity and genomic characteristic for cold adaptation.

PubMed

Qin, Qi-Long; Xie, Bin-Bin; Yu, Yong; Shu, Yan-Li; Rong, Jin-Cheng; Zhang, Yan-Jiao; Zhao, Dian-Li; Chen, Xiu-Lan; Zhang, Xi-Ying; Chen, Bo; Zhou, Bai-Cheng; Zhang, Yu-Zhong

2014-06-01

To what extent the genomes of different species belonging to one genus can be diverse and the relationship between genomic differentiation and environmental factor remain unclear for oceanic bacteria. With many new bacterial genera and species being isolated from marine environments, this question warrants attention. In this study, we sequenced all the type strains of the published species of Glaciecola, a recently defined cold-adapted genus with species from diverse marine locations, to study the genomic diversity and cold-adaptation strategy in this genus. The genome size diverged widely from 3.08 to 5.96 Mb, which can be explained by massive gene gain and loss events. Horizontal gene transfer and new gene emergence contributed substantially to the genome size expansion. The genus Glaciecola had an open pan-genome. Comparative genomic research indicated that species of the genus Glaciecola had high diversity in genome size, gene content and genetic relatedness. This may be prevalent in marine bacterial genera considering the dynamic and complex environments of the ocean. Species of Glaciecola had some common genomic features related to cold adaptation, which enable them to thrive and play a role in biogeochemical cycle in the cold marine environments.
MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Yu-Wei; Simmons, Blake A.; Singer, Steven W.

The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.
Systems properties of the Haemophilus influenzae Rd metabolic genotype.

PubMed

Edwards, J S; Palsson, B O

1999-06-18

Haemophilus influenzae Rd was the first free-living organism for which the complete genomic sequence was established. The annotated sequence and known biochemical information was used to define the H. influenzae Rd metabolic genotype. This genotype contains 488 metabolic reactions operating on 343 metabolites. The stoichiometric matrix was used to determine the systems characteristics of the metabolic genotype and to assess the metabolic capabilities of H. influenzae. The need to balance cofactor and biosynthetic precursor production during growth on mixed substrates led to the definition of six different optimal metabolic phenotypes arising from the same metabolic genotype, each with different constraining features. The effects of variations in the metabolic genotype were also studied, and it was shown that the H. influenzae Rd metabolic genotype contains redundant functions under defined conditions. We thus show that the synthesis of in silico metabolic genotypes from annotated genome sequences is possible and that systems analysis methods are available that can be used to analyze and interpret phenotypic behavior of such genotypes.
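The role of the stoichiometric matrix in the record above can be shown with a toy example. In the sketch below, rows of S are metabolites, columns are reactions, and a flux vector v is at steady state when S·v = 0. The three-reaction network, the function names, and the flux values are all illustrative assumptions and have nothing to do with the 488-reaction H. influenzae model itself.

```python
def net_production(S, v):
    """Net production rate of each metabolite: the matrix-vector product S.v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True when every metabolite is balanced (S.v = 0 within tolerance)."""
    return all(abs(x) <= tol for x in net_production(S, v))


# Hypothetical toy network: uptake -> A, A -> B, B -> export.
# Rows: metabolites A, B. Columns: reactions v1 (uptake), v2 (A->B), v3 (export).
S = [
    [1, -1, 0],   # A is produced by v1 and consumed by v2
    [0, 1, -1],   # B is produced by v2 and consumed by v3
]

balanced = [2.0, 2.0, 2.0]     # equal flux through the chain
unbalanced = [2.0, 1.0, 1.0]   # A accumulates at rate 1

print(is_steady_state(S, balanced))
print(net_production(S, unbalanced))
```

Analyses of the kind the abstract mentions typically search the set of balanced flux vectors for one that optimizes an objective such as growth. That optimization step is left out here.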
An archaeal genomic signature

NASA Technical Reports Server (NTRS)

Graham, D. E.; Overbeek, R.; Olsen, G. J.; Woese, C. R.

2000-01-01

Comparisons of complete genome sequences allow the most objective and comprehensive descriptions possible of a lineage's evolution. This communication uses the completed genomes from four major euryarchaeal taxa to define a genomic signature for the Euryarchaeota and, by extension, the Archaea as a whole. The signature is defined in terms of the set of protein-encoding genes found in at least two diverse members of the euryarchaeal taxa that function uniquely within the Archaea; most signature proteins have no recognizable bacterial or eukaryal homologs. By this definition, 351 clusters of signature proteins have been identified. Functions of most proteins in this signature set are currently unknown. At least 70% of the clusters that contain proteins from all the euryarchaeal genomes also have crenarchaeal homologs. This conservative set, which appears refractory to horizontal gene transfer to the Bacteria or the Eukarya, would seem to reflect the significant innovations that were unique and fundamental to the archaeal "design fabric." Genomic protein signature analysis methods may be extended to characterize the evolution of any phylogenetically defined lineage. The complete set of protein clusters for the archaeal genomic signature is presented as supplementary material (see the PNAS web site, www.pnas.org).
Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals.

PubMed

Masuda, Y; Misztal, I; Tsuruta, S; Legarra, A; Aguilar, I; Lourenco, D A L; Fragomeni, B O; Lawlor, T J

2016-03-01

The objectives of this study were to develop and evaluate an efficient implementation in the computation of the inverse of genomic relationship matrix with the recursion algorithm, called the algorithm for proven and young (APY), in single-step genomic BLUP. We validated genomic predictions for young bulls with more than 500,000 genotyped animals in final score for US Holsteins. Phenotypic data included 11,626,576 final scores on 7,093,380 US Holstein cows, and genotypes were available for 569,404 animals. Daughter deviations for young bulls with no classified daughters in 2009, but at least 30 classified daughters in 2014 were computed using all the phenotypic data. Genomic predictions for the same bulls were calculated with single-step genomic BLUP using phenotypes up to 2009. We calculated the inverse of the genomic relationship matrix G_APY^-1 based on a direct inversion of genomic relationship matrix on a small subset of genotyped animals (core animals) and extended that information to noncore animals by recursion. We tested several sets of core animals including 9,406 bulls with at least 1 classified daughter, 9,406 bulls and 1,052 classified dams of bulls, 9,406 bulls and 7,422 classified cows, and random samples of 5,000 to 30,000 animals. Validation reliability was assessed by the coefficient of determination from regression of daughter deviation on genomic predictions for the predicted young bulls. The reliabilities were 0.39 with 5,000 randomly chosen core animals, 0.45 with the 9,406 bulls and 7,422 cows as core animals, and 0.44 with the remaining sets. With phenotypes truncated in 2009 and the preconditioned conjugate gradient to solve mixed model equations, the number of rounds to convergence for core animals defined by bulls was 1,343; defined by bulls and cows, 2,066; and defined by 10,000 random animals, at most 1,629. With complete phenotype data, the number of rounds decreased to 858, 1,299, and at most 1,092, respectively. Setting up G_APY^-1 for 569,404 genotyped animals with 10,000 core animals took 1.3 h and 57 GB of memory. The validation reliability with APY reaches a plateau when the number of core animals is at least 10,000. Predictions with APY have little differences in reliability among definitions of core animals. Single-step genomic BLUP with APY is applicable to millions of genotyped animals. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Cell Context Dependent p53 Genome-Wide Binding Patterns and Enrichment at Repeats

DOE PAGES

Botcheva, Krassimira; McCorkle, Sean R.

2014-11-21

The p53 ability to elicit stress specific and cell type specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type and stress type dependent manner is still an open question. Here we show that the p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context dependent p53 genomic binding. The observed differences affect the p53 binding sites distribution with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the high-confidence p53 ChIP-seq peaks positions with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line specific trends. In HCT116, the p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). In conclusion, our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways.
Defining the landscape of adaptive genetic diversity.

PubMed

Eckert, Andrew J; Dyer, Rodney J

2012-06-01

Whether they are used to describe fitness, genome architecture or the spatial distribution of environmental variables, the concept of a landscape has figured prominently in our collective reasoning. The tradition of landscapes in evolutionary biology is one of fitness mapped onto axes defined by phenotypes or molecular sequence states. The characteristics of these landscapes depend on natural selection, which is structured across both genomic and environmental landscapes, and thus, the bridge among differing uses of the landscape concept (i.e. metaphorically or literally) is that of an adaptive phenotype and its distribution across geographical landscapes in relation to selective pressures. One of the ultimate goals of evolutionary biology should thus be to construct fitness landscapes in geographical space. Natural plant populations are ideal systems with which to explore the feasibility of attaining this goal, because much is known about the quantitative genetic architecture of complex traits for many different plant species. What is less known are the molecular components of this architecture. In this issue of Molecular Ecology, Parchman et al. (2012) pioneer one of the first truly genome-wide association studies in a tree that moves us closer to this form of mechanistic understanding for an adaptive phenotype in natural populations of lodgepole pine (Pinus contorta Dougl. ex Loud.). © 2012 Blackwell Publishing Ltd.
Defining the biological bases of individual differences in musicality.

PubMed

Gingras, Bruno; Honing, Henkjan; Peretz, Isabelle; Trainor, Laurel J; Fisher, Simon E

2015-03-19

Advances in molecular technologies make it possible to pinpoint genomic factors associated with complex human traits. For cognition and behaviour, identification of underlying genes provides new entry points for deciphering the key neurobiological pathways. In the past decade, the search for genetic correlates of musicality has gained traction. Reports have documented familial clustering for different extremes of ability, including amusia and absolute pitch (AP), with twin studies demonstrating high heritability for some music-related skills, such as pitch perception. Certain chromosomal regions have been linked to AP and musical aptitude, while individual candidate genes have been investigated in relation to aptitude and creativity. Most recently, researchers in this field started performing genome-wide association scans. Thus far, studies have been hampered by relatively small sample sizes and limitations in defining components of musicality, including an emphasis on skills that can only be assessed in trained musicians. With opportunities to administer standardized aptitude tests online, systematic large-scale assessment of musical abilities is now feasible, an important step towards high-powered genome-wide screens. Here, we offer a synthesis of existing literatures and outline concrete suggestions for the development of comprehensive operational tools for the analysis of musical phenotypes. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction

PubMed Central

Ederveen, Thomas H. A.; Overmars, Lex; van Hijum, Sacha A. F. T.

2013-01-01

Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines provide users a straight-forward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio open reading frame (ORF) calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC% ranging from 35–52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path) to predict ORF start codons. The order of AGE combinations is from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating likeliness of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs notoriously difficult to predict start codons. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4%) and with an optimal path ORF start prediction sensitivity is gained while maintaining a high specificity. PMID:23675487
Allele-specific control of replication timing and genome organization during development.

PubMed

Rivera-Mulia, Juan Carlos; Dimond, Andrew; Vera, Daniel; Trevilla-Garcia, Claudia; Sasaki, Takayo; Zimmerman, Jared; Dupont, Catherine; Gribnau, Joost; Fraser, Peter; Gilbert, David M

2018-05-07

DNA replication occurs in a defined temporal order known as the replication-timing (RT) program. RT is regulated during development in discrete chromosomal units, coordinated with transcriptional activity and 3D genome organization. Here, we derived distinct cell types from F1 hybrid musculus X castaneus mouse crosses and exploited the high single nucleotide polymorphism (SNP) density to characterize allelic differences in RT (Repli-seq), genome organization (Hi-C and promoter-capture Hi-C), gene expression (total nuclear RNA-seq) and chromatin accessibility (ATAC-seq). We also present HARP: a new computational tool for sorting SNPs in phased genomes to efficiently measure allele-specific genome-wide data. Analysis of six different hybrid mESC clones with different genomes (C57BL/6, 129/sv and CAST/Ei), parental configurations and gender revealed significant RT asynchrony between alleles across ~12% of the autosomal genome linked to sub-species genomes but not to parental origin, growth conditions or gender. RT asynchrony in mESCs strongly correlated with changes in Hi-C compartments between alleles but not SNP density, gene expression, imprinting or chromatin accessibility. We then tracked mESC RT asynchronous regions during development by analyzing differentiated cell types including extraembryonic endoderm stem (XEN) cells, 4 male and female primary mouse embryonic fibroblasts (MEFs) and neural precursor cells (NPCs) differentiated in vitro from mESCs with opposite parental configurations. We found that RT asynchrony and allelic discordance in Hi-C compartments seen in mESCs was largely lost in all differentiated cell types, coordinated with a more uniform Hi-C compartment arrangement, suggesting that genome organization of homologues converges to similar folding patterns during cell fate commitment. Published by Cold Spring Harbor Laboratory Press.

Comparative whole genome analysis of six diagnostic brucellaphages.

PubMed

Farlow, Jason; Filippov, Andrey A; Sergueev, Kirill V; Hang, Jun; Kotorashvili, Adam; Nikolich, Mikeljon P

2014-05-15

Whole genome sequencing of six diagnostic brucellaphages, Tbilisi (Tb), Firenze (Fz), Weybridge (Wb), S708, Berkeley (Bk) and R/C, was followed with genomic comparisons including recently described genomes of the Tb phage from Mexico (TbM) and Pr phage to elucidate genomic diversity and candidate host range determinants. Comparative whole genome analysis revealed high sequence homogeneity among these brucellaphage genomes and resolved three genetic groups consistent with defined host range phenotypes. Group I was composed of Tb and Fz phages that are predominantly lytic for Brucella abortus and Brucella neotomae; Group II included Bk, R/C, and Pr phages that are lytic mainly for B. abortus, Brucella melitensis and Brucella suis; Group III was composed of Wb and S708 phages that are lytic for B. suis, B. abortus and B. neotomae. We found that the putative phage collar protein is a variable locus with features that may be contributing to the host specificities exhibited by different brucellaphage groups. The presence of several candidate host range determinants is illustrated herein for future dissection of the differential host specificity observed among these phages. Published by Elsevier B.V.
Genome chaos: survival strategy during crisis.

PubMed

Liu, Guo; Stevens, Joshua B; Horne, Steven D; Abdallah, Batoul Y; Ye, Karen J; Bremer, Steven W; Ye, Christine J; Chen, David J; Heng, Henry H

2014-01-01

Genome chaos, a process of complex, rapid genome re-organization, results in the formation of chaotic genomes, which is followed by the potential to establish stable genomes. It was initially detected through cytogenetic analyses, and recently confirmed by whole-genome sequencing efforts which identified multiple subtypes including "chromothripsis", "chromoplexy", "chromoanasynthesis", and "chromoanagenesis". Although genome chaos occurs commonly in tumors, both the mechanism and detailed aspects of the process are unknown due to the inability of observing its evolution over time in clinical samples. Here, an experimental system to monitor the evolutionary process of genome chaos was developed to elucidate its mechanisms. Genome chaos occurs following exposure to chemotherapeutics with different mechanisms, which act collectively as stressors. Characterization of the karyotype and its dynamic changes prior to, during, and after induction of genome chaos demonstrates that chromosome fragmentation (C-Frag) occurs just prior to chaotic genome formation. Chaotic genomes seem to form by random rejoining of chromosomal fragments, in part through non-homologous end joining (NHEJ). Stress induced genome chaos results in increased karyotypic heterogeneity. Such increased evolutionary potential is demonstrated by the identification of increased transcriptome dynamics associated with high levels of karyotypic variance. In contrast to impacting on a limited number of cancer genes, re-organized genomes lead to new system dynamics essential for cancer evolution. Genome chaos acts as a mechanism of rapid, adaptive, genome-based evolution that plays an essential role in promoting rapid macroevolution of new genome-defined systems during crisis, which may explain some unwanted consequences of cancer treatment.
A systems approach defining constraints of the genome architecture on lineage selection and evolvability during somatic cancer evolution

PubMed Central

Rübben, Albert; Nordhoff, Ole

2013-01-01

Summary Most clinically distinguishable malignant tumors are characterized by specific mutations, specific patterns of chromosomal rearrangements and a predominant mechanism of genetic instability but it remains unsolved whether modifications of cancer genomes can be explained solely by mutations and selection through the cancer microenvironment. It has been suggested that internal dynamics of genomic modifications as opposed to the external evolutionary forces have a significant and complex impact on Darwinian species evolution. A similar situation can be expected for somatic cancer evolution as molecular key mechanisms encountered in species evolution also constitute prevalent mutation mechanisms in human cancers. This assumption is developed into a systems approach of carcinogenesis which focuses on possible inner constraints of the genome architecture on lineage selection during somatic cancer evolution. The proposed systems approach can be considered an analogy to the concept of evolvability in species evolution. The principal hypothesis is that permissive or restrictive effects of the genome architecture on lineage selection during somatic cancer evolution exist and have a measurable impact. The systems approach postulates three classes of lineage selection effects of the genome architecture on somatic cancer evolution: i) effects mediated by changes of fitness of cells of cancer lineage, ii) effects mediated by changes of mutation probabilities and iii) effects mediated by changes of gene designation and physical and functional genome redundancy. Physical genome redundancy is the copy number of identical genetic sequences. Functional genome redundancy of a gene or a regulatory element is defined as the number of different genetic elements, regardless of copy number, coding for the same specific biological function within a cancer cell. 
Complex interactions of the genome architecture on lineage selection may be expected when modifications of the genome architecture have multiple and possibly opposed effects which manifest themselves at disparate times and progression stages. Dissection of putative mechanisms mediating constraints exerted by the genome architecture on somatic cancer evolution may provide an algorithm for understanding and predicting as well as modifying somatic cancer evolution in individual patients. PMID:23336076
Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Genome-Wide Typing of Clostridium difficile.

PubMed

Bletz, Stefan; Janezic, Sandra; Harmsen, Dag; Rupnik, Maja; Mellmann, Alexander

2018-06-01

Clostridium difficile, recently renamed Clostridioides difficile, is the most common cause of antibiotic-associated nosocomial gastrointestinal infections worldwide. To differentiate endogenous infections and transmission events, highly discriminatory subtyping is necessary. Today, methods based on whole-genome sequencing data are increasingly used to subtype bacterial pathogens; however, frequently a standardized methodology and typing nomenclature are missing. Here we report a core genome multilocus sequence typing (cgMLST) approach developed for C. difficile. Initially, we determined the breadth of the C. difficile population based on all available MLST sequence types with Bayesian inference (BAPS). The resulting BAPS partitions were used in combination with C. difficile clade information to select representative isolates that were subsequently used to define cgMLST target genes. Finally, we evaluated the novel cgMLST scheme with genomes from 3,025 isolates. BAPS grouping (n = 6 groups) together with the clade information led to a total of 11 representative isolates that were included for cgMLST definition and resulted in 2,270 cgMLST genes that were present in all isolates. Overall, 2,184 to 2,268 cgMLST targets were detected in the genome sequences of 70 outbreak-associated and reference strains, and on average 99.3% cgMLST targets (1,116 to 2,270 targets) were present in 2,954 genomes downloaded from the NCBI database, underlining the representativeness of the cgMLST scheme. Moreover, reanalyzing different cluster scenarios with cgMLST were concordant to published single nucleotide variant analyses. In conclusion, the novel cgMLST is representative for the whole C. difficile population, is highly discriminatory in outbreak situations, and provides a unique nomenclature facilitating interlaboratory exchange. Copyright © 2018 American Society for Microbiology.
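Two ideas from the cgMLST record above lend themselves to a short sketch: defining core targets as the loci present in every representative isolate, and comparing isolates by counting differing alleles. The data layout, function names, locus names, and allele numbers below are hypothetical. This is a generic illustration of allele-based typing, not the scheme or software used by the authors.

```python
def core_loci(gene_content):
    """Loci present in every representative isolate (isolate -> set of loci)."""
    sets = list(gene_content.values())
    return set.intersection(*sets) if sets else set()


def allele_distance(profile_a, profile_b):
    """Number of loci with different alleles, ignoring loci missing in either profile."""
    shared = [
        locus for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


# Hypothetical gene content of three representative isolates.
representatives = {
    "rep1": {"locusA", "locusB", "locusC", "locusD"},
    "rep2": {"locusA", "locusB", "locusC"},
    "rep3": {"locusA", "locusB", "locusC", "locusE"},
}
targets = core_loci(representatives)

# Hypothetical allele profiles (locus -> allele number, None = target not detected).
isolate_1 = {"locusA": 1, "locusB": 4, "locusC": 2}
isolate_2 = {"locusA": 1, "locusB": 5, "locusC": None}

print(sorted(targets))
print(allele_distance(isolate_1, isolate_2))
```

Skipping missing targets in the pairwise comparison is one common convention. Other conventions count a missing target as a difference, so reported distances depend on that choice.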
Epigenetic Alterations in Epstein-Barr Virus-Associated Diseases.

PubMed

Niller, Hans Helmut; Banati, Ferenc; Salamon, Daniel; Minarovits, Janos

2016-01-01

Latent Epstein-Barr virus genomes undergo epigenetic modifications which are dependent on the respective tissue type and cellular phenotype. These define distinct viral epigenotypes corresponding with latent viral gene expression profiles. Viral Latent Membrane Proteins 1 and 2A can induce cellular DNA methyltransferases, thereby influencing the methylation status of the viral and cellular genomes. Therefore, not only the viral genomes carry epigenetic modifications, but also the cellular genomes adopt major epigenetic alterations upon EBV infection. The distinct cellular epigenotypes of EBV-infected cells differ from the epigenotypes of their normal counterparts. In Burkitt lymphoma (BL), nasopharyngeal carcinoma (NPC) and EBV-associated gastric carcinoma (EBVaGC) significant changes in the host cell methylome with a strong tendency towards CpG island hypermethylation are observed. Hypermethylated genes unique for EBVaGC suggest the existence of an EBV-specific "epigenetic signature". Contrary to the primary malignancies carrying latent EBV genomes, lymphoblastoid cells (LCs) established by EBV infection of peripheral B cells in vitro are characterized by a massive genome-wide demethylation and a significant decrease and redistribution of heterochromatic histone marks. Establishing complete epigenomes of the diverse EBV-associated malignancies shall clarify their similarities and differences and further clarify the contribution of EBV to the pathogenesis, especially for the epithelial malignancies, NPC and EBVaGC.
Detecting microsatellites within genomes: significant variation among algorithms.

PubMed

Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe

2007-04-18

Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
Detecting microsatellites within genomes: significant variation among algorithms

PubMed Central

Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe

2007-01-01

Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions. PMID:17442102
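The abstract's point that user-defined parameters can change which microsatellites are reported can be illustrated with a minimal sketch. This is not TRF, Mreps, Sputnik, STAR, or RepeatMasker; it is a simple regular-expression scan for perfect tandem repeats only, and the function name, the `max_motif` and `min_length` parameters, and the toy sequence are all assumptions made for illustration.

```python
import re


def find_perfect_microsatellites(seq, max_motif=6, min_length=12):
    """Return (start, motif, total_length) for perfect tandem repeats.

    A repeat is a motif of 1..max_motif bases followed by at least one
    more exact copy; only repeats spanning >= min_length bases are kept.
    """
    # Lazy motif match so the shortest repeating unit is preferred.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1+" % max_motif)
    hits = []
    for match in pattern.finditer(seq):
        total = len(match.group(0))
        if total >= min_length:
            hits.append((match.start(), match.group(1), total))
    return hits


# Toy sequence: a 16 bp (CA)8 repeat and an 8 bp (AT)4 repeat.
toy = "G" + "CA" * 8 + "G" + "AT" * 4 + "C"

print(find_perfect_microsatellites(toy, min_length=12))
# [(1, 'CA', 16)]
print(find_perfect_microsatellites(toy, min_length=8))
# [(1, 'CA', 16), (18, 'AT', 8)]
```

Lowering the minimum length from 12 bp to 8 bp adds the short (AT)4 repeat to the output, which mirrors, in a very reduced form, why short microsatellites are the ones most sensitive to the choice of tool and settings.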
Environmental Adaptation Contributes to Gene Polymorphism across the Arabidopsis thaliana Genome

PubMed Central

Lee, Cheng-Ruei

2012-01-01

The level of within-species polymorphism differs greatly among genes in a genome. Many genomic studies have investigated the relationship between gene polymorphism and factors such as recombination rate or expression pattern. However, the polymorphism of a gene is affected not only by its physical properties or functional constraints but also by natural selection on organisms in their environments. Specifically, if functionally divergent alleles enable adaptation to different environments, locus-specific polymorphism may be maintained by spatially heterogeneous natural selection. To test this hypothesis and estimate the extent to which environmental selection shapes the pattern of genome-wide polymorphism, we define the "environmental relevance" of a gene as the proportion of genetic variation explained by environmental factors, after controlling for population structure. We found substantial effects of environmental relevance on patterns of polymorphism among genes. In addition, the correlation between environmental relevance and gene polymorphism is positive, consistent with the expectation that balancing selection among heterogeneous environments maintains genetic variation at ecologically important genes. Comparison of the gene ontology annotations shows that genes with high environmental relevance are enriched in unknown function categories. These results suggest an important role for environmental factors in shaping genome-wide patterns of polymorphism and indicate another direction of genomic study. PMID:22798389
Genome-scale reconstruction of the Streptococcus pyogenes M49 metabolic network reveals growth requirements and indicates potential drug targets.

PubMed

Levering, Jennifer; Fiedler, Tomas; Sieg, Antje; van Grinsven, Koen W A; Hering, Silvio; Veith, Nadine; Olivier, Brett G; Klett, Lara; Hugenholtz, Jeroen; Teusink, Bas; Kreikemeyer, Bernd; Kummer, Ursula

2016-08-20

Genome-scale metabolic models comprise stoichiometric relations between metabolites, as well as associations between genes and metabolic reactions and facilitate the analysis of metabolism. We computationally reconstructed the metabolic network of the lactic acid bacterium Streptococcus pyogenes M49. Initially, we based the reconstruction on genome annotations and already existing and curated metabolic networks of Bacillus subtilis, Escherichia coli, Lactobacillus plantarum and Lactococcus lactis. This initial draft was manually curated with the final reconstruction accounting for 480 genes associated with 576 reactions and 558 metabolites. In order to constrain the model further, we performed growth experiments of wild type and arcA deletion strains of S. pyogenes M49 in a chemically defined medium and calculated nutrient uptake and production fluxes. We additionally performed amino acid auxotrophy experiments to test the consistency of the model. The established genome-scale model can be used to understand the growth requirements of the human pathogen S. pyogenes and define optimal and suboptimal conditions, but also to describe differences and similarities between S. pyogenes and related lactic acid bacteria such as L. lactis in order to find strategies to reduce the growth of the pathogen and propose drug targets. Copyright © 2016 Elsevier B.V. All rights reserved.
Collective Dynamics of Specific Gene Ensembles Crucial for Neutrophil Differentiation: The Existence of Genome Vehicles Revealed

PubMed Central

Giuliani, Alessandro; Tomita, Masaru

2010-01-01

Cell fate decision remarkably generates specific cell differentiation path among the multiple possibilities that can arise through the complex interplay of high-dimensional genome activities. The coordinated action of thousands of genes to switch cell fate decision has indicated the existence of stable attractors guiding the process. However, origins of the intracellular mechanisms that create “cellular attractor” still remain unknown. Here, we examined the collective behavior of genome-wide expressions for neutrophil differentiation through two different stimuli, dimethyl sulfoxide (DMSO) and all-trans-retinoic acid (atRA). To overcome the difficulties of dealing with single gene expression noises, we grouped genes into ensembles and analyzed their expression dynamics in correlation space defined by Pearson correlation and mutual information. The standard deviation of correlation distributions of gene ensembles reduces when the ensemble size is increased following the inverse square root law, for both ensembles chosen randomly from whole genome and ranked according to expression variances across time. Choosing the ensemble size of 200 genes, we show the two probability distributions of correlations of randomly selected genes for atRA and DMSO responses overlapped after 48 hours, defining the neutrophil attractor. Next, tracking the ranked ensembles' trajectories, we noticed that only certain, not all, fall into the attractor in a fractal-like manner. The removal of these genome elements from the whole genomes, for both atRA and DMSO responses, destroys the attractor providing evidence for the existence of specific genome elements (named “genome vehicle”) responsible for the neutrophil attractor. Notably, within the genome vehicles, genes with low or moderate expression changes, which are often considered noisy and insignificant, are essential components for the creation of the neutrophil attractor. 
Further investigations along with our findings might provide a comprehensive mechanistic view of cell fate decision. PMID:20725638
Discovery of novel targets for multi-epitope vaccines: Screening of HIV-1 genomes using association rule mining

PubMed Central

Paul, Sinu; Piontkivska, Helen

2009-01-01

Background Studies have shown that in the genome of human immunodeficiency virus (HIV-1) regions responsible for interactions with the host's immune system, namely, cytotoxic T-lymphocyte (CTL) epitopes tend to cluster together in relatively conserved regions. On the other hand, "epitope-less" regions or regions with relatively low density of epitopes tend to be more variable. However, very little is known about relationships among epitopes from different genes, in other words, whether particular epitopes from different genes would occur together in the same viral genome. To identify CTL epitopes in different genes that co-occur in HIV genomes, association rule mining was used. Results Using a set of 189 best-defined HIV-1 CTL/CD8+ epitopes from 9 different protein-coding genes, as described by Frahm, Linde & Brander (2007), we examined the complete genomic sequences of 62 reference HIV sequences (including 13 subtypes and sub-subtypes with approximately 4 representative sequences for each subtype or sub-subtype, and 18 circulating recombinant forms). The results showed that despite inclusion of recombinant sequences that would be expected to break-up associations of epitopes in different genes when two different genomes are recombined, there exist particular combinations of epitopes (epitope associations) that occur repeatedly across the world-wide population of HIV-1. For example, Pol epitope LFLDGIDKA is found to be significantly associated with epitopes GHQAAMQML and FLKEKGGL from Gag and Nef, respectively, and this association rule is observed even among circulating recombinant forms. Conclusion We have identified CTL epitope combinations co-occurring in HIV-1 genomes including different subtypes and recombinant forms. Such co-occurrence has important implications for design of complex vaccines (multi-epitope vaccines) and/or drugs that would target multiple HIV-1 regions at once and, thus, may be expected to overcome challenges associated with viral escape. 
PMID:19580659
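As a rough illustration of the association rule mining idea in the abstract above, the sketch below computes the two standard rule statistics, support and confidence, for epitope co-occurrence. The epitope sequences are the ones named in the abstract, but the four "genomes" and their epitope contents are invented toy data, and the function names are hypothetical; the authors' actual software and thresholds are not given in the abstract.

```python
def support(genomes, items):
    """Fraction of genomes that contain every epitope in `items`."""
    items = set(items)
    return sum(1 for g in genomes if items <= g) / len(genomes)


def confidence(genomes, antecedent, consequent):
    """Estimate P(consequent present | antecedent present) from the genomes."""
    s_antecedent = support(genomes, antecedent)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(genomes, both) / s_antecedent


# Toy data (hypothetical): each genome is the set of epitopes found in it.
genomes = [
    {"LFLDGIDKA", "GHQAAMQML", "FLKEKGGL"},
    {"LFLDGIDKA", "GHQAAMQML", "FLKEKGGL"},
    {"LFLDGIDKA", "GHQAAMQML"},
    {"FLKEKGGL"},
]

rule_from = {"LFLDGIDKA"}              # Pol epitope
rule_to = {"GHQAAMQML", "FLKEKGGL"}    # Gag and Nef epitopes

print(support(genomes, rule_from | rule_to))    # 0.5
print(confidence(genomes, rule_from, rule_to))  # about 0.667
```

In this toy set the rule holds in two of the three genomes that carry the Pol epitope; a real analysis would also need a significance test, which this sketch omits.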
Comparative Genomics Analysis of Streptococcus Isolates from the Human Small Intestine Reveals their Adaptation to a Highly Dynamic Ecosystem

PubMed Central

Van den Bogert, Bartholomeus; Boekhorst, Jos; Herrmann, Ruth; Smid, Eddy J.; Zoetendal, Erwin G.; Kleerebezem, Michiel

2013-01-01

The human small-intestinal microbiota is characterised by relatively large and dynamic Streptococcus populations. In this study, genome sequences of small-intestinal streptococci from S. mitis, S. bovis, and S. salivarius species-groups were determined and compared with those from 58 Streptococcus strains in public databases. The Streptococcus pangenome consists of 12,403 orthologous groups of which 574 are shared among all sequenced streptococci and are defined as the Streptococcus core genome. Genome mining of the small-intestinal streptococci focused on functions playing an important role in the interaction of these streptococci in the small-intestinal ecosystem, including natural competence and nutrient-transport and metabolism. Analysis of the small-intestinal Streptococcus genomes predicts a high capacity to synthesize amino acids and various vitamins as well as substantial divergence in their carbohydrate transport and metabolic capacities, which is in agreement with observed physiological differences between these Streptococcus strains. Gene-specific PCR-strategies enabled evaluation of conservation of Streptococcus populations in intestinal samples from different human individuals, revealing that the S. salivarius strains were frequently detected in the small-intestine microbiota, supporting the representative value of the genomes provided in this study. Finally, the Streptococcus genomes allow prediction of the effect of dietary substances on Streptococcus population dynamics in the human small-intestine. PMID:24386196
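The pangenome and core genome definitions used in the abstract above (all orthologous groups found in any strain, and those shared by every strain) reduce to a set union and a set intersection. The following minimal sketch assumes orthologous-group assignment has already been done by some upstream tool; the strain names, group identifiers, and the function name `pan_and_core` are hypothetical.

```python
def pan_and_core(strain_to_groups):
    """Return (pangenome, core genome) as sets of orthologous-group IDs.

    pangenome   = groups present in at least one strain (union)
    core genome = groups present in every strain (intersection)
    """
    group_sets = [set(groups) for groups in strain_to_groups.values()]
    if not group_sets:
        return set(), set()
    pan = set().union(*group_sets)
    core = set.intersection(*group_sets)
    return pan, core


# Toy presence data (hypothetical): strain -> orthologous groups detected.
strains = {
    "strain_A": {"og1", "og2", "og3"},
    "strain_B": {"og1", "og2", "og4"},
    "strain_C": {"og1", "og2", "og3", "og5"},
}

pan, core = pan_and_core(strains)
accessory = pan - core  # groups missing from at least one strain

print(len(pan), sorted(core), sorted(accessory))
# 5 ['og1', 'og2'] ['og3', 'og4', 'og5']
```

Applied to the figures in the abstract, the same definitions would give a pangenome of 12,403 groups and a core of 574 groups.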
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia

PubMed Central

Maezato, Yukari; Wu, Yu-Wei; Romine, Margaret F.; Lindemann, Stephen R.

2015-01-01

To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set. PMID:26497460
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, William C.; Maezato, Yukari; Wu, Yu-Wei

2015-10-23

To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set.
Structural genomics reveals EVE as a new ASCH/PUA-related domain

PubMed Central

Bertonati, Claudia; Punta, Marco; Fischer, Markus; Yachdav, Guy; Forouhar, Farhad; Zhou, Weihong; Kuzin, Alexander P.; Seetharaman, Jayaraman; Abashidze, Mariam; Ramelot, Theresa A.; Kennedy, Michael A.; Cort, John R.; Belachew, Adam; Hunt, John F.; Tong, Liang; Montelione, Gaetano T.; Rost, Burkhard

2014-01-01

Summary We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links. PMID:19191354
Structural Genomics Reveals EVE as a New ASCH/PUA-Related Domain

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bertonati, C.; Punta, M; Fischer, M

2008-01-01

We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggests that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would have not been sufficient to reveal these evolutionary links.
Inferring the Minimal Genome of Mesoplasma florum by Comparative Genomics and Transposon Mutagenesis.

PubMed

Baby, Vincent; Lachance, Jean-Christophe; Gagnon, Jules; Lucier, Jean-François; Matteau, Dominick; Knight, Tom; Rodrigue, Sébastien

2018-01-01

The creation and comparison of minimal genomes will help better define the most fundamental mechanisms supporting life. Mesoplasma florum is a near-minimal, fast-growing, nonpathogenic bacterium potentially amenable to genome reduction efforts. In a comparative genomic study of 13 M. florum strains, including 11 newly sequenced genomes, we have identified the core genome and open pangenome of this species. Our results show that all of the strains have approximately 80% of their gene content in common. Of the remaining 20%, 17% of the genes were found in multiple strains and 3% were unique to any given strain. On the basis of random transposon mutagenesis, we also estimated that ~290 out of 720 genes are essential for M. florum L1 in rich medium. We next evaluated different genome reduction scenarios for M. florum L1 by using gene conservation and essentiality data, as well as comparisons with the first working approximation of a minimal organism, Mycoplasma mycoides JCVI-syn3.0. Our results suggest that 409 of the 473 M. mycoides JCVI-syn3.0 genes have orthologs in M. florum L1. Conversely, 57 putatively essential M. florum L1 genes have no homolog in M. mycoides JCVI-syn3.0. This suggests differences in minimal genome compositions, even for these evolutionarily closely related bacteria. IMPORTANCE The last years have witnessed the development of whole-genome cloning and transplantation methods and the complete synthesis of entire chromosomes. Recently, the first minimal cell, Mycoplasma mycoides JCVI-syn3.0, was created. Despite these milestone achievements, several questions remain to be answered. For example, is the composition of minimal genomes virtually identical in phylogenetically related species? On the basis of comparative genomics and transposon mutagenesis, we investigated this question by using an alternative model, Mesoplasma florum, that is also amenable to genome reduction efforts. 
Our results suggest that the creation of additional minimal genomes could help reveal different gene compositions and strategies that can support life, even within closely related species.
Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups.

PubMed

Nourdin-Galindo, Guillermo; Sánchez, Patricio; Molina, Cristian F; Espinoza-Rojas, Daniela A; Oliver, Cristian; Ruiz, Pamela; Vargas-Chacoff, Luis; Cárcamo, Juan G; Figueroa, Jaime E; Mancilla, Marcos; Maracaja-Coutinho, Vinicius; Yañez, Alejandro J

2017-01-01

Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. 
The obtained data suggest that these genes could be directly associated with inter-genogroup differences in pathogenesis and host-pathogen interactions, information that could be useful in designing novel strategies for diagnosing and controlling P. salmonis infection.
Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups

PubMed Central

Nourdin-Galindo, Guillermo; Sánchez, Patricio; Molina, Cristian F.; Espinoza-Rojas, Daniela A.; Oliver, Cristian; Ruiz, Pamela; Vargas-Chacoff, Luis; Cárcamo, Juan G.; Figueroa, Jaime E.; Mancilla, Marcos; Maracaja-Coutinho, Vinicius; Yañez, Alejandro J.

2017-01-01

Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. 
The obtained data suggest that these genes could be directly associated with inter-genogroup differences in pathogenesis and host-pathogen interactions, information that could be useful in designing novel strategies for diagnosing and controlling P. salmonis infection. PMID:29164068
Pharmacogenetics: Implications of Race and Ethnicity on Defining Genetic Profiles for Personalized Medicine

PubMed Central

Ortega, Victor E.; Meyers, Deborah A.

2014-01-01

Pharmacogenetics is being used to develop personalized therapies specific to individuals from different ethnic or racial groups. Pharmacogenetic studies to date have been primarily performed in trial cohorts consisting of non-Hispanic whites of European descent. A “bottleneck” or collapse of genetic diversity associated with the first human colonization of Europe during the Upper Paleolithic period, followed by the recent mixing of African, European, and Native American ancestries has resulted in different ethnic groups with varying degrees of genetic diversity. Differences in genetic ancestry may introduce genetic variation which has the potential to alter the therapeutic efficacy of commonly used asthma therapies, for example β2-adrenergic receptor agonists (beta agonists). Pharmacogenetic studies of admixed ethnic groups have been limited to small candidate gene association studies of which the best example is the gene coding for the receptor target of beta agonist therapy, ADRB2. Large consortium-based sequencing studies are using next-generation whole-genome sequencing to provide a diverse genome map of different admixed populations which can be used for future pharmacogenetic studies. These studies will include candidate gene studies, genome-wide association studies, and whole-genome admixture-based approaches which account for ancestral genetic structure, complex haplotypes, gene-gene interactions, and rare variants to detect and replicate novel pharmacogenetic loci. PMID:24369795

A Novel Protective Vaccine Antigen from the Core Escherichia coli Genome

PubMed Central

Moriel, Danilo G.; Tan, Lendl; Goh, Kelvin G. K.; Ipe, Deepak S.; Lo, Alvin W.; Peters, Kate M.

2016-01-01

ABSTRACT Escherichia coli is a versatile pathogen capable of causing intestinal and extraintestinal infections that result in a huge burden of global human disease. The diversity of E. coli is reflected by its multiple different pathotypes and mosaic genome composition. E. coli strains are also a major driver of antibiotic resistance, emphasizing the urgent need for new treatment and prevention measures. Here, we used a large data set comprising 1,700 draft and complete genomes to define the core and accessory genome of E. coli and demonstrated the overlapping relationship between strains from different pathotypes. In combination with proteomic investigation, this analysis revealed core genes that encode surface-exposed or secreted proteins that represent potential broad-coverage vaccine antigens. One of these antigens, YncE, was characterized as a conserved immunogenic antigen able to protect against acute systemic infection in mice after vaccination. Overall, this work provides a genomic blueprint for future analyses of conserved and accessory E. coli genes. The work also identified YncE as a novel antigen that could be exploited in the development of a vaccine against all pathogenic E. coli strains—an important direction given the high global incidence of infections caused by multidrug-resistant strains for which there are few effective antibiotics. IMPORTANCE E. coli is a multifaceted pathogen of major significance to global human health and an important contributor to increasing antibiotic resistance. Given the paucity of therapies still effective against multidrug-resistant pathogenic E. coli strains, novel treatment and prevention strategies are urgently required. In this study, we defined the core and accessory components of the E. coli genome by examining a large collection of draft and completely sequenced strains available from public databases. 
This data set was mined by employing a reverse-vaccinology approach in combination with proteomics to identify putative broadly protective vaccine antigens. One such antigen was identified that was highly immunogenic and induced protection in a mouse model of bacteremia. Overall, our study provides a genomic and proteomic framework for the selection of novel vaccine antigens that could mediate broad protection against pathogenic E. coli. PMID:27904885
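As an illustration of the core/accessory split described in the abstract above, the following Python sketch partitions gene families by how many strains carry them. It is a minimal sketch, not the authors' pipeline: the function name `split_pan_genome`, the `core_fraction` threshold, and the toy strain data are all hypothetical, and the abstract does not state which presence cutoff the study used.

```python
from collections import Counter


def split_pan_genome(gene_sets, core_fraction=1.0):
    """Partition gene families into core and accessory sets.

    gene_sets: dict mapping strain name -> iterable of gene-family IDs.
    core_fraction: minimum fraction of strains that must carry a family
        for it to count as core (hypothetical parameter; 1.0 = strict core).
    """
    if not gene_sets:
        raise ValueError("at least one strain is required")
    n_strains = len(gene_sets)
    # Count in how many strains each gene family occurs (once per strain).
    counts = Counter(g for genes in gene_sets.values() for g in set(genes))
    core = {g for g, c in counts.items() if c / n_strains >= core_fraction}
    accessory = set(counts) - core
    return core, accessory


# Toy data (hypothetical strains and gene families).
toy_strains = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB", "geneD"},
    "strain3": {"geneA", "geneC", "geneE"},
}
strict_core, strict_accessory = split_pan_genome(toy_strains)
relaxed_core, relaxed_accessory = split_pan_genome(toy_strains, core_fraction=0.66)
```

With draft genomes, a relaxed threshold below 1.0 is often chosen so that assembly gaps do not remove genuinely conserved genes from the core; whether that applies here is an assumption.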
A genome-wide association study of seed protein and oil content in soybean

PubMed Central

2014-01-01

Background Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. Results A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil. Conclusions This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s). PMID:24382143
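The two quantities this abstract relies on, minor allele frequency (MAF) and the LD statistic r², can be sketched in a few lines of Python. This is a hedged illustration only: it assumes each accession contributes one phased allele per SNP coded 0 or 1 (a simplification that suits inbred lines), the function names are hypothetical, and it is not the software the study used.

```python
def minor_allele_frequency(alleles):
    """Return the frequency of the rarer allele for 0/1-coded alleles."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def ld_r2(snp_a, snp_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    snp_a, snp_b: equal-length lists of 0/1 alleles, one entry per accession.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    if len(snp_a) != len(snp_b):
        raise ValueError("SNP vectors must have the same length")
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    # Frequency of accessions carrying allele 1 at both SNPs.
    p_ab = sum(1 for x, y in zip(snp_a, snp_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


# Toy data: the MAF filter mirrors the ">0.10" cutoff quoted in the abstract.
rare_snp = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
passes_filter = minor_allele_frequency(rare_snp) > 0.10
linked = ld_r2([1, 1, 0, 0], [1, 1, 0, 0])
unlinked = ld_r2([1, 1, 0, 0], [1, 0, 1, 0])
```

An LD decay distance such as "r² falls to 0.2 within 360 Kbp" would then come from averaging r² over SNP pairs binned by physical distance, which this sketch does not attempt.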
A genome-wide association study of seed protein and oil content in soybean.

PubMed

Hwang, Eun-Young; Song, Qijian; Jia, Gaofeng; Specht, James E; Hyten, David L; Costa, Jose; Cregan, Perry B

2014-01-02

Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil. This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).
Establishment of a Brazilian line of human embryonic stem cells in defined medium: implications for cell therapy in an ethnically diverse population.

PubMed

Fraga, Ana M; Sukoyan, Marina; Rajan, Prithi; Braga, Daniela Paes de Almeida Ferreira; Iaconelli, Assumpto; Franco, José Gonçalves; Borges, Edson; Pereira, Lygia V

2011-01-01

Pluripotent human embryonic stem (hES) cells are an important experimental tool for basic and applied research, and a potential source of different tissues for transplantation. However, one important challenge for the clinical use of these cells is the issue of immunocompatibility, which may be dealt with by the establishment of hES cell banks to attend different populations. Here we describe the derivation and characterization of a line of hES cells from the Brazilian population, named BR-1, in commercial defined medium. In contrast to the other hES cell lines established in defined medium, BR-1 maintained a stable normal karyotype as determined by genomic array analysis after 6 months in continuous culture (passage 29). To our knowledge, this is the first reported line of hES cells derived in South America. We have determined its genomic ancestry and compared the HLA-profile of BR-1 and another 22 hES cell lines established elsewhere with those of the Brazilian population, finding they would match only 0.011% of those individuals. Our results highlight the challenges involved in hES cell banking for populations with a high degree of ethnic admixture.
NF-Y Binding Site Architecture Defines a C-Fos Targeted Promoter Class

PubMed Central

Haubrock, Martin; Hartmann, Fabian; Wingender, Edgar

2016-01-01

ChIP-seq experiments detect the chromatin occupancy of known transcription factors in a genome-wide fashion. The comparisons of several species-specific ChIP-seq libraries done for different transcription factors have revealed a complex combinatorial and context-specific co-localization behavior for the identified binding regions. In this study we have investigated human derived ChIP-seq data to identify common cis-regulatory principles for the human transcription factor c-Fos. We found that in four different cell lines, c-Fos targeted proximal and distal genomic intervals show prevalences for either AP-1 motifs or CCAAT boxes as known binding motifs for the transcription factor NF-Y, and thereby act in a mutually exclusive manner. For proximal regions of co-localized c-Fos and NF-YB binding, we gathered evidence that a characteristic configuration of repeating CCAAT motifs may be responsible for attracting c-Fos, probably provided by a nearby AP-1 bound enhancer. Our results suggest a novel regulatory function of NF-Y in gene-proximal regions. Specific CCAAT dimer repeats bound by the transcription factor NF-Y define this novel cis-regulatory module. Based on this behavior we propose a new enhancer promoter interaction model based on AP-1 motif defined enhancers which interact with CCAAT-box characterized promoter regions. PMID:27517874
Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

PubMed Central

Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen

2015-01-01

The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952
Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

PubMed

Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

2015-04-28

The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.
Defining Genomic Changes in Triple-Negative Breast Cancer in Women of African Descent

DTIC Science & Technology

2012-06-01

African and African-American breast cancer cases. Gene Expression Array Studies The 31 triple negative Kijabe samples were... American Adjacent Normal Breast Tissue PI: Pegram & Baumbach Defining Genomic Changes in Triple Negative Breast Cancer in Women of African ...Tissues from African-American and East African Patients with Triple Negative Breast
Childhood Cancer Genomics Gaps and Opportunities - Workshop Summary

Cancer.gov

NCI convened a workshop of representative research teams that have been leaders in defining the genomic landscape of childhood cancers to discuss the influence of genomic discoveries on the future of childhood cancer research.
Design of Biomedical Robots for Phenotype Prediction Problems

PubMed Central

deAndrés-Galiana, Enrique J.; Sonis, Stephen T.

2016-01-01

Abstract Genomics has been used with varying degrees of success in the context of drug discovery and in defining mechanisms of action for diseases like cancer and neurodegenerative and rare diseases in the quest for orphan drugs. To improve its utility, accuracy, and cost-effectiveness optimization of analytical methods, especially those that translate to clinically relevant outcomes, is critical. Here we define a novel tool for genomic analysis termed a biomedical robot in order to improve phenotype prediction, identifying disease pathogenesis and significantly defining therapeutic targets. Biomedical robot analytics differ from historical methods in that they are based on melding feature selection methods and ensemble learning techniques. The biomedical robot mathematically exploits the structure of the uncertainty space of any classification problem conceived as an ill-posed optimization problem. Given a classifier, there exist different equivalent small-scale genetic signatures that provide similar predictive accuracies. We perform the sensitivity analysis to noise of the biomedical robot concept using synthetic microarrays perturbed by different kinds of noises in expression and class assignment. Finally, we show the application of this concept to the analysis of different diseases, inferring the pathways and the correlation networks. The final aim of a biomedical robot is to improve knowledge discovery and provide decision systems to optimize diagnosis, treatment, and prognosis. This analysis shows that the biomedical robots are robust against different kinds of noises and particularly to a wrong class assignment of the samples. Assessing the uncertainty that is inherent to any phenotype prediction problem is the right way to address this kind of problem. PMID:27347715
Design of Biomedical Robots for Phenotype Prediction Problems.

PubMed

deAndrés-Galiana, Enrique J; Fernández-Martínez, Juan Luis; Sonis, Stephen T

2016-08-01

Genomics has been used with varying degrees of success in the context of drug discovery and in defining mechanisms of action for diseases like cancer and neurodegenerative and rare diseases in the quest for orphan drugs. To improve its utility, accuracy, and cost-effectiveness optimization of analytical methods, especially those that translate to clinically relevant outcomes, is critical. Here we define a novel tool for genomic analysis termed a biomedical robot in order to improve phenotype prediction, identifying disease pathogenesis and significantly defining therapeutic targets. Biomedical robot analytics differ from historical methods in that they are based on melding feature selection methods and ensemble learning techniques. The biomedical robot mathematically exploits the structure of the uncertainty space of any classification problem conceived as an ill-posed optimization problem. Given a classifier, there exist different equivalent small-scale genetic signatures that provide similar predictive accuracies. We perform the sensitivity analysis to noise of the biomedical robot concept using synthetic microarrays perturbed by different kinds of noises in expression and class assignment. Finally, we show the application of this concept to the analysis of different diseases, inferring the pathways and the correlation networks. The final aim of a biomedical robot is to improve knowledge discovery and provide decision systems to optimize diagnosis, treatment, and prognosis. This analysis shows that the biomedical robots are robust against different kinds of noises and particularly to a wrong class assignment of the samples. Assessing the uncertainty that is inherent to any phenotype prediction problem is the right way to address this kind of problem.
Concise classification of the genomic porcine endogenous retroviral gamma1 load to defined lineages.

PubMed

Klymiuk, Nikolai; Wolf, Eckhard; Aigner, Bernhard

2008-02-05

We investigated the infection history of porcine endogenous retroviruses (PERV) gamma1 by analyzing published env and LTR sequences. PERV sequences from various breeds, porcine cell lines and infected human primary cells were included in the study. We identified a considerable number of retroviral lineages indicating multiple independent colonization events of the porcine genome. A recent boost of the proviral load in an isolated pig herd and exclusive occurrence of distinct lineages in single studies indicated the ongoing colonization of the porcine genome with endogenous retroviruses. Retroviral recombination between co-packaged genomes was a general factor for PERV gamma1 diversity which indicated the simultaneous expression of different proviral loci over a period of time. In total, our detailed description of endogenous retroviral lineages is the prerequisite for breeding approaches to minimize the infectious potential of porcine tissues for the subsequent use in xenotransplantation.
Functional and topological characteristics of mammalian regulatory domains

PubMed Central

Symmons, Orsolya; Uslu, Veli Vural; Tsujimura, Taro; Ruf, Sandra; Nassari, Sonya; Schwarzer, Wibke; Ettwiller, Laurence; Spitz, François

2014-01-01

Long-range regulatory interactions play an important role in shaping gene-expression programs. However, the genomic features that organize these activities are still poorly characterized. We conducted a large operational analysis to chart the distribution of gene regulatory activities along the mouse genome, using hundreds of insertions of a regulatory sensor. We found that enhancers distribute their activities along broad regions and not in a gene-centric manner, defining large regulatory domains. Remarkably, these domains correlate strongly with the recently described TADs, which partition the genome into distinct self-interacting blocks. Different features, including specific repeats and CTCF-binding sites, correlate with the transition zones separating regulatory domains, and may help to further organize promiscuously distributed regulatory influences within large domains. These findings support a model of genomic organization where TADs confine regulatory activities to specific but large regulatory domains, contributing to the establishment of specific gene expression profiles. PMID:24398455
A Proposed Genus Boundary for the Prokaryotes Based on Genomic Insights

PubMed Central

Qin, Qi-Long; Xie, Bin-Bin; Zhang, Xi-Ying; Chen, Xiu-Lan; Zhou, Bai-Cheng; Zhou, Jizhong; Oren, Aharon

2014-01-01

Genomic information has already been applied to prokaryotic species definition and classification. However, the contribution of the genome sequence to prokaryotic genus delimitation has been less studied. To gain insights into genus definition for the prokaryotes, we attempted to reveal the genus-level genomic differences in the current prokaryotic classification system and to delineate the boundary of a genus on the basis of genomic information. The average nucleotide sequence identity between two genomes can be used for prokaryotic species delineation, but it is not suitable for genus demarcation. We used the percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP can serve as a robust genomic index for establishing the genus boundary for prokaryotic groups. Basically, two species belonging to the same genus would share at least half of their proteins. In a specific lineage, the genus and family/order ranks showed slight or no overlap in terms of POCP values. A prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%. Integration of whole-genome data into the current taxonomy system can provide comprehensive information for prokaryotic genus definition and delimitation. PMID:24706738
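The genus criterion in this abstract (all pairwise POCP values above 50%) can be sketched as follows. The abstract does not give the POCP formula, so the form used here, POCP = (C1 + C2) / (T1 + T2) × 100 with C the conserved-protein counts and T the total protein counts of the two genomes, is stated as an assumption, and the function names and numbers are hypothetical. How "conserved" is decided (for example by protein alignment cutoffs) is left outside this sketch.

```python
def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins between two genomes.

    Assumed formula: (C1 + C2) / (T1 + T2) * 100, where C1 and C2 are the
    numbers of conserved proteins in genome 1 and 2, and T1 and T2 are the
    total protein counts. This definition is an assumption of the sketch.
    """
    if total_1 + total_2 == 0:
        raise ValueError("total protein counts must not both be zero")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


def is_single_genus(pairwise_pocp, threshold=50.0):
    """True if every pairwise POCP value exceeds the threshold.

    pairwise_pocp: dict mapping (species_a, species_b) -> POCP value.
    The 50% default follows the boundary quoted in the abstract.
    """
    return all(value > threshold for value in pairwise_pocp.values())


# Hypothetical example values.
example_pocp = pocp(2000, 2100, 4000, 4200)
candidate_group = {
    ("sp1", "sp2"): 62.0,
    ("sp1", "sp3"): 55.5,
    ("sp2", "sp3"): 48.0,  # below the boundary, so the group fails
}
group_is_genus = is_single_genus(candidate_group)
```

Because the criterion is "all pairs", a single low pairwise value is enough to reject a proposed grouping, which is why the check uses `all` rather than an average.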
Whole Genome Sequences of Three Treponema pallidum ssp. pertenue Strains: Yaws and Syphilis Treponemes Differ in Less than 0.2% of the Genome Sequence

PubMed Central

Chen, Lei; Pospíšilová, Petra; Strouhal, Michal; Qin, Xiang; Mikalová, Lenka; Norris, Steven J.; Muzny, Donna M.; Gibbs, Richard A.; Fulton, Lucinda L.; Sodergren, Erica; Weinstock, George M.; Šmajs, David

2012-01-01

Background The yaws treponemes, Treponema pallidum ssp. pertenue (TPE) strains, are closely related to syphilis causing strains of Treponema pallidum ssp. pallidum (TPA). Both yaws and syphilis are distinguished on the basis of epidemiological characteristics, clinical symptoms, and several genetic signatures of the corresponding causative agents. Methodology/Principal Findings To precisely define genetic differences between TPA and TPE, high-quality whole genome sequences of three TPE strains (Samoa D, CDC-2, Gauthier) were determined using next-generation sequencing techniques. TPE genome sequences were compared to four genomes of TPA strains (Nichols, DAL-1, SS14, Chicago). The genome structure was identical in all three TPE strains with similar length ranging between 1,139,330 bp and 1,139,744 bp. No major genome rearrangements were found when compared to the four TPA genomes. The whole genome nucleotide divergence (dA) between TPA and TPE subspecies was 4.7 and 4.8 times higher than the observed nucleotide diversity (π) among TPA and TPE strains, respectively, corresponding to 99.8% identity between TPA and TPE genomes. A set of 97 (9.9%) TPE genes encoded proteins containing two or more amino acid replacements or other major sequence changes. The TPE divergent genes were mostly from the group encoding potential virulence factors and genes encoding proteins with unknown function. Conclusions/Significance Hypothetical genes, with genetic differences, consistently found between TPE and TPA strains are candidates for syphilitic treponemes virulence factors. Seventeen TPE genes were predicted under positive selection, and eleven of them coded either for predicted exported proteins or membrane proteins suggesting their possible association with the cell surface. Sequence changes between TPE and TPA strains and changes specific to individual strains represent suitable targets for subspecies- and strain-specific molecular diagnostics. PMID:22292095
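The headline figure above (less than 0.2% difference, or 99.8% identity) is a percent-identity statement over aligned genome positions. A minimal Python sketch of that calculation follows. It assumes two pre-aligned sequences of equal length and skips columns containing a gap character, which is a simplifying assumption and not necessarily how the study counted differences. The function name is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned, non-gap columns of two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are ignored in this sketch
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignments (hypothetical sequences).
one_mismatch = percent_identity("ACGTACGTAC", "ACGTACGTAT")
with_gap = percent_identity("AC-GT", "ACAGT")
```

At 99.8% identity, two genomes of about 1.14 Mbp would differ at roughly 2,000 aligned positions, which is simple arithmetic on the figures quoted in the abstract rather than a reported result.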
The development of genomics applied to dairy breeding

USDA-ARS's Scientific Manuscript database

Genomic selection (GS) has profoundly changed dairy cattle breeding in the last decade and can be defined as the use of genomic breeding values (GEBV) in selection programs. The GEBV is the sum of the effects of dense DNA markers across the whole genome, capturing all the quantitative trait loci (QT...
Identical bacterial populations colonize premature infant gut, skin, and oral microbiomes and exhibit different in situ growth rates

PubMed Central

Olm, Matthew R.; Brown, Christopher T.; Brooks, Brandon; Firek, Brian; Baker, Robyn; Burstein, David; Soenjoyo, Karina; Thomas, Brian C.; Morowitz, Michael; Banfield, Jillian F.

2017-01-01

The initial microbiome impacts the health and future development of premature infants. Methodological limitations have led to gaps in our understanding of the habitat range and subpopulation complexity of founding strains, as well as how different body sites support microbial growth. Here, we used metagenomics to reconstruct genomes of strains that colonized the skin, mouth, and gut of two hospitalized premature infants during the first month of life. Seven bacterial populations, considered to be identical given whole-genome average nucleotide identity of >99.9%, colonized multiple body sites, yet none were shared between infants. Gut-associated Citrobacter koseri genomes harbored 47 polymorphic sites that we used to define 10 subpopulations, one of which appeared in the gut after 1 wk but did not spread to other body sites. Differential genome coverage was used to measure bacterial population replication rates in situ. In all cases where the same bacterial population was detected in multiple body sites, replication rates were faster in mouth and skin compared to the gut. The ability of identical strains to colonize multiple body sites underscores the habit flexibility of initial colonists, whereas differences in microbial replication rates between body sites suggest differences in host control and/or resource availability. Population genomic analyses revealed microdiversity within bacterial populations, implying initial inoculation by multiple individual cells with distinct genotypes. Overall, however, the overlap of strains across body sites implies that the premature infant microbiome can exhibit very low microbial diversity. PMID:28073918
Patterns of expression of position-dependent integrated transgenes in mouse embryo.

PubMed Central

Bonnerot, C; Grimber, G; Briand, P; Nicolas, J F

1990-01-01

The abilities to introduce foreign DNA into the genome of mice and to visualize gene expression at the single-cell level underlie a method for defining individual elements of a genetic program. We describe the use of an Escherichia coli lacZ reporter gene fused to the promoter of the gene for hypoxanthine phosphoribosyltransferase that is expressed in all tissues. Most transgenic mice (six of seven) obtained with this construct express the lacZ gene from the hypoxanthine phosphoribosyltransferase promoter. Unexpectedly, however, the expression is temporally and spatially regulated. Each transgenic line is characterized by a specific, highly reproducible pattern of lacZ expression. These results show that, for expression, the integrated construct must be complemented by elements of the genome. These elements exert dominant developmental control on the hypoxanthine phosphoribosyltransferase promoter. The expression patterns in some transgenic mice conform to a typological marker and in others to a subtle combination of typology and topography. These observations define discrete heterogeneities of cell types and of certain structures, particularly in the nervous system and in the mesoderm. This system opens opportunities for developmental studies by providing cellular, molecular, and genetic markers of cell types, cell states, and cells from developmental compartments. Finally, this method illustrates that genes transduced or transposed to a different position in the genome acquire different spatiotemporal specificities, a result that has implications for evolution. PMID:1696727
Sauropod dinosaurs evolved moderately sized genomes unrelated to body size.

PubMed

Organ, Chris L; Brusatte, Stephen L; Stein, Koen

2009-12-22

Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77-2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97-2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05-5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group.
Understanding the differences between genome sequences of Escherichia coli B strains REL606 and BL21(DE3), and comparison of the closely related E. coli B and K-12 genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Studier, F.W.; Daegelen, P.; Lenski, R. E.

2009-12-01

Each difference between the genome sequences of Escherichia coli B strains REL606 and BL21(DE3) can be interpreted in light of known laboratory manipulations plus a gene conversion between ribosomal RNA operons. Two treatments with 1-methyl-3-nitro-1-nitrosoguanidine in the REL606 lineage produced at least 93 single-base-pair mutations (~90% GC-to-AT transitions) and 3 single-base-pair GC deletions. Two UV treatments in the BL21(DE3) lineage produced only 4 single-base-pair mutations but 16 large deletions. P1 transductions from K-12 into the two B lineages produced 317 single-base-pair differences and 9 insertions or deletions, reflecting differences between B DNA in BL21(DE3) and integrated restriction fragments of K-12 DNA inherited by REL606. Two sites showed selective enrichment of spontaneous mutations. No unselected spontaneous single-base-pair mutations were evident. The genome sequences revealed that a progenitor of REL606 had been misidentified, explaining initially perplexing differences. Limited sequencing of other B strains defined characteristic properties of B and allowed assembly of the inferred genome of the ancestral B of Delbrueck and Luria. Comparison of the B and K-12 genomes shows that more than half of the 3793 proteins of their basic genomes are predicted to be identical, although ~310 appear to be functional in either B or K-12 but not in both. The ancestral basic genome appears to have had ~4039 coding sequences occupying ~4.0 Mbp. Repeated horizontal transfer from diverged Escherichia coli genomes and homologous recombination may explain the observed variable distribution of single-base-pair differences. Fifteen sites are occupied by phage-related elements, but only six by comparable elements at the same site. More than 50 sites are occupied by IS elements in both B and K, 16 in common, and likely founding IS elements are identified. A signature of widespread cryptic phage P4-type mobile elements was identified. Complex deletions (dense clusters of small deletions and substitutions) apparently removed nonessential genes from ~30 sites in the basic genomes.

Pathology and genomics of pediatric melanoma: A critical reexamination and new insights.

PubMed

Bahrami, Armita; Barnhill, Raymond L

2018-02-01

The clinicopathologic features of pediatric melanoma are distinct from those of the adult counterpart. For example, most childhood melanomas exhibit a uniquely favorable biologic behavior, save for those arising in large/giant congenital nevi. Recent studies suggest that the characteristically favorable biologic behavior of childhood melanoma may be related to extreme telomere shortening and dysfunction in the cancer cells. Herein, we review the genomic profiles that have been defined for the different subtypes of pediatric melanoma and particularly emphasize the potential prognostic value of telomerase reverse transcriptase alterations for these tumors. © 2017 Wiley Periodicals, Inc.
Using Genome-Wide Expression Profiling to Define Gene Networks Relevant to the Study of Complex Traits: From RNA Integrity to Network Topology

PubMed Central

O'Brien, M.A.; Costin, B.N.; Miles, M.F.

2014-01-01

Postgenomic studies of the function of genes and their role in disease have now become an area of intense study since efforts to define the raw sequence material of the genome have largely been completed. The use of whole-genome approaches such as microarray expression profiling and, more recently, RNA-sequence analysis of transcript abundance has allowed an unprecedented look at the workings of the genome. However, the accurate derivation of such high-throughput data and their analysis in terms of biological function has been critical to truly leveraging the postgenomic revolution. This chapter will describe an approach that focuses on the use of gene networks to both organize and interpret genomic expression data. Such networks, derived from statistical analysis of large genomic datasets and the application of multiple bioinformatics data resources, potentially allow the identification of key control elements for networks associated with human disease, and thus may lead to derivation of novel therapeutic approaches. However, as discussed in this chapter, the leveraging of such networks cannot occur without a thorough understanding of the technical and statistical factors influencing the derivation of genomic expression data. Thus, while the catch phrase may be “it's the network … stupid,” the understanding of factors extending from RNA isolation to genomic profiling technique, multivariate statistics, and bioinformatics are all critical to defining fully useful gene networks for study of complex biology. PMID:23195313
Mapping Second Chromosome Mutations to Defined Genomic Regions in Drosophila melanogaster

PubMed Central

Kahsai, Lily; Cook, Kevin R.

2017-01-01

Hundreds of Drosophila melanogaster stocks are currently maintained at the Bloomington Drosophila Stock Center with mutations that have not been associated with sequence-defined genes. They have been preserved because they have interesting loss-of-function phenotypes. The experimental value of these mutations would be increased by tying them to specific genomic intervals so that geneticists can more easily associate them with annotated genes. Here, we report the mapping of 85 second chromosome complementation groups in the Bloomington collection to specific, small clusters of contiguous genes or individual genes in the sequenced genome. This information should prove valuable to Drosophila geneticists interested in processes associated with particular phenotypes and those searching for mutations affecting specific sequence-defined genes. PMID:29066472
Multiple genome sequences reveal adaptations of a phototrophic bacterium to sediment microenvironments.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oda, Yasuhiro; Larimer, Frank W; Chain, Patrick S. G.

The bacterial genus Rhodopseudomonas is comprised of photosynthetic bacteria found widely distributed in aquatic sediments. Members of the genus catalyze hydrogen gas production, carbon dioxide sequestration, and biomass turnover. The genome sequence of Rhodopseudomonas palustris CGA009 revealed a surprising richness of metabolic versatility that would seem to explain its ability to live in a heterogeneous environment like sediment. However, there is considerable genotypic diversity among Rhodopseudomonas isolates. Here we report the complete genome sequences of four additional members of the genus isolated from a restricted geographical area. The sequences confirm that the isolates belong to a coherent taxonomic unit, but they also have significant differences. Whole genome alignments show that the circular chromosomes of the isolates consist of a collinear backbone with a moderate number of genomic rearrangements that impact local gene order and orientation. There are 3,319 genes, 70% of the genes in each genome, shared by four or more strains. Between 10% and 18% of the genes in each genome are strain specific. Some of these genes suggest specialized physiological traits, which we verified experimentally, that include expanded light harvesting, oxygen respiration, and nitrogen fixation capabilities, as well as anaerobic fermentation. Strain-specific adaptations include traits that may be useful in bioenergy applications. This work suggests that against a backdrop of metabolic versatility that is a defining characteristic of Rhodopseudomonas, different ecotypes have evolved to take advantage of physical and chemical conditions in sediment microenvironments that are too small for human observation.
Phylogeographic diversity and mosaicism of the Helicobacter pylori tfs integrative and conjugative elements.

PubMed

Delahay, Robin M; Croxall, Nicola J; Stephens, Amberley D

2018-01-01

The genome of the gastric pathogen Helicobacter pylori is characterised by considerable variation of both gene sequence and content, much of which is contained within three large genomic islands comprising the cag pathogenicity island (cag PAI) and two mobile integrative and conjugative elements (ICEs) termed tfs3 and tfs4. All three islands are implicated as virulence factors, although whereas the cag PAI is well characterised, understanding of how the tfs elements influence H. pylori interactions with different human hosts is significantly confounded by limited definition of their distribution, diversity and structural representation in the global H. pylori population. To gain a global perspective of tfs ICE population dynamics we established a bioinformatics workflow to extract and precisely define the full tfs pan-gene content contained within a global collection of 221 draft and complete H. pylori genome sequences. Complete (ca. 35-55kbp) and remnant tfs ICE clusters were reconstructed from a dataset comprising > 12,000 genes, from which orthologous gene complements and distinct alleles descriptive of different tfs ICE types were defined and classified in comparative analyses. The genetic variation within defined ICE modular segments was subsequently used to provide a complete description of tfs ICE diversity and a comprehensive assessment of their phylogeographic context. Our further examination of the apparent ICE modular types identified an ancient and complex history of ICE residence, mobility and interaction within particular H. pylori phylogeographic lineages and further, provided evidence of both contemporary inter-lineage and inter-species ICE transfer and displacement. Our collective results establish a clear view of tfs ICE diversity and phylogeographic representation in the global H. pylori population, and provide a robust contextual framework for elucidating the functional role of the tfs ICEs particularly as it relates to the risk of gastric disease associated with different tfs ICE genotypes.
Origin and Diversification of Basic-Helix-Loop-Helix Proteins in Plants

PubMed Central

Pires, Nuno; Dolan, Liam

2010-01-01

Basic helix-loop-helix (bHLH) proteins are a class of transcription factors found throughout eukaryotic organisms. Classification of the complete sets of bHLH proteins in the sequenced genomes of Arabidopsis thaliana and Oryza sativa (rice) has defined the diversity of these proteins among flowering plants. However, the evolutionary relationships of different plant bHLH groups and the diversity of bHLH proteins in more ancestral groups of plants are currently unknown. In this study, we use whole-genome sequences from nine species of land plants and algae to define the relationships between these proteins in plants. We show that few (less than 5) bHLH proteins are encoded in the genomes of chlorophytes and red algae. In contrast, many bHLH proteins (100–170) are encoded in the genomes of land plants (embryophytes). Phylogenetic analyses suggest that plant bHLH proteins are monophyletic and constitute 26 subfamilies. Twenty of these subfamilies existed in the common ancestors of extant mosses and vascular plants, whereas six further subfamilies evolved among the vascular plants. In addition to the conserved bHLH domains, most subfamilies are characterized by the presence of highly conserved short amino acid motifs. We conclude that much of the diversity of plant bHLH proteins was established in early land plants, over 440 million years ago. PMID:19942615
Wide Distribution of Mitochondrial Genome Rearrangements in Wild Strains of the Cultivated Basidiomycete Agrocybe aegerita

PubMed Central

Barroso, G.; Blesa, S.; Labarere, J.

1995-01-01

We used restriction fragment length polymorphisms to examine mitochondrial genome rearrangements in 36 wild strains of the cultivated basidiomycete Agrocybe aegerita, collected from widely distributed locations in Europe. We identified two polymorphic regions within the mitochondrial DNA which varied independently: one carrying the Cox II coding sequence and the other carrying the Cox I, ATP6, and ATP8 coding sequences. Two types of mutations were responsible for the restriction fragment length polymorphisms that we observed and, accordingly, were involved in the A. aegerita mitochondrial genome evolution: (i) point mutations, which resulted in strain-specific mitochondrial markers, and (ii) length mutations due to genome rearrangements, such as deletions, insertions, or duplications. Within each polymorphic region, the length differences defined only two mitochondrial types, suggesting that these length mutations were not randomly generated but resulted from a precise rearrangement mechanism. For each of the two polymorphic regions, the two molecular types were distributed among the 36 strains without obvious correlation with their geographic origin. On the basis of these two polymorphisms, it is possible to define four mitochondrial haplotypes. The four mitochondrial haplotypes could be the result of intermolecular recombination between allelic forms present in the population long enough to reach linkage equilibrium. All of the 36 dikaryotic strains contained only a single mitochondrial type, confirming the previously described mitochondrial sorting out after cytoplasmic mixing in basidiomycetes. PMID:16534984
Genomic V exons from whole genome shotgun data in reptiles.

PubMed

Olivieri, D N; von Haeften, B; Sánchez-Espinel, C; Faro, J; Gambón-Deza, F

2014-08-01

Reptiles and mammals diverged over 300 million years ago, creating two parallel evolutionary lineages amongst terrestrial vertebrates. In reptiles, two main evolutionary lines emerged: one gave rise to Squamata, while the other gave rise to Testudines, Crocodylia, and Aves. In this study, we determined the genomic variable (V) exons from whole genome shotgun sequencing (WGS) data in reptiles corresponding to the three main immunoglobulin (IG) loci and the four main T cell receptor (TR) loci. We show that Squamata lack the TRG and TRD genes, and snakes lack the IGKV genes. In representative species of Testudines and Crocodylia, the seven major IG and TR loci are maintained. As in mammals, genes of the IG loci can be grouped into well-defined IMGT clans through a multi-species phylogenetic analysis. We show that the reptilian IGHV and IGLV genes are distributed amongst the established mammalian clans, while their IGKV genes are found within a single clan, nearly exclusive from the mammalian sequences. The reptilian and mammalian TRAV genes cluster into six common evolutionary clades (since IMGT clans have not been defined for TR). In contrast, the reptilian TRBV genes cluster into three clades, which have few mammalian members. In this locus, the V exon sequences from mammals appear to have undergone different evolutionary diversification processes that occurred outside these shared reptilian clans. These sequences can be obtained in a freely available public repository (http://vgenerepertoire.org).
Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder.

PubMed

Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup

2011-09-01

The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.
D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs

PubMed Central

Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

2009-01-01

Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Currently, genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-level prediction, there is no tool for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts different types of weight matrix based on a user-defined aligned motif sequence set and motif width. For retrieval of known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS, and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the result can be converted into different file formats according to user requirements. It provides the possibility to identify conserved motifs in co-regulated genes or the whole genome. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. Availability http://203.190.147.116/dmatrix/ PMID:19759861
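As a rough illustration of what building a weight matrix from an aligned motif set can involve, the Python sketch below computes a position frequency matrix and a log-odds weight matrix. The abstract does not describe the statistics D-MATRIX actually uses, so the pseudocount, the uniform background of 0.25, the function names, and the example sites are all assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"

def build_weight_matrices(aligned_sites, pseudocount=1.0, background=0.25):
    """Return (frequency_matrix, log_odds_matrix) for equal-width DNA sites.

    Assumed scheme (not taken from D-MATRIX): add a pseudocount to each base
    count, normalise per column, then take log2(frequency / background).
    """
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("all motif sequences must have the same width")
    freq, log_odds = [], []
    for pos in range(width):
        column = [site[pos].upper() for site in aligned_sites]
        total = len(column) + pseudocount * len(ALPHABET)
        col_freq = {b: (column.count(b) + pseudocount) / total for b in ALPHABET}
        freq.append(col_freq)
        log_odds.append({b: math.log2(col_freq[b] / background) for b in ALPHABET})
    return freq, log_odds

def score(site, log_odds):
    """Sum the per-position log-odds weights for a candidate site."""
    return sum(col[base] for base, col in zip(site.upper(), log_odds))

# Hypothetical aligned binding sites, for illustration only.
sites = ["ACGT", "ACGA", "ACTT", "AGGT"]
freq, pwm = build_weight_matrices(sites)
```

A candidate site that matches the consensus of the input set scores higher than one that does not, which is the property a genome-wide scan would rely on.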
D-MATRIX: a web tool for constructing weight matrix of conserved DNA motifs.

PubMed

Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

2009-07-27

Despite considerable efforts to date, DNA motif prediction in whole genomes remains a challenge for researchers. Currently, genome-wide motif prediction tools require either a direct pattern sequence (for a single motif) or a weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome-level prediction, there is no tool for weight matrix construction. Considering this, we developed the D-MATRIX tool, which predicts different types of weight matrix based on a user-defined aligned motif sequence set and motif width. For retrieval of known motif sequences, users can access commonly used databases such as TFD, RegulonDB, DBTBS, and Transfac. The D-MATRIX program uses a simple statistical approach for weight matrix construction, and the result can be converted into different file formats according to user requirements. It provides the possibility to identify conserved motifs in co-regulated genes or the whole genome. As an example, we successfully constructed the weight matrix of the LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in the Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user-friendly web interface. The D-MATRIX tool is accessible through the CIMAP domain network. http://203.190.147.116/dmatrix/
The Genetics of Symbiotic Nitrogen Fixation: Comparative Genomics of 14 Rhizobia Strains by Resolution of Protein Clusters

PubMed Central

Black, Michael; Moolhuijzen, Paula; Chapman, Brett; Barrero, Roberto; Howieson, John; Hungria, Mariangela; Bellgard, Matthew

2012-01-01

The symbiotic relationship between legumes and nitrogen fixing bacteria is critical for agriculture, as it may have profound impacts on lowering costs for farmers, on land sustainability, on soil quality, and on mitigation of greenhouse gas emissions. However, despite the importance of the symbioses to the global nitrogen cycling balance, very few rhizobial genomes have been sequenced so far, although there are some ongoing efforts in sequencing elite strains. In this study, the genomes of fourteen selected strains of the order Rhizobiales, all previously fully sequenced and annotated, were compared to assess differences between the strains and to investigate the feasibility of defining a core ‘symbiome’—the essential genes required by all rhizobia for nodulation and nitrogen fixation. Comparison of these whole genomes has revealed valuable information, such as several events of lateral gene transfer, particularly in the symbiotic plasmids and genomic islands that have contributed to a better understanding of the evolution of contrasting symbioses. Unique genes were also identified, as well as omissions of symbiotic genes that were expected to be found. Protein comparisons have also allowed the identification of a variety of similarities and differences in several groups of genes, including those involved in nodulation, nitrogen fixation, production of exopolysaccharides, Type I to Type VI secretion systems, among others, and identifying some key genes that could be related to host specificity and/or a better saprophytic ability. However, while several significant differences in the type and number of proteins were observed, the evidence presented suggests no simple core symbiome exists. A more abstract systems biology concept of nitrogen fixing symbiosis may be required. The results have also highlighted that comparative genomics represents a valuable tool for capturing specificities and generalities of each genome. PMID:24704847
Viral symbiosis and the holobiontic nature of the human genome.

PubMed

Ryan, Francis Patrick

2016-01-01

The human genome is a holobiontic union of the mammalian nuclear genome, the mitochondrial genome and large numbers of endogenized retroviral genomes. This article defines and explores this symbiogenetic pattern of evolution, looking at the implications for human genetics, epigenetics, embryogenesis, physiology and the pathogenesis of inborn errors of metabolism and many other diseases. © 2016 APMIS. Published by John Wiley & Sons Ltd.
Genomic features separating ten strains of Neorhizobium galegae with different symbiotic phenotypes.

PubMed

Österman, Janina; Mousavi, Seyed Abdollah; Koskinen, Patrik; Paulin, Lars; Lindström, Kristina

2015-05-02

The symbiotic phenotype of Neorhizobium galegae, with strains specifically fixing nitrogen with either Galega orientalis or G. officinalis, has made it a target in research on determinants of host specificity in nitrogen fixation. The genomic differences between representative strains of the two symbiovars are, however, relatively small. This introduced a need for a dataset representing a larger bacterial population in order to make better conclusions on characteristics typical for a subset of the species. In this study, we produced draft genomes of eight strains of N. galegae having different symbiotic phenotypes, both with regard to host specificity and nitrogen fixation efficiency. These genomes were analysed together with the previously published complete genomes of N. galegae strains HAMBI 540T and HAMBI 1141. The results showed that the presence of an additional rpoN sigma factor gene in the symbiosis gene region is a characteristic specific to symbiovar orientalis, required for nitrogen fixation. Also the nifQ gene was shown to be crucial for functional symbiosis in both symbiovars. Genome-wide analyses identified additional genes characteristic of strains of the same symbiovar and of strains having similar plant growth promoting properties on Galega orientalis. Many of these genes are involved in transcriptional regulation or in metabolic functions. The results of this study confirm that the only symbiosis-related gene that is present in one symbiovar of N. galegae but not in the other is an rpoN gene. The specific function of this gene remains to be determined, however. New genes that were identified as specific for strains of one symbiovar may be involved in determining host specificity, while others are defined as potential determinant genes for differences in efficiency of nitrogen fixation.
Regularized quantile regression for SNP marker estimation of pig growth curves.

PubMed

Barroso, L M A; Nascimento, M; Nascimento, A C C; Silva, F F; Serão, N V L; Cruz, C D; Resende, M D V; Silva, F L; Azevedo, C F; Lopes, P S; Guimarães, S E F

2017-01-01

Genomic growth curves are generally defined only in terms of population mean; an alternative approach that has not yet been exploited in genomic analyses of growth curves is the Quantile Regression (QR). This methodology allows for the estimation of marker effects at different levels of the variable of interest. We aimed to propose and evaluate a regularized quantile regression for SNP marker effect estimation of pig growth curves, as well as to identify the chromosome regions of the most relevant markers and to estimate the genetic individual weight trajectory over time (genomic growth curve) under different quantiles (levels). The regularized quantile regression (RQR) enabled the discovery, at different levels of interest (quantiles), of the most relevant markers allowing for the identification of QTL regions. We found the same relevant markers simultaneously affecting different growth curve parameters (mature weight and maturity rate): two (ALGA0096701 and ALGA0029483) for RQR(0.2), one (ALGA0096701) for RQR(0.5), and one (ALGA0003761) for RQR(0.8). Three average genomic growth curves were obtained and the behavior was explained by the curve in quantile 0.2, which differed from the others. RQR allowed for the construction of genomic growth curves, which is the key to identifying and selecting the most desirable animals for breeding purposes. Furthermore, the proposed model enabled us to find, at different levels of interest (quantiles), the most relevant markers for each trait (growth curve parameter estimates) and their respective chromosomal positions (identification of new QTL regions for growth curves in pigs). These markers can be exploited under the context of marker assisted selection while aiming to change the shape of pig growth curves.
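To make the idea of estimating marker effects at different quantiles more concrete, the Python sketch below writes out a penalised quantile regression objective. It assumes an L1 (LASSO-style) penalty and a linear predictor; the abstract does not state which penalty or solver was used, and the function names and toy data are hypothetical.

```python
def pinball_loss(residual, tau):
    """Quantile (check) loss: weights positive residuals by tau and
    negative residuals by (1 - tau)."""
    return residual * tau if residual >= 0 else residual * (tau - 1)

def rqr_objective(y, X, beta, tau, lam):
    """Sum of pinball losses plus an assumed L1 penalty on marker effects.

    y    : phenotype values (e.g. a growth curve parameter estimate)
    X    : rows of marker genotypes, one row per animal
    beta : candidate marker effects
    tau  : quantile of interest, e.g. 0.2, 0.5 or 0.8
    lam  : penalty weight (assumption: L1 regularisation)
    """
    fit = 0.0
    for yi, xi in zip(y, X):
        prediction = sum(b * x for b, x in zip(beta, xi))
        fit += pinball_loss(yi - prediction, tau)
    return fit + lam * sum(abs(b) for b in beta)

# Toy data for illustration only: one marker, perfect fit at beta = 1.
y = [1.0, 2.0, 3.0]
X = [[1.0], [2.0], [3.0]]
objective_at_fit = rqr_objective(y, X, [1.0], tau=0.5, lam=0.1)
```

Minimising this objective separately for each tau gives one set of marker effects per quantile, which is what allows different markers to stand out at RQR(0.2), RQR(0.5), and RQR(0.8).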
Randomized Controlled Trials to Define Viral Load Thresholds for Cytomegalovirus Pre-Emptive Therapy

PubMed Central

Griffiths, Paul D.; Rothwell, Emily; Raza, Mohammed; Wilmore, Stephanie; Doyle, Tomas; Harber, Mark; O’Beirne, James; Mackinnon, Stephen; Jones, Gareth; Thorburn, Douglas; Mattes, Frank; Nebbia, Gaia; Atabani, Sowsan; Smith, Colette; Stanton, Anna; Emery, Vincent C.

2016-01-01

Background To help decide when to start and when to stop pre-emptive therapy for cytomegalovirus infection, we conducted two open-label randomized controlled trials in renal, liver and bone marrow transplant recipients in a single centre where pre-emptive therapy is indicated if viraemia exceeds 3000 genomes/ml (2520 IU/ml) of whole blood. Methods Patients with two consecutive viraemia episodes each below 3000 genomes/ml were randomized to continue monitoring or to immediate treatment (Part A). A separate group of patients with viral load greater than 3000 genomes/ml was randomized to stop pre-emptive therapy when two consecutive levels less than 200 genomes/ml (168 IU/ml) or less than 3000 genomes/ml were obtained (Part B). For both parts, the primary endpoint was the occurrence of a separate episode of viraemia requiring treatment because it was greater than 3000 genomes/ml. Results In Part A, the primary endpoint was not significantly different between the two arms; 18/32 (56%) in the monitor arm had viraemia greater than 3000 genomes/ml compared to 10/27 (37%) in the immediate treatment arm (p = 0.193). However, the time to developing an episode of viraemia greater than 3000 genomes/ml was significantly delayed among those randomized to immediate treatment (p = 0.022). In Part B, the primary endpoint was not significantly different between the two arms; 19/55 (35%) in the less than 200 genomes/ml arm subsequently had viraemia greater than 3000 genomes/ml compared to 23/51 (45%) among those randomized to stop treatment in the less than 3000 genomes/ml arm (p = 0.322). However, the duration of antiviral treatment was significantly shorter (p = 0.0012) in those randomized to stop treatment when viraemia was less than 3000 genomes/ml. Discussion The results illustrate that patients have continuing risks for CMV infection with limited time available for intervention. We see no need to alter current rules for stopping or starting pre-emptive therapy. PMID:27684379
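The unit pairs quoted in the abstract (3000 genomes/ml = 2520 IU/ml and 200 genomes/ml = 168 IU/ml) imply a conversion ratio of 2520/3000 for this centre's assay. The Python sketch below only reproduces that arithmetic; the function and constant names are hypothetical, and the ratio should not be assumed to hold for other assays or laboratories.

```python
# Ratio implied by the abstract's paired values; assay-specific assumption.
IU_PER_3000_GENOMES = 2520
TREATMENT_THRESHOLD_GENOMES_PER_ML = 3000

def genomes_to_iu(genomes_per_ml):
    """Convert genomes/ml to IU/ml using the ratio implied by the abstract."""
    return genomes_per_ml * IU_PER_3000_GENOMES / 3000

def exceeds_treatment_threshold(genomes_per_ml):
    """True when viraemia exceeds the 3000 genomes/ml threshold stated above."""
    return genomes_per_ml > TREATMENT_THRESHOLD_GENOMES_PER_ML
```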
PhyloFlu, a DNA microarray for determining the phylogenetic origin of influenza A virus gene segments and the genomic fingerprint of viral strains.

PubMed

Paulin, Luis F; de los D Soto-Del Río, María; Sánchez, Iván; Hernández, Jesús; Gutiérrez-Ríos, Rosa M; López-Martínez, Irma; Wong-Chew, Rosa M; Parissi-Crivelli, Aurora; Isa, P; López, Susana; Arias, Carlos F

2014-03-01

Recent evidence suggests that most influenza A virus gene segments can contribute to the pathogenicity of the virus. In this regard, the hemagglutinin (HA) subtype of the circulating strains has been closely surveyed, but the reassortment of internal gene segments is usually not monitored as a potential source of an increased pathogenicity. In this work, an oligonucleotide DNA microarray (PhyloFlu) designed to determine the phylogenetic origins of the eight segments of the influenza virus genome was constructed and validated. Clades were defined for each segment and also for the 16 HA and 9 neuraminidase (NA) subtypes. Viral genetic material was amplified by reverse transcription-PCR (RT-PCR) with primers specific to the conserved 5' and 3' ends of the influenza A virus genes, followed by PCR amplification with random primers and Cy3 labeling. The microarray unambiguously determined the clades for all eight influenza virus genes in 74% (28/38) of the samples. The microarray was validated with reference strains from different animal origins, as well as from human, swine, and avian viruses from field or clinical samples. In most cases, the phylogenetic clade of each segment defined its animal host of origin. The genomic fingerprint deduced by the combined information of the individual clades allowed for the determination of the time and place that strains with the same genomic pattern were previously reported. PhyloFlu is useful for characterizing and surveying the genetic diversity and variation of animal viruses circulating in different environmental niches and for obtaining a more detailed surveillance and follow up of reassortant events that can potentially modify virus pathogenicity.
LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes.

PubMed

Wang, Dapeng; Zhang, Yubin; Fan, Zhonghua; Liu, Guiming; Yu, Jun

2012-01-01

Animal genes of different lineages, such as vertebrates and arthropods, are well-organized and blended into dynamic chromosomal structures that represent a primary regulatory mechanism for body development and cellular differentiation. The majority of genes in a genome are actually clustered, which are evolutionarily stable to different extents and biologically meaningful when evaluated among genomes within and across lineages. Until now, many questions concerning gene organization, such as what is the minimal number of genes in a cluster and what is the driving force leading to gene co-regulation, remain to be addressed. Here, we provide a user-friendly database, LCGbase (a comprehensive database for lineage-based co-regulated genes), hosting information on evolutionary dynamics of gene clustering and ordering within animal kingdoms in two different lineages: vertebrates and arthropods. The database is constructed on a web-based Linux-Apache-MySQL-PHP framework and effective interactive user-inquiry service. Compared to other gene annotation databases with similar purposes, our database has three comprehensible advantages. First, our database is inclusive, including all high-quality genome assemblies of vertebrates and representative arthropod species. Second, it is human-centric since we map all gene clusters from other genomes in an order of lineage-ranks (such as primates, mammals, warm-blooded, and reptiles) onto human genome and start the database from well-defined gene pairs (a minimal cluster where the two adjacent genes are oriented as co-directional, convergent, and divergent pairs) to large gene clusters. Furthermore, users can search for any adjacent genes and their detailed annotations. Third, the database provides flexible parameter definitions, such as the distance of transcription start sites between two adjacent genes, which is extendable to genes flanking the cluster across species. We also provide useful tools for sequence alignment, gene ontology (GO) annotation, promoter identification, gene expression (co-expression), and evolutionary analysis. This database not only provides a way to define lineage-specific and species-specific gene clusters but also facilitates future studies on gene co-regulation, epigenetic control of gene expression (DNA methylation and histone marks), and chromosomal structures in a context of gene clusters and species evolution. LCGbase is freely available at http://lcgbase.big.ac.cn/LCGbase.
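The gene-pair categories named in the abstract (co-directional, convergent, divergent) follow from the strands of two adjacent genes. The Python sketch below shows one common way to assign them, together with a transcription start site distance. It is not LCGbase code: the function names and the convention that the "left" gene has the smaller chromosomal coordinate are assumptions.

```python
def classify_gene_pair(left_strand, right_strand):
    """Classify two adjacent genes by strand.

    'left' is assumed to be the gene with the smaller chromosomal coordinate.
    + then - means the genes are transcribed towards each other (convergent);
    - then + means they are transcribed away from each other (divergent).
    """
    valid = {"+", "-"}
    if left_strand not in valid or right_strand not in valid:
        raise ValueError("strand must be '+' or '-'")
    if left_strand == right_strand:
        return "co-directional"
    if left_strand == "+":
        return "convergent"
    return "divergent"

def tss_distance(left_tss, right_tss):
    """Absolute distance in base pairs between two transcription start sites."""
    return abs(right_tss - left_tss)
```

A user-adjustable cutoff on `tss_distance` would correspond to the kind of flexible parameter definition the abstract mentions, although the actual parameters LCGbase exposes are not listed here.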
Recurrence due to Relapse or Reinfection With Mycobacterium tuberculosis: A Whole-Genome Sequencing Approach in a Large, Population-Based Cohort With a High HIV Infection Prevalence and Active Follow-up

PubMed Central

Guerra-Assunção, José Afonso; Houben, Rein M. G. J.; Crampin, Amelia C.; Mzembe, Themba; Mallard, Kim; Coll, Francesc; Khan, Palwasha; Banda, Louis; Chiwaya, Arthur; Pereira, Rui P. A.; McNerney, Ruth; Harris, David; Parkhill, Julian; Clark, Taane G.; Glynn, Judith R.

2015-01-01

Background. Recurrent tuberculosis is a major health burden and may be due to relapse with the original strain or reinfection with a new strain. Methods. In a population-based study in northern Malawi, patients with tuberculosis diagnosed from 1996 to 2010 were actively followed after the end of treatment. Whole-genome sequencing with approximately 100-fold coverage was performed on all available cultures. Results of IS6110 restriction fragment-length polymorphism analyses were available for cultures performed up to 2008. Results. Based on our data, a difference of ≤10 single-nucleotide polymorphisms (SNPs) was used to define relapse, and a difference of >100 SNPs was used to define reinfection. There was no evidence of mixed infections among those classified as reinfections. Of 1471 patients, 139 had laboratory-confirmed recurrences: 55 had relapse, and 20 had reinfection; for 64 type of recurrence was unclassified. Almost all relapses occurred in the first 2 years. Human immunodeficiency virus infection was associated with reinfection but not relapse. Relapses were associated with isoniazid resistance, treatment before 2007, and lineage-3 strains. We identified several gene variants associated with relapse. Lineage-2 (Beijing) was overrepresented and lineage-1 underrepresented among the reinfecting strains (P = .004). Conclusions. While some of the factors determining recurrence depend on the patient and their treatment, differences in the Mycobacterium tuberculosis genome appear to have a role in both relapse and reinfection. PMID:25336729
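The SNP thresholds reported above translate directly into a simple decision rule, sketched here in Python. The thresholds (10 or fewer SNPs for relapse, more than 100 for reinfection) come from the abstract; labelling the intermediate range "unclassified", skipping ambiguous "N" calls, and the function names are assumptions, since the abstract does not say how the 64 unclassified recurrences arose.

```python
def snp_distance(seq_a, seq_b):
    """Count differing positions between two aligned sequences,
    ignoring positions where either isolate has an ambiguous 'N' call."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b and "N" not in (a, b))

def classify_recurrence(distance):
    """Apply the thresholds from the abstract to a pairwise SNP distance."""
    if distance <= 10:
        return "relapse"
    if distance > 100:
        return "reinfection"
    # Assumption: distances of 11 to 100 SNPs are left unclassified.
    return "unclassified"
```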
Genetic Architecture of Lacunar Stroke.

PubMed

Traylor, Matthew; Bevan, Steve; Baron, Jean-Claude; Hassan, Ahamad; Lewis, Cathryn M; Markus, Hugh S

2015-09-01

Lacunar strokes comprise ≈20% of all strokes. Despite this frequency, their pathogenesis is poorly understood. Previous genome-wide association studies in lacunar stroke have been disappointing, which may be because of phenotypic heterogeneity. Pathological and radiological studies suggest that there may be different pathologies underlying lacunar strokes. This has led to the suggestion of 2 subtypes: isolated lacunar infarcts and multiple lacunar infarcts and leukoaraiosis. We performed genome-wide analyses in a magnetic resonance imaging-verified cohort of 1012 younger onset lacunar stroke cases and 964 controls. Using these data, we first estimated the heritability of lacunar stroke and its 2 hypothesized subtypes, and secondly, we determined whether this is enriched for regulatory regions in the genome, as defined by data from Encyclopedia of DNA Elements (ENCODE) and other sources. Finally, we determine the evidence for a polygenic contribution from rare variation to lacunar stroke and its subtypes. Our results indicate a substantial heritable component to magnetic resonance imaging-verified lacunar stroke (20%-25%) and its 2 subtypes (isolated lacunar infarct, 15%-18%; multiple lacunar infarcts/leukoaraiosis, 23%-28%). This heritable component is significantly enriched for sites affecting expression of genes. In addition, we show that the risk of the 2 subtypes of lacunar stroke in isolation, but not in combination, is associated with rare variation in the genome. Lacunar stroke, when defined on magnetic resonance imaging, is a highly heritable complex disease. Much of this heritability arises from regions of the genome affecting gene regulation. Rare variation affects 2 subtypes of lacunar in isolation, suggesting that they may have distinct genetic susceptibility factors. © 2015 The Authors.

Draft Genome Sequence, and a Sequence-Defined Genetic Linkage Map of the Legume Crop Species Lupinus angustifolius L

PubMed Central

Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W.; Howieson, John G.; Li, Chengdao

2013-01-01

Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species. PMID:23734219
Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L.

PubMed

Yang, Huaan; Tao, Ye; Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W; Howieson, John G; Li, Chengdao

2013-01-01

Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species.
Genome scale engineering techniques for metabolic engineering.

PubMed

Liu, Rongming; Bassalo, Marcelo C; Zeitoun, Ramsey I; Gill, Ryan T

2015-11-01

Metabolic engineering has expanded from a focus on designs requiring a small number of genetic modifications to increasingly complex designs driven by advances in genome-scale engineering technologies. Metabolic engineering has been generally defined by the use of iterative cycles of rational genome modifications, strain analysis and characterization, and a synthesis step that fuels additional hypothesis generation. This cycle mirrors the Design-Build-Test-Learn cycle followed throughout various engineering fields that has recently become a defining aspect of synthetic biology. This review will attempt to summarize recent genome-scale design, build, test, and learn technologies and relate their use to a range of metabolic engineering applications. Copyright © 2015 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
Sauropod dinosaurs evolved moderately sized genomes unrelated to body size

PubMed Central

Organ, Chris L.; Brusatte, Stephen L.; Stein, Koen

2009-01-01

Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77–2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97–2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05–5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group. PMID:19793755
Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

PubMed Central

2013-01-01

Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823
Reptiles and mammals have differentially retained long conserved noncoding sequences from the amniote ancestor.

PubMed

Janes, D E; Chapus, C; Gondo, Y; Clayton, D F; Sinha, S; Blatti, C A; Organ, C L; Fujita, M K; Balakrishnan, C N; Edwards, S V

2011-01-01

Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation.
Reptiles and Mammals Have Differentially Retained Long Conserved Noncoding Sequences from the Amniote Ancestor

PubMed Central

Janes, D.E.; Chapus, C.; Gondo, Y.; Clayton, D.F.; Sinha, S.; Blatti, C.A.; Organ, C.L.; Fujita, M.K.; Balakrishnan, C.N.; Edwards, S.V.

2010-01-01

Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation. PMID:21183607
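The LCNS definition given in the abstract above (sequences ≥500 bp in length with ≥95% similarity between species) amounts to a two-condition filter. The sketch below is a minimal illustration under stated assumptions: the `AlignedBlock` record, its field names, and the use of a fractional identity value as the similarity measure are hypothetical and are not taken from the authors' methods.

```python
from dataclasses import dataclass

# Cutoffs taken from the abstract's definition of LCNS.
MIN_LENGTH_BP = 500
MIN_SIMILARITY = 0.95


@dataclass(frozen=True)
class AlignedBlock:
    """Hypothetical record for one noncoding block aligned between two species."""
    name: str
    length_bp: int
    similarity: float  # fraction between 0 and 1 (assumed representation)


def is_lcns(block: AlignedBlock) -> bool:
    """Return True if the block meets both the length and similarity cutoffs."""
    return block.length_bp >= MIN_LENGTH_BP and block.similarity >= MIN_SIMILARITY


def filter_lcns(blocks: list) -> list:
    """Keep only the blocks that satisfy the LCNS definition."""
    return [block for block in blocks if is_lcns(block)]


if __name__ == "__main__":
    # Made-up example blocks, not data from the study.
    candidates = [
        AlignedBlock("block_1", 620, 0.97),
        AlignedBlock("block_2", 480, 0.99),   # too short
        AlignedBlock("block_3", 900, 0.93),   # not similar enough
    ]
    print([block.name for block in filter_lcns(candidates)])
```

Both cutoffs are inclusive, matching the "≥" signs in the definition, so a block of exactly 500 bp at exactly 95% similarity would be retained.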
Arthropod genomic resources for the 21st century

USDA-ARS's Scientific Manuscript database

Genome references are foundational for high quality entomological research today. Species, sub populations and taxonomy are defined by gene flow and genome sequences. Gene content in arthropods is often directly reflective of life history, for example, diet and symbiont related gene loss is observed...
Genomics of Immune Diseases and New Therapies.

PubMed

Lenardo, Michael; Lo, Bernice; Lucas, Carrie L

2016-05-20

Genomic DNA sequencing technologies have been one of the great advances of the 21st century, having decreased in cost by seven orders of magnitude and opening up new fields of investigation throughout research and clinical medicine. Genomics coupled with biochemical investigation has allowed the molecular definition of a growing number of new genetic diseases that reveal new concepts of immune regulation. Also, defining the genetic pathogenesis of these diseases has led to improved diagnosis, prognosis, genetic counseling, and, most importantly, new therapies. We highlight the investigational journey from patient phenotype to treatment using the newly defined XMEN disease, caused by the genetic loss of the MAGT1 magnesium transporter, as an example. This disease illustrates how genomics yields new fundamental immunoregulatory insights as well as how research genomics is integrated into clinical immunology. At the end, we discuss two other recently described diseases, CHAI/LATAIE (CTLA-4 deficiency) and PASLI (PI3K dysregulation), as additional examples of the journey from unknown immunological diseases to new precision medicine treatments using genomics.
A system-level model for the microbial regulatory genome.

PubMed

Brooks, Aaron N; Reiss, David J; Allard, Antoine; Wu, Wei-Ju; Salvanha, Diego M; Plaisier, Christopher L; Chandrasekaran, Sriram; Pan, Min; Kaur, Amardeep; Baliga, Nitin S

2014-07-15

Microbes can tailor transcriptional responses to diverse environmental challenges despite having streamlined genomes and a limited number of regulators. Here, we present data-driven models that capture the dynamic interplay of the environment and genome-encoded regulatory programs of two types of prokaryotes: Escherichia coli (a bacterium) and Halobacterium salinarum (an archaeon). The models reveal how the genome-wide distributions of cis-acting gene regulatory elements and the conditional influences of transcription factors at each of those elements encode programs for eliciting a wide array of environment-specific responses. We demonstrate how these programs partition transcriptional regulation of genes within regulons and operons to re-organize gene-gene functional associations in each environment. The models capture fitness-relevant co-regulation by different transcriptional control mechanisms acting across the entire genome, to define a generalized, system-level organizing principle for prokaryotic gene regulatory networks that goes well beyond existing paradigms of gene regulation. An online resource (http://egrin2.systemsbiology.net) has been developed to facilitate multiscale exploration of conditional gene regulation in the two prokaryotes. © 2014 The Authors. Published under the terms of the CC BY 4.0 license.
Molecular cloning and physical mapping of the genome of fish lymphocystis disease virus.

PubMed

Darai, G; Delius, H; Clarke, J; Apfel, H; Schnitzler, P; Flügel, R M

1985-10-30

A defined and complete gene library of the fish lymphocystis disease virus (FLDV) genome was established. FLDV DNA was cleaved with EcoRI, BamHI, EcoRI/BamHI and EcoRI/HindIII and the resulting fragments were inserted into the corresponding sites of the pACYC184 or pAT153 plasmid vectors using T4 DNA ligase. Since FLDV DNA is highly methylated at CpG sequences (Darai et al., 1983; Wagner et al., 1985), an Escherichia coli GC-3 strain was required to amplify the recombinant plasmids harboring the FLDV DNA fragments. Bacterial colonies harboring recombinant plasmids were selected. All cloned fragments were individually identified by digestion of the recombinant plasmid DNA with different restriction enzymes and screened by hybridization of recombinant plasmid DNA to viral DNA. This analysis revealed that sequences representing 100% of the viral genome were cloned. Using these recombinant plasmids, the physical maps of the genome were constructed for BamHI, EcoRI, BstEII, and PstI restriction endonucleases. Although the FLDV genome is linear, due to circular permutation the restriction maps are circular.
National Science Foundation-sponsored workshop report. Draft plan for soybean genomics.

PubMed

Stacey, Gary; Vodkin, Lila; Parrott, Wayne A; Shoemaker, Randy C

2004-05-01

Recent efforts to coordinate and define a research strategy for soybean (Glycine max) genomics began with the establishment of a Soybean Genetics Executive Committee, which will serve as a communication focal point between the soybean research community and granting agencies. Secondly, a workshop was held to define a strategy to incorporate existing tools into a framework for advancing soybean genomics research. This workshop identified and ranked research priorities essential to making more informed decisions as to how to proceed with large scale sequencing and other genomics efforts. Most critical among these was the need to finalize a physical map and to obtain a better understanding of genome microstructure. Addressing these research needs will require pilot work on new technologies to demonstrate an ability to discriminate between recently duplicated regions in the soybean genome and pilot projects to analyze an adequate amount of random genome sequence to identify and catalog common repeats. The development of additional markers, reverse genetics tools, and bioinformatics is also necessary. Successful implementation of these goals will require close coordination among various working groups.
Segmental duplications: evolution and impact among the current Lepidoptera genomes.

PubMed

Zhao, Qian; Ma, Dongna; Vasseur, Liette; You, Minsheng

2017-07-06

Structural variation among genomes is now viewed to be as important as single nucleotide polymorphisms in influencing the phenotype and evolution of a species. Segmental duplication (SD) is defined as segments of DNA with homologous sequence. Here, we performed a systematic analysis of segmental duplications (SDs) among five lepidopteran reference genomes (Plutella xylostella, Danaus plexippus, Bombyx mori, Manduca sexta and Heliconius melpomene) to understand their potential impact on the evolution of these species. We find that the SDs content differed substantially among species, ranging from 1.2% of the genome in B. mori to 15.2% in H. melpomene. Most SDs formed very high identity (similarity higher than 90%) blocks but had very few large blocks. Comparative analysis showed that most of the SDs arose after the divergence of each lineage and we found that P. xylostella and H. melpomene showed more duplications than other species, suggesting they might be able to tolerate extensive levels of variation in their genomes. Conserved ancestral and species specific SD events were assessed, revealing multiple examples of the gain, loss or maintenance of SDs over time. SDs content analysis showed that most of the genes embedded in SDs regions belonged to species-specific SDs ("Unique" SDs). Functional analysis of these genes suggested their potential roles in the lineage-specific evolution. SDs and flanking regions often contained transposable elements (TEs) and this association suggested some involvement in SDs formation. Further studies on comparison of gene expression level between SDs and non-SDs showed that the expression level of genes embedded in SDs was significantly lower, suggesting that structure changes in the genomes are involved in gene expression differences in species. The results showed that most of the SDs were "unique SDs", which originated after species formation. Functional analysis suggested that SDs might play different roles in different species. Our results provide a valuable resource beyond the genetic mutation to explore the genome structure for future Lepidoptera research.
Genomic Sequencing of Bordetella pertussis for Epidemiology and Global Surveillance of Whooping Cough.

PubMed

Bouchez, Valérie; Guglielmini, Julien; Dazas, Mélody; Landier, Annie; Toubiana, Julie; Guillot, Sophie; Criscuolo, Alexis; Brisse, Sylvain

2018-06-01

Bordetella pertussis causes whooping cough, a highly contagious respiratory disease that is reemerging in many world regions. The spread of antigen-deficient strains may threaten acellular vaccine efficacy. Dynamics of strain transmission are poorly defined because of shortcomings in current strain genotyping methods. Our objective was to develop a whole-genome genotyping strategy with sufficient resolution for local epidemiologic questions and sufficient reproducibility to enable international comparisons of clinical isolates. We defined a core genome multilocus sequence typing scheme comprising 2,038 loci and demonstrated its congruence with whole-genome single-nucleotide polymorphism variation. Most cases of intrafamilial groups of isolates or of multiple isolates recovered from the same patient were distinguished from temporally and geographically cocirculating isolates. However, epidemiologically unrelated isolates were sometimes nearly undistinguishable. We set up a publicly accessible core genome multilocus sequence typing database to enable global comparisons of B. pertussis isolates, opening the way for internationally coordinated surveillance.
Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

USDA-ARS's Scientific Manuscript database

Background: BAC-based physical maps provide for sequencing across an entire genome or selected sub-genome regions of biological interest. Using the minimum tiling path as a guide, it is possible to select specific BAC clones from prioritized genome sections such as a genetically defined QTL interv...
Physical and genetic-interaction density reveals functional organization and informs significance cutoffs in genome-wide screens

PubMed Central

Dittmar, John C.; Pierce, Steven; Rothstein, Rodney; Reid, Robert J. D.

2013-01-01

Genome-wide experiments often measure quantitative differences between treated and untreated cells to identify affected strains. For these studies, statistical models are typically used to determine significance cutoffs. We developed a method termed “CLIK” (Cutoff Linked to Interaction Knowledge) that overlays biological knowledge from the interactome on screen results to derive a cutoff. The method takes advantage of the fact that groups of functionally related interacting genes often respond similarly to experimental conditions and, thus, cluster in a ranked list of screen results. We applied CLIK analysis to five screens of the yeast gene disruption library and found that it defined a significance cutoff that differed from traditional statistics. Importantly, verification experiments revealed that the CLIK cutoff correlated with the position in the rank order where the rate of true positives drops off significantly. In addition, the gene sets defined by CLIK analysis often provide further biological perspectives. For example, applying CLIK analysis retrospectively to a screen for cisplatin sensitivity allowed us to identify the importance of the Hrq1 helicase in DNA crosslink repair. Furthermore, we demonstrate the utility of CLIK to determine optimal treatment conditions by analyzing genome-wide screens at multiple rapamycin concentrations. We show that CLIK is an extremely useful tool for evaluating screen quality, determining screen cutoffs, and comparing results between screens. Furthermore, because CLIK uses previously annotated interaction data to determine biologically informed cutoffs, it provides additional insights into screen results, which supplement traditional statistical approaches. PMID:23589890
Co-occurring genomic alterations define major subsets of KRAS-mutant lung adenocarcinoma with distinct biology, immune profiles, and therapeutic vulnerabilities.

PubMed

Skoulidis, Ferdinandos; Byers, Lauren A; Diao, Lixia; Papadimitrakopoulou, Vassiliki A; Tong, Pan; Izzo, Julie; Behrens, Carmen; Kadara, Humam; Parra, Edwin R; Canales, Jaime Rodriguez; Zhang, Jianjun; Giri, Uma; Gudikote, Jayanthi; Cortez, Maria A; Yang, Chao; Fan, Youhong; Peyton, Michael; Girard, Luc; Coombes, Kevin R; Toniatti, Carlo; Heffernan, Timothy P; Choi, Murim; Frampton, Garrett M; Miller, Vincent; Weinstein, John N; Herbst, Roy S; Wong, Kwok-Kin; Zhang, Jianhua; Sharma, Padmanee; Mills, Gordon B; Hong, Waun K; Minna, John D; Allison, James P; Futreal, Andrew; Wang, Jing; Wistuba, Ignacio I; Heymach, John V

2015-08-01

The molecular underpinnings that drive the heterogeneity of KRAS-mutant lung adenocarcinoma are poorly characterized. We performed an integrative analysis of genomic, transcriptomic, and proteomic data from early-stage and chemorefractory lung adenocarcinoma and identified three robust subsets of KRAS-mutant lung adenocarcinoma dominated, respectively, by co-occurring genetic events in STK11/LKB1 (the KL subgroup), TP53 (KP), and CDKN2A/B inactivation coupled with low expression of the NKX2-1 (TTF1) transcription factor (KC). We further revealed biologically and therapeutically relevant differences between the subgroups. KC tumors frequently exhibited mucinous histology and suppressed mTORC1 signaling. KL tumors had high rates of KEAP1 mutational inactivation and expressed lower levels of immune markers, including PD-L1. KP tumors demonstrated higher levels of somatic mutations, inflammatory markers, immune checkpoint effector molecules, and improved relapse-free survival. Differences in drug sensitivity patterns were also observed; notably, KL cells showed increased vulnerability to HSP90-inhibitor therapy. This work provides evidence that co-occurring genomic alterations identify subgroups of KRAS-mutant lung adenocarcinoma with distinct biology and therapeutic vulnerabilities. Co-occurring genetic alterations in STK11/LKB1, TP53, and CDKN2A/B (the latter coupled with low TTF1 expression) define three major subgroups of KRAS-mutant lung adenocarcinoma with distinct biology, patterns of immune-system engagement, and therapeutic vulnerabilities. ©2015 American Association for Cancer Research.
KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

PubMed

Laetsch, Dominik R; Blaxter, Mark L

2017-10-05

The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.
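The general idea described in the abstract above, scoring protein cluster presence and absence across user-defined groups of taxa, can be sketched in a few lines of Python. This is a generic illustration only: it does not use or reproduce KinFin's interface, file formats, or algorithms, and all function names, cluster identifiers, taxon names, and set labels are hypothetical.

```python
from __future__ import annotations


def presence_by_taxon_set(
    clusters: dict[str, list[tuple[str, str]]],
    taxon_sets: dict[str, set[str]],
) -> dict[str, dict[str, bool]]:
    """For each cluster, record whether any member protein comes from each taxon set.

    clusters maps a cluster ID to (taxon, protein_id) pairs;
    taxon_sets maps a set label to the taxa it contains.
    """
    table: dict[str, dict[str, bool]] = {}
    for cluster_id, members in clusters.items():
        taxa_in_cluster = {taxon for taxon, _protein in members}
        table[cluster_id] = {
            label: bool(taxa_in_cluster & taxa) for label, taxa in taxon_sets.items()
        }
    return table


def specific_to(table: dict[str, dict[str, bool]], label: str) -> list[str]:
    """List clusters present in the named taxon set and absent from all others."""
    return sorted(
        cluster_id
        for cluster_id, row in table.items()
        if row.get(label, False)
        and not any(present for other, present in row.items() if other != label)
    )


if __name__ == "__main__":
    # Made-up clusters and taxon sets for illustration.
    clusters = {
        "OG1": [("taxonA", "a1"), ("taxonB", "b1"), ("taxonC", "c1")],
        "OG2": [("taxonA", "a2"), ("taxonB", "b2")],
        "OG3": [("taxonC", "c2")],
    }
    taxon_sets = {"parasitic": {"taxonA", "taxonB"}, "free_living": {"taxonC"}}
    table = presence_by_taxon_set(clusters, taxon_sets)
    print(specific_to(table, "parasitic"))
```

Because the taxon sets are plain dictionaries, the same clusters can be re-scored under a competing grouping (for example, a different phylogenetic hypothesis) simply by passing a different `taxon_sets` argument.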
Ecophysiology of Freshwater Verrucomicrobia Inferred from Metagenome-Assembled Genomes

PubMed Central

He, Shaomei; Stevens, Sarah L. R.; Chan, Leong-Keat; Bertilsson, Stefan; Glavina del Rio, Tijana; Tringe, Susannah G.; Malmstrom, Rex R.

2017-01-01

ABSTRACT Microbes are critical in carbon and nutrient cycling in freshwater ecosystems. Members of the Verrucomicrobia are ubiquitous in such systems, and yet their roles and ecophysiology are not well understood. In this study, we recovered 19 Verrucomicrobia draft genomes by sequencing 184 time-series metagenomes from a eutrophic lake and a humic bog that differ in carbon source and nutrient availabilities. These genomes span four of the seven previously defined Verrucomicrobia subdivisions and greatly expand knowledge of the genomic diversity of freshwater Verrucomicrobia. Genome analysis revealed their potential role as (poly)saccharide degraders in freshwater, uncovered interesting genomic features for this lifestyle, and suggested their adaptation to nutrient availabilities in their environments. Verrucomicrobia populations differ significantly between the two lakes in glycoside hydrolase gene abundance and functional profiles, reflecting the autochthonous and terrestrially derived allochthonous carbon sources of the two ecosystems, respectively. Interestingly, a number of genomes recovered from the bog contained gene clusters that potentially encode a novel porin-multiheme cytochrome c complex and might be involved in extracellular electron transfer in the anoxic humus-rich environment. Notably, most epilimnion genomes have large numbers of so-called “Planctomycete-specific” cytochrome c-encoding genes, which exhibited distribution patterns nearly opposite to those seen with glycoside hydrolase genes, probably associated with the different levels of environmental oxygen availability and carbohydrate complexity between lakes/layers. Overall, the recovered genomes represent a major step toward understanding the role, ecophysiology, and distribution of Verrucomicrobia in freshwater. IMPORTANCE Freshwater Verrucomicrobia spp. are cosmopolitan in lakes and rivers, and yet their roles and ecophysiology are not well understood, as cultured freshwater Verrucomicrobia spp. are restricted to one subdivision of this phylum. Here, we greatly expanded the known genomic diversity of this freshwater lineage by recovering 19 Verrucomicrobia draft genomes from 184 metagenomes collected from a eutrophic lake and a humic bog across multiple years. Most of these genomes represent the first freshwater representatives of several Verrucomicrobia subdivisions. Genomic analysis revealed Verrucomicrobia to be potential (poly)saccharide degraders and suggested their adaptation to carbon sources of different origins in the two contrasting ecosystems. We identified putative extracellular electron transfer genes and so-called “Planctomycete-specific” cytochrome c-encoding genes and identified their distinct distribution patterns between the lakes/layers. Overall, our analysis greatly advances the understanding of the function, ecophysiology, and distribution of freshwater Verrucomicrobia, while highlighting their potential role in freshwater carbon cycling. PMID:28959738
Analysis of Multiallelic CNVs by Emulsion Haplotype Fusion PCR.

PubMed

Tyson, Jess; Armour, John A L

2017-01-01

Emulsion-fusion PCR recovers long-range sequence information by combining products in cis from individual genomic DNA molecules. Emulsion droplets act as very numerous small reaction chambers in which different PCR products from a single genomic DNA molecule are condensed into short joint products, to unite sequences in cis from widely separated genomic sites. These products can therefore provide information about the arrangement of sequences and variants at a larger scale than established long-read sequencing methods. The method has been useful in defining the phase of variants in haplotypes, the typing of inversions, and determining the configuration of sequence variants in multiallelic CNVs. In this description we outline the rationale for the application of emulsion-fusion PCR methods to the analysis of multiallelic CNVs, and give practical details for our own implementation of the method in that context.

Complete Genome Sequence of Germline Chromosomally Integrated Human Herpesvirus 6A and Analyses Integration Sites Define a New Human Endogenous Virus with Potential to Reactivate as an Emerging Infection.

PubMed

Tweedy, Joshua; Spyrou, Maria Alexandra; Pearson, Max; Lassner, Dirk; Kuhl, Uwe; Gompels, Ursula A

2016-01-15

Human herpesvirus-6A and B (HHV-6A, HHV-6B) have recently defined endogenous genomes, resulting from integration into the germline: chromosomally-integrated "CiHHV-6A/B". These affect approximately 1.0% of human populations, giving potential for virus gene expression in every cell. We previously showed that CiHHV-6A was more divergent than CiHHV-6B by examining four genes in 44 European CiHHV-6A/B cardiac/haematology patients. There was evidence for gene expression/reactivation, implying functional non-defective genomes. To further define the relationship between HHV-6A and CiHHV-6A we used next-generation sequencing to characterize genomes from three CiHHV-6A cardiac patients. Comparisons to known exogenous HHV-6A showed CiHHV-6A genomes formed a separate clade; including all 85 non-interrupted genes and necessary cis-acting signals for reactivation as infectious virus. Greater single nucleotide polymorphism (SNP) density was defined in 16 genes and the direct repeats (DR) terminal regions. Using these SNPs, deep sequencing analyses demonstrated superinfection with exogenous HHV-6A in two of the CiHHV-6A patients with recurrent cardiac disease. Characterisation of the integration sites in twelve patients identified the human chromosome 17p subtelomere as a prevalent site, which had specific repeat structures and phylogenetically related CiHHV-6A coding sequences indicating common ancestral origins. Overall CiHHV-6A genomes were similar, but distinct from known exogenous HHV-6A virus, and have the capacity to reactivate as emerging virus infections.
Complete Genome Sequence of Germline Chromosomally Integrated Human Herpesvirus 6A and Analyses Integration Sites Define a New Human Endogenous Virus with Potential to Reactivate as an Emerging Infection

PubMed Central

Tweedy, Joshua; Spyrou, Maria Alexandra; Pearson, Max; Lassner, Dirk; Kuhl, Uwe; Gompels, Ursula A.

2016-01-01

Human herpesvirus-6A and B (HHV-6A, HHV-6B) have recently defined endogenous genomes, resulting from integration into the germline: chromosomally-integrated “CiHHV-6A/B”. These affect approximately 1.0% of human populations, giving potential for virus gene expression in every cell. We previously showed that CiHHV-6A was more divergent than CiHHV-6B by examining four genes in 44 European CiHHV-6A/B cardiac/haematology patients. There was evidence for gene expression/reactivation, implying functional non-defective genomes. To further define the relationship between HHV-6A and CiHHV-6A we used next-generation sequencing to characterize genomes from three CiHHV-6A cardiac patients. Comparisons to known exogenous HHV-6A showed CiHHV-6A genomes formed a separate clade; including all 85 non-interrupted genes and necessary cis-acting signals for reactivation as infectious virus. Greater single nucleotide polymorphism (SNP) density was defined in 16 genes and the direct repeats (DR) terminal regions. Using these SNPs, deep sequencing analyses demonstrated superinfection with exogenous HHV-6A in two of the CiHHV-6A patients with recurrent cardiac disease. Characterisation of the integration sites in twelve patients identified the human chromosome 17p subtelomere as a prevalent site, which had specific repeat structures and phylogenetically related CiHHV-6A coding sequences indicating common ancestral origins. Overall CiHHV-6A genomes were similar, but distinct from known exogenous HHV-6A virus, and have the capacity to reactivate as emerging virus infections. PMID:26784220
Genome-wide association mapping of qualitatively inherited traits in a germplasm collection

USDA-ARS's Scientific Manuscript database

Genome-wide association (GWA) has been used as a tool for dissecting the genetic architecture of quantitatively inherited traits. We demonstrate here that GWA can also be highly useful for detecting the genomic locations of major genes governing categorically defined phenotype variants that exist fo...
Phylogenetic conservatism of thermal traits explains dispersal limitation and genomic differentiation of Streptomyces sister-taxa.

PubMed

Choudoir, Mallory J; Buckley, Daniel H

2018-06-07

The latitudinal diversity gradient is a pattern of biogeography observed broadly in plants and animals but largely undocumented in terrestrial microbial systems. Although patterns of microbial biogeography across broad taxonomic scales have been described in a range of contexts, the mechanisms that generate biogeographic patterns between closely related taxa remain incompletely characterized. Adaptive processes are a major driver of microbial biogeography, but there is less understanding of how microbial biogeography and diversification are shaped by dispersal limitation and drift. We recently described a latitudinal diversity gradient of species richness and intraspecific genetic diversity in Streptomyces by using a geographically explicit culture collection. Within this geographically explicit culture collection, we have identified Streptomyces sister-taxa whose geographic distribution is delimited by latitude. These sister-taxa differ in geographic distribution, genomic diversity, and ecological traits despite having nearly identical SSU rRNA gene sequences. Comparative genomic analysis reveals genomic differentiation of these sister-taxa consistent with restricted gene flow across latitude. Furthermore, we show phylogenetic conservatism of thermal traits between the sister-taxa suggesting that thermal trait adaptation limits dispersal and gene flow across climate regimes as defined by latitude. Such phylogenetic conservatism of thermal traits is commonly associated with latitudinal diversity gradients for plants and animals. These data provide further support for the hypothesis that the Streptomyces latitudinal diversity gradient was formed as a result of historical demographic processes defined by dispersal limitation and driven by paleoclimate dynamics.
Genome-Wide Transposon Mutagenesis in Pathogenic Leptospira Species

PubMed Central

Murray, Gerald L.; Morel, Viviane; Cerqueira, Gustavo M.; Croda, Julio; Srikram, Amporn; Henry, Rebekah; Ko, Albert I.; Dellagostin, Odir A.; Bulach, Dieter M.; Sermswan, Rasana W.; Adler, Ben; Picardeau, Mathieu

2009-01-01

Leptospira interrogans is the most common cause of leptospirosis in humans and animals. Genetic analysis of L. interrogans has been severely hindered by a lack of tools for genetic manipulation. Recently we developed the mariner-based transposon Himar1 to generate the first defined mutants in L. interrogans. In this study, a total of 929 independent transposon mutants were obtained and the location of insertion determined. Of these mutants, 721 were located in the protein coding regions of 551 different genes. While sequence analysis of transposon insertion sites indicated that transposition occurred in an essentially random fashion in the genome, 25 unique transposon mutants were found to exhibit insertions into genes encoding 16S or 23S rRNAs, suggesting these genes are insertional hot spots in the L. interrogans genome. In contrast, loci containing notionally essential genes involved in lipopolysaccharide and heme biosynthesis showed few transposon insertions. The effect of gene disruption on the virulence of a selected set of defined mutants was investigated using the hamster model of leptospirosis. Two attenuated mutants with disruptions in hypothetical genes were identified, thus validating the use of transposon mutagenesis for the identification of novel virulence factors in L. interrogans. This library provides a valuable resource for the study of gene function in L. interrogans. Combined with the genome sequences of L. interrogans, this provides an opportunity to investigate genes that contribute to pathogenesis and will provide a better understanding of the biology of L. interrogans. PMID:19047402
Genome-wide transposon mutagenesis in pathogenic Leptospira species.

PubMed

Murray, Gerald L; Morel, Viviane; Cerqueira, Gustavo M; Croda, Julio; Srikram, Amporn; Henry, Rebekah; Ko, Albert I; Dellagostin, Odir A; Bulach, Dieter M; Sermswan, Rasana W; Adler, Ben; Picardeau, Mathieu

2009-02-01

Leptospira interrogans is the most common cause of leptospirosis in humans and animals. Genetic analysis of L. interrogans has been severely hindered by a lack of tools for genetic manipulation. Recently we developed the mariner-based transposon Himar1 to generate the first defined mutants in L. interrogans. In this study, a total of 929 independent transposon mutants were obtained and the location of insertion determined. Of these mutants, 721 were located in the protein coding regions of 551 different genes. While sequence analysis of transposon insertion sites indicated that transposition occurred in an essentially random fashion in the genome, 25 unique transposon mutants were found to exhibit insertions into genes encoding 16S or 23S rRNAs, suggesting these genes are insertional hot spots in the L. interrogans genome. In contrast, loci containing notionally essential genes involved in lipopolysaccharide and heme biosynthesis showed few transposon insertions. The effect of gene disruption on the virulence of a selected set of defined mutants was investigated using the hamster model of leptospirosis. Two attenuated mutants with disruptions in hypothetical genes were identified, thus validating the use of transposon mutagenesis for the identification of novel virulence factors in L. interrogans. This library provides a valuable resource for the study of gene function in L. interrogans. Combined with the genome sequences of L. interrogans, this provides an opportunity to investigate genes that contribute to pathogenesis and will provide a better understanding of the biology of L. interrogans.
Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India.

PubMed

Bondre, Vijay P; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N

2016-11-01

Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. 
Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections.
Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India

PubMed Central

Bondre, Vijay P.; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N.

2016-01-01

Background & objectives: Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Methods: Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Results: Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. 
Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections. PMID:28361829
YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.

PubMed

Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh

2015-01-16

Yersinia is a genus of Gram-negative bacteria that includes serious pathogens such as Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, and Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly growing Yersinia genomic data and to provide analysis tools, particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. To facilitate ongoing and future research on Yersinia, especially on those species generally considered non-pathogenic, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include a JavaScript-based genome browser (JBrowse) and the Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing a phylogenetic tree of Yersinia. 
We ran analyses based on the tools and genomic data in YersiniaBase and the preliminary results showed differences in virulence genes found in Yersinia pestis and Yersinia pseudotuberculosis compared to other Yersinia species, and differences between Yersinia enterocolitica subsp. enterocolitica and Yersinia enterocolitica subsp. palearctica. YersiniaBase offers free access to a wide range of genomic data and analysis tools for the analysis of Yersinia. YersiniaBase can be accessed at http://yersinia.um.edu.my .
A comprehensively molecular haplotype-resolved genome of a European individual

PubMed Central

Suk, Eun-Kyung; McEwen, Gayle K.; Duitama, Jorge; Nowick, Katja; Schulz, Sabrina; Palczewski, Stefanie; Schreiber, Stefan; Holloway, Dustin T.; McLaughlin, Stephen; Peckham, Heather; Lee, Clarence; Huebsch, Thomas; Hoehe, Margret R.

2011-01-01

Independent determination of both haplotype sequences of an individual genome is essential to relate genetic variation to genome function, phenotype, and disease. To address the importance of phase, we have generated the most complete haplotype-resolved genome to date, “Max Planck One” (MP1), by fosmid pool-based next generation sequencing. Virtually all SNPs (>99%) and 80,000 indels were phased into haploid sequences of up to 6.3 Mb (N50 ∼1 Mb). The completeness of phasing allowed determination of the concrete molecular haplotype pairs for the vast majority of genes (81%) including potential regulatory sequences, of which >90% were found to be constituted by two different molecular forms. A subset of 159 genes with potentially severe mutations in either cis or trans configurations exemplified in particular the role of phase for gene function, disease, and clinical interpretation of personal genomes (e.g., BRCA1). Extended genomic regions harboring manifold combinations of physically and/or functionally related genes and regulatory elements were resolved into their underlying “haploid landscapes,” which may define the functional genome. Moreover, the majority of genes and functional sequences were found to contain individual or rare SNPs, which cannot be phased from population data alone, emphasizing the importance of molecular phasing for characterizing a genome in its molecular individuality. Our work provides the foundation to understand that the distinction of molecular haplotypes is essential to resolve the (inherently individual) biology of genes, genomes, and disease, establishing a reference point for “phase-sensitive” personal genomics. MP1's annotated haploid genomes are available as a public resource. PMID:21813624
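The N50 figure quoted for the phased haploid sequences can be illustrated with a short calculation. The sketch below assumes the usual definition (the block length at which the cumulative sum of lengths, taken from longest to shortest, first reaches half of the total); the function name `n50` and the block lengths are hypothetical and are not MP1 data.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths (usual definition)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first block that carries the running sum to half of the
        # total length is the N50 block.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical phased-block lengths in base pairs (illustration only).
blocks = [3_000_000, 2_000_000, 1_000_000, 1_000_000, 500_000, 500_000]
print(n50(blocks))
```

With these made-up lengths, half of the total is 4 Mb, which is first reached by the 2 Mb block, so that block length is reported as the N50.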
Mycobacterial species as case-study of comparative genome analysis.

PubMed

Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

2011-02-08

The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems and illnesses. Further, with more than 100 genome sequences available from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and for improving drugs, vaccines, and diagnostic tools for controlling mycobacterial diseases. In the present study we aim to outline a comparative genome analysis of fourteen mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose, the genomes were compared on the basis of genome length, GC content, and number of genes in different databases (GenBank, RefSeq, and Prodigal). A BLAST matrix of these genomes was constructed to summarize the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genomes have been defined for twelve mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree has been constructed from the 16S rRNA gene for tuberculous and non-tuberculous mycobacteria to understand the evolutionary events of these species.
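The pan and core genomes mentioned above can be pictured as a set union and a set intersection over per-genome gene content. The following is a minimal sketch, assuming genes have already been grouped into gene families (a step that in practice relies on sequence similarity searches such as BLAST); the function name `pan_and_core`, the strain labels, and the family identifiers are all hypothetical.

```python
def pan_and_core(gene_families_by_genome):
    """Return (pan, core) gene-family sets for a mapping genome -> families."""
    genomes = [set(families) for families in gene_families_by_genome.values()]
    if not genomes:
        return set(), set()
    pan = set().union(*genomes)          # families present in at least one genome
    core = set.intersection(*genomes)    # families present in every genome
    return pan, core


# Hypothetical gene-family content for three genomes (illustration only).
example = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5"},
}
pan, core = pan_and_core(example)
print(sorted(pan))
print(sorted(core))
```

In this toy input the pan genome holds all five families, while the core genome holds only the two families shared by every strain.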
Genomic insights from whole genome sequencing of four clonal outbreak Campylobacter jejuni assessed within the global C. jejuni population.

PubMed

Clark, Clifford G; Berry, Chrystal; Walker, Matthew; Petkau, Aaron; Barker, Dillon O R; Guan, Cai; Reimer, Aleisha; Taboada, Eduardo N

2016-12-03

Whole genome sequencing (WGS) is useful for determining clusters of human cases, investigating outbreaks, and defining the population genetics of bacteria. It also provides information about other aspects of bacterial biology, including classical typing results, virulence, and adaptive strategies of the organism. Cell culture invasion and protein expression patterns of four related multilocus sequence type 21 (ST21) C. jejuni isolates from a significant Canadian water-borne outbreak were previously associated with the presence of a CJIE1 prophage. Whole genome sequencing was used to examine the genetic diversity among these isolates and confirm that previous observations could be attributed to differential prophage carriage. Moreover, we sought to determine the presence of genome sequences that could be used as surrogate markers to delineate outbreak-associated isolates. Differential carriage of the CJIE1 prophage was identified as the major genetic difference among the four outbreak isolates. High quality single-nucleotide variant (hqSNV) and core genome multilocus sequence typing (cgMLST) clustered these isolates within expanded datasets consisting of additional C. jejuni strains. The number and location of homopolymeric tract regions was identical in all four outbreak isolates but differed from all other C. jejuni examined. Comparative genomics and PCR amplification enabled the identification of large chromosomal inversions of approximately 93 kb and 388 kb within the outbreak isolates associated with transducer-like proteins containing long nucleotide repeat sequences. The 93-kb inversion was characteristic of the outbreak-associated isolates, and the gene content of this inverted region displayed high synteny with the reference strain. The four outbreak isolates were clonally derived and differed mainly in the presence of the CJIE1 prophage, validating earlier findings linking the prophage to phenotypic differences in virulence assays and protein expression. 
The identification of large, genetically syntenous chromosomal inversions in the genomes of outbreak-associated isolates provided a unique method for discriminating outbreak isolates from the background population. Transducer-like proteins appear to be associated with the chromosomal inversions. CgMLST and hqSNV analysis also effectively delineated the outbreak isolates within the larger C. jejuni population structure.
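Clustering by cgMLST rests on comparing allele calls locus by locus. As a rough illustration only, and not the procedure used in the study, the sketch below counts the loci at which two allele profiles differ and skips loci with a missing call in either profile; the function name `allele_distance`, the locus names, and the allele numbers are hypothetical.

```python
def allele_distance(profile_a, profile_b, missing=None):
    """Count shared loci whose allele calls differ, ignoring missing calls."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not missing
        and profile_b[locus] is not missing
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical allele profiles (locus -> allele number; None = no call).
outbreak_1 = {"locus1": 3, "locus2": 7, "locus3": 1, "locus4": 12}
outbreak_2 = {"locus1": 3, "locus2": 7, "locus3": 1, "locus4": None}
background = {"locus1": 5, "locus2": 7, "locus3": 2, "locus4": 12}

print(allele_distance(outbreak_1, outbreak_2))
print(allele_distance(outbreak_1, background))
```

Under this toy scheme the two outbreak profiles are at distance 0, while each differs from the background profile at two loci, which is the kind of separation a cgMLST clustering would build on.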
A rigorous approach to facilitate and guarantee the correctness of the genetic testing management in human genome information systems.

PubMed

Araújo, Luciano V; Malkowski, Simon; Braghetto, Kelly R; Passos-Bueno, Maria R; Zatz, Mayana; Pu, Calton; Ferreira, João E

2011-12-22

Recent medical and biological technology advances have stimulated the development of new testing systems that have been providing huge, varied amounts of molecular and clinical data. Growing data volumes pose significant challenges for information processing systems in research centers. Additionally, the routines of genomics laboratory are typically characterized by high parallelism in testing and constant procedure changes. This paper describes a formal approach to address this challenge through the implementation of a genetic testing management system applied to human genome laboratory. We introduced the Human Genome Research Center Information System (CEGH) in Brazil, a system that is able to support constant changes in human genome testing and can provide patients updated results based on the most recent and validated genetic knowledge. Our approach uses a common repository for process planning to ensure reusability, specification, instantiation, monitoring, and execution of processes, which are defined using a relational database and rigorous control flow specifications based on process algebra (ACP). The main difference between our approach and related works is that we were able to join two important aspects: 1) process scalability achieved through relational database implementation, and 2) correctness of processes using process algebra. Furthermore, the software allows end users to define genetic testing without requiring any knowledge about business process notation or process algebra. This paper presents the CEGH information system that is a Laboratory Information Management System (LIMS) based on a formal framework to support genetic testing management for Mendelian disorder studies. We have proved the feasibility and showed usability benefits of a rigorous approach that is able to specify, validate, and perform genetic testing using easy end user interfaces.
A rigorous approach to facilitate and guarantee the correctness of the genetic testing management in human genome information systems

PubMed Central

2011-01-01

Background Recent medical and biological technology advances have stimulated the development of new testing systems that have been providing huge, varied amounts of molecular and clinical data. Growing data volumes pose significant challenges for information processing systems in research centers. Additionally, the routines of genomics laboratory are typically characterized by high parallelism in testing and constant procedure changes. Results This paper describes a formal approach to address this challenge through the implementation of a genetic testing management system applied to human genome laboratory. We introduced the Human Genome Research Center Information System (CEGH) in Brazil, a system that is able to support constant changes in human genome testing and can provide patients updated results based on the most recent and validated genetic knowledge. Our approach uses a common repository for process planning to ensure reusability, specification, instantiation, monitoring, and execution of processes, which are defined using a relational database and rigorous control flow specifications based on process algebra (ACP). The main difference between our approach and related works is that we were able to join two important aspects: 1) process scalability achieved through relational database implementation, and 2) correctness of processes using process algebra. Furthermore, the software allows end users to define genetic testing without requiring any knowledge about business process notation or process algebra. Conclusions This paper presents the CEGH information system that is a Laboratory Information Management System (LIMS) based on a formal framework to support genetic testing management for Mendelian disorder studies. We have proved the feasibility and showed usability benefits of a rigorous approach that is able to specify, validate, and perform genetic testing using easy end user interfaces. PMID:22369688
Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data

DOE PAGES

Shen, Xing-Xing; Zhou, Xiaofan; Kominek, Jacek; ...

2016-09-26

Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage comprised by the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. Nevertheless, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.
Genome-wide study of resistant hypertension identified from electronic health records.

PubMed

Dumitrescu, Logan; Ritchie, Marylyn D; Denny, Joshua C; El Rouby, Nihal M; McDonough, Caitrin W; Bradford, Yuki; Ramirez, Andrea H; Bielinski, Suzette J; Basford, Melissa A; Chai, High Seng; Peissig, Peggy; Carrell, David; Pathak, Jyotishman; Rasmussen, Luke V; Wang, Xiaoming; Pacheco, Jennifer A; Kho, Abel N; Hayes, M Geoffrey; Matsumoto, Martha; Smith, Maureen E; Li, Rongling; Cooper-DeHoff, Rhonda M; Kullo, Iftikhar J; Chute, Christopher G; Chisholm, Rex L; Jarvik, Gail P; Larson, Eric B; Carey, David; McCarty, Catherine A; Williams, Marc S; Roden, Dan M; Bottinger, Erwin; Johnson, Julie A; de Andrade, Mariza; Crawford, Dana C

2017-01-01

Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study for resistant hypertension, we defined and identified cases of resistant hypertension and hypertensives with treated, controlled hypertension among >47,500 adults residing in the US linked to electronic health records (EHRs) and genotyped as part of the electronic MEdical Records & GEnomics (eMERGE) Network. Electronic selection logic using billing codes, laboratory values, text queries, and medication records was used to identify resistant hypertension cases and controls at each site, and a total of 3,006 cases of resistant hypertension and 876 controlled hypertensives were identified among eMERGE Phase I and II sites. After imputation and quality control, a total of 2,530,150 SNPs were tested for an association among 2,830 multi-ethnic cases of resistant hypertension and 876 controlled hypertensives. No test of association was genome-wide significant in the full dataset or in the dataset limited to European American cases (n = 1,719) and controls (n = 708). The most significant finding was CLNK rs13144136 at p = 1.00 × 10^-6 (odds ratio = 0.68; 95% CI = 0.58-0.80) in the full dataset with similar results in the European American only dataset. We also examined whether SNPs known to influence blood pressure or hypertension also influenced resistant hypertension. None was significant after correction for multiple testing. These data highlight both the difficulties and the potential utility of EHR-linked genomic data to study clinically-relevant traits such as resistant hypertension.
Reconstructing the Backbone of the Saccharomycotina Yeast Phylogeny Using Genome-Scale Data

PubMed Central

Shen, Xing-Xing; Zhou, Xiaofan; Kominek, Jacek; Kurtzman, Cletus P.; Hittinger, Chris Todd; Rokas, Antonis

2016-01-01

Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage comprised by the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast. PMID:27672114
Reconstructing the backbone of the Saccharomycotina yeast phylogeny using genome-scale data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shen, Xing-Xing; Zhou, Xiaofan; Kominek, Jacek

Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage comprised by the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.
Understanding New Types of Evidence Ready for Translation into Nursing Informatics.

PubMed

McCormick, Kathleen

2016-01-01

Nurses are the primary deliverers of patient care and observers of patient side effects to medications. The primary objective of this tutorial is to bring the participants up to date in genomic applications for nursing from birth until death. A secondary objective is to define at least 17 pharmacogenomics evidence guidelines ready for implementation into the Electronic Health Record. The target audience is nurses in practice, implementers of EHRs, nurses in leadership and policy-making positions, those focused on defining new areas for nursing research, and educators who need to define criteria for integrating genomics into nursing education.
A PCR method for the detection and differentiation of Lentinus edodes and Trametes versicolor in defined-mixed cultures used for wastewater treatment.

PubMed

García-Mena, Jaime; Cano-Ramirez, Claudia; Garibay-Orijel, Claudio; Ramirez-Canseco, Sergio; Poggi-Varaldo, Héctor M

2005-06-01

A PCR-based method for the quantitative detection of Lentinus edodes and Trametes versicolor, two ligninolytic fungi applied for wastewater treatment and bioremediation, was developed. Genomic DNA was used to optimize a PCR method targeting the conserved copper-binding sequence of laccase genes. The method allowed the quantitative detection and differentiation of these fungi in single and defined-mixed cultures after fractionation of the PCR products by electrophoresis in agarose gels. Amplified products of about 150 bp for L. edodes, and about 200 bp for T. versicolor were purified and cloned. The PCR method showed a linear detection response in the 1.0 µg to 1 ng range. The same method was tested with genomic DNA from a third fungus (Phanerochaete chrysosporium), yielding a fragment of about 400 bp. Southern-blot and DNA sequence analysis indicated that a specific PCR product was amplified from each genome, and that these corresponded to sequences of laccase genes. This PCR protocol permits the detection and differentiation of three ligninolytic fungi by amplifying DNA fragments of different sizes using a single pair of primers, without further enzymatic restriction of the PCR products. This method has potential use in the monitoring, evaluation, and improvement of fungal cultures used in wastewater treatment processes.

Genome scanning of Amazonian Plasmodium falciparum shows subtelomeric instability and clindamycin-resistant parasites

PubMed Central

Dharia, Neekesh V.; Plouffe, David; Bopp, Selina E.R.; González-Páez, Gonzalo E.; Lucas, Carmen; Salas, Carola; Soberon, Valeria; Bursulaya, Badry; Kochel, Tadeusz J.; Bacon, David J.; Winzeler, Elizabeth A.

2010-01-01

Here, we fully characterize the genomes of 14 Plasmodium falciparum patient isolates taken recently from the Iquitos region using genome scanning, a microarray-based technique that delineates the majority of single-base changes, indels, and copy number variants distinguishing the coding regions of two clones. We show that the parasite population in the Peruvian Amazon bears a limited number of genotypes and low recombination frequencies. Despite the essentially clonal nature of some isolates, we see high frequencies of mutations in subtelomeric highly variable genes and internal var genes, indicating mutations arising during self-mating or mitotic replication. The data also reveal that one or two meioses separate different isolates, showing that P. falciparum clones isolated from different individuals in defined geographical regions could be useful in linkage analyses or quantitative trait locus studies. Through pairwise comparisons of different isolates we discovered point mutations in the apicoplast genome that are close to known mutations that confer clindamycin resistance in other species, but which were hitherto unknown in malaria parasites. Subsequent drug sensitivity testing revealed over 100-fold increase of clindamycin EC50 in strains harboring one of these mutations. This evidence of clindamycin-resistant parasites in the Amazon suggests that a shift should be made in health policy away from quinine + clindamycin therapy for malaria in pregnant women and infants, and that the development of new lincosamide antibiotics for malaria should be reconsidered. PMID:20829224
Genomics sequence analysis of the United States infectious laryngotracheitis vaccine strains chicken embryo origin (CEO) and tissue culture origin (TCO)

USDA-ARS's Scientific Manuscript database

The genomic sequences of low and high passages of the United States infectious laryngotracheitis (ILT) vaccine strains CEO and TCO were determined using hybrid next generation sequencing in order to define genomic changes associated with attenuation and reversion to virulence. Phylogenetic analysis ...
Population structure in the model grass Brachypodium distachyon is highly correlated with flowering differences across broad geographic areas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tyler, Ludmila; Lee, Scott J.; Young, Nelson D.

The small, annual grass Brachypodium distachyon (L.) Beauv., a close relative of wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), is a powerful model system for cereals and bioenergy grasses. Genome-wide association studies (GWAS) of natural variation can elucidate the genetic basis of complex traits but have been so far limited in B. distachyon by the lack of large numbers of well-characterized and sufficiently diverse accessions. Here, we report on genotyping-by-sequencing (GBS) of 84 B. distachyon, seven B. hybridum, and three B. stacei accessions with diverse geographic origins including Albania, Armenia, Georgia, Italy, Spain, and Turkey. Over 90,000 high-quality single-nucleotide polymorphisms (SNPs) distributed across the Bd21 reference genome were identified. Our results confirm the hybrid nature of the B. hybridum genome, which appears as a mosaic of B. distachyon-like and B. stacei-like sequences. Analysis of more than 50,000 SNPs for the B. distachyon accessions revealed three distinct, genetically defined populations. Surprisingly, these genomic profiles are associated with differences in flowering time rather than with broad geographic origin. High levels of differentiation in loci associated with floral development support the differences in flowering phenology between B. distachyon populations. Genome-wide association studies combining genotypic and phenotypic data also suggest the presence of one or more photoperiodism, circadian clock, and vernalization genes in loci associated with flowering time variation within B. distachyon populations. As a result, our characterization elucidates genes underlying population differences, expands the germplasm resources available for Brachypodium, and illustrates the feasibility and limitations of GWAS in this model grass.
Population structure in the model grass Brachypodium distachyon is highly correlated with flowering differences across broad geographic areas

DOE PAGES

Tyler, Ludmila; Lee, Scott J.; Young, Nelson D.; ...

2016-04-29

The small, annual grass Brachypodium distachyon (L.) Beauv., a close relative of wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), is a powerful model system for cereals and bioenergy grasses. Genome-wide association studies (GWAS) of natural variation can elucidate the genetic basis of complex traits but have been so far limited in B. distachyon by the lack of large numbers of well-characterized and sufficiently diverse accessions. Here, we report on genotyping-by-sequencing (GBS) of 84 B. distachyon, seven B. hybridum, and three B. stacei accessions with diverse geographic origins including Albania, Armenia, Georgia, Italy, Spain, and Turkey. Over 90,000 high-quality single-nucleotide polymorphisms (SNPs) distributed across the Bd21 reference genome were identified. Our results confirm the hybrid nature of the B. hybridum genome, which appears as a mosaic of B. distachyon-like and B. stacei-like sequences. Analysis of more than 50,000 SNPs for the B. distachyon accessions revealed three distinct, genetically defined populations. Surprisingly, these genomic profiles are associated with differences in flowering time rather than with broad geographic origin. High levels of differentiation in loci associated with floral development support the differences in flowering phenology between B. distachyon populations. Genome-wide association studies combining genotypic and phenotypic data also suggest the presence of one or more photoperiodism, circadian clock, and vernalization genes in loci associated with flowering time variation within B. distachyon populations. As a result, our characterization elucidates genes underlying population differences, expands the germplasm resources available for Brachypodium, and illustrates the feasibility and limitations of GWAS in this model grass.
A Rapid Method of Genomic Array Analysis of Scaffold/Matrix Attachment Regions (S/MARs) Identifies a 2.5-Mb Region of Enhanced Scaffold/Matrix Attachment at a Human Neocentromere

PubMed Central

Sumer, Huseyin; Craig, Jeffrey M.; Sibson, Mandy; Choo, K.H. Andy

2003-01-01

Human neocentromeres are fully functional centromeres that arise at previously noncentromeric regions of the genome. We have tested a rapid procedure of genomic array analysis of chromosome scaffold/matrix attachment regions (S/MARs), involving the isolation of S/MAR DNA and hybridization of this DNA to a genomic BAC/PAC array. Using this procedure, we have defined a 2.5-Mb domain of S/MAR-enriched chromatin that fully encompasses a previously mapped centromere protein-A (CENP-A)-associated domain at a human neocentromere. We have independently verified this procedure using a previously established fluorescence in situ hybridization method on salt-treated metaphase chromosomes. In silico sequence analysis of the S/MAR-enriched and surrounding regions has revealed no outstanding sequence-related predisposition. This study defines the S/MAR-enriched domain of a higher eukaryotic centromere and provides a method that has broad application for the mapping of S/MAR attachment sites over large genomic regions or throughout a genome. PMID:12840048
Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

PubMed

Khedkar, Supriya; Seshasayee, Aswin Sai Narain

2016-06-01

Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. Copyright © 2016 Khedkar and Seshasayee.
Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

PubMed Central

Khedkar, Supriya; Seshasayee, Aswin Sai Narain

2016-01-01

Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. PMID:27172194
The common ground of genomics and systems biology

PubMed Central

2014-01-01

The rise of systems biology is intertwined with that of genomics, yet their primordial relationship to one another is ill-defined. We discuss how the growth of genomics provided a critical boost to the popularity of systems biology. We describe the parts of genomics that share common areas of interest with systems biology today in the areas of gene expression, network inference, chromatin state analysis, pathway analysis, personalized medicine, and upcoming areas of synergy as genomics continues to expand its scope across all biomedical fields. PMID:25033072
Defining Genome Project Standards in a New Era of Sequencing (GSC8 Meeting)

ScienceCinema

Chain, Patrick

2018-01-15

The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year Research Coordination Network funded by the National Science Foundation and was held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego.
DEFINING THE CHEMICAL SPACE OF PUBLIC GENOMIC DATA.

EPA Science Inventory

The pharmaceutical industry has demonstrated success in integrating chemogenomic knowledge into predictive toxicological models, due in part to industry's access to large amounts of proprietary and commercial reference genomic data sets.
Kinase gene fusions in defined subsets of melanoma.

PubMed

Turner, Jacqueline; Couts, Kasey; Sheren, Jamie; Saichaemchan, Siriwimon; Ariyawutyakorn, Witthawat; Avolio, Izabela; Cabral, Ethan; Glogowska, Magdelena; Amato, Carol; Robinson, Steven; Hintzsche, Jennifer; Applegate, Allison; Seelenfreund, Eric; Gonzalez, Rita; Wells, Keith; Bagby, Stacey; Tentler, John; Tan, Aik-Choon; Wisell, Joshua; Varella-Garcia, Marileila; Robinson, William

2017-01-01

Genomic rearrangements resulting in activating kinase fusions have been increasingly described in a number of cancers including malignant melanoma, but their frequency in specific melanoma subtypes has not been reported. We used break-apart fluorescence in situ hybridization (FISH) to identify genomic rearrangements in tissues from 59 patients with various types of malignant melanoma including acral lentiginous, mucosal, superficial spreading, and nodular. We identified four genomic rearrangements involving the genes BRAF, RET, and ROS1. Of these, three were confirmed by Immunohistochemistry (IHC) or sequencing and one was found to be an ARMC10-BRAF fusion that has not been previously reported in melanoma. These fusions occurred in different subtypes of melanoma but all in tumors lacking known driver mutations. Our data suggest gene fusions are more common than previously thought and should be further explored particularly in melanomas lacking known driver mutations. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Kinase Gene Fusions in Defined Subsets of Melanoma

PubMed Central

Turner, Jacqueline; Couts, Kasey; Sheren, Jamie; Saichaemchan, Siriwimon; Ariyawutyakorn, Witthawat; Avolio, Izabela; Cabral, Ethan; Glogowska, Magdelena; Amato, Carol; Robinson, Steven; Hintzsche, Jennifer; Applegate, Allison; Seelenfreund, Eric; Gonzalez, Rita; Wells, Keith; Bagby, Stacey; Tentler, John; Tan, Aik-Choon; Wisell, Joshua; Varella-Garcia, Marileila; Robinson, William

2017-01-01

Genomic rearrangements resulting in activating kinase fusions have been increasingly described in a number of cancers including malignant melanoma, but their frequency in specific melanoma subtypes has not been reported. We used break-apart fluorescence in-situ hybridization (FISH) to identify genomic rearrangements in tissues from 59 patients with various types of malignant melanoma including acral lentiginous, mucosal, superficial spreading, and nodular. We identified four genomic rearrangements involving the genes BRAF, RET, and ROS1. Of these, three were confirmed by IHC or sequencing and one was found to be an ARMC10-BRAF fusion that has not been previously reported in melanoma. These fusions occurred in different subtypes of melanoma but all in tumors lacking known driver mutations. Our data suggest gene fusions are more common than previously thought and should be further explored particularly in melanomas lacking known driver mutations. PMID:27864876
Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma

PubMed Central

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han; Lim, Jing Quan; Huang, Mi Ni; Padmanabhan, Nisha; Nellore, Vishwa; Kongpetch, Sarinya; Ng, Alvin Wei Tian; Ng, Ley Moy; Choo, Su Pin; Myint, Swe Swe; Thanan, Raynoo; Nagarajan, Sanjanaa; Lim, Weng Khong; Ng, Cedric Chuan Young; Boot, Arnoud; Liu, Mo; Ong, Choon Kiat; Rajasegaran, Vikneswari; Lie, Stefanus; Lim, Alvin Soon Tiong; Lim, Tse Hui; Tan, Jing; Loh, Jia Liang; McPherson, John R.; Khuntikeo, Narong; Bhudhisawasdi, Vajaraphongsa; Yongvanit, Puangrat; Wongkham, Sopit; Totoki, Yasushi; Nakamura, Hiromi; Arai, Yasuhito; Yamasaki, Satoshi; Chow, Pierce Kah-Hoe; Chung, Alexander Yaw Fui; Ooi, London Lucien Peng Jin; Lim, Kiat Hon; Dima, Simona; Duda, Dan G.; Popescu, Irinel; Broet, Philippe; Hsieh, Sen-Yung; Yu, Ming-Chin; Scarpa, Aldo; Lai, Jiaming; Luo, Di-Xian; Carvalho, André Lopes; Vettore, André Luiz; Rhee, Hyungjin; Park, Young Nyun; Alexandrov, Ludmil B.; Gordân, Raluca; Rozen, Steven G.; Shibata, Tatsuhiro; Pairojkul, Chawalit; Teh, Bin Tean; Tan, Patrick

2017-01-01

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined four CCA clusters – Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations, conversely Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3′UTR deletion as a mechanism of FGFR2 upregulation. Integration of non-coding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores – mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Our results exemplify how genetics, epigenetics and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer. PMID:28667006
A deep-branching clade of retrovirus-like retrotransposons in bdelloid rotifers

PubMed Central

Gladyshev, Eugene A.; Meselson, Matthew; Arkhipova, Irina R.

2007-01-01

Rotifers of class Bdelloidea, a group of aquatic invertebrates in which males and meiosis have never been documented, are also unusual in their lack of multicopy LINE-like and gypsy-like retrotransposons, groups inhabiting the genomes of nearly all other metazoans. Bdelloids do contain numerous DNA transposons, both intact and decayed, and domesticated Penelope-like retroelements Athena, concentrated at telomeric regions. Here we describe two LTR retrotransposons, each found at low copy number in a different bdelloid species, which define a clade different from previously known clades of LTR retrotransposons. Like bdelloid DNA transposons and Athena, these elements have been found preferentially in telomeric regions. Unlike bdelloid DNA transposons, many of which are decayed, the newly described elements, named Vesta and Juno, inhabiting the genomes of Philodina roseola and Adineta vaga, respectively, appear to be intact and to represent recent insertions, possibly from an exogenous source. We describe the retrovirus-like structure of the new elements, containing gag, pol, and env-like open reading frames, and discuss their possible origins, transmission, and behavior in bdelloid genomes. PMID:17129685
Genome-wide CpG island methylation and intergenic demethylation propensities vary among different tumor sites.

PubMed

Lee, Seung-Tae; Wiemels, Joseph L

2016-02-18

The epigenetic landscape of cancer includes both focal hypermethylation and broader hypomethylation in a genome-wide manner. By means of a comprehensive genomic analysis on 6637 tissues of 21 tumor types, we here show that the degrees of overall methylation in CpG island (CGI) and demethylation in intergenic regions, defined as 'backbone', largely vary among different tumors. Depending on tumor type, both CGI methylation and backbone demethylation are often associated with clinical, epidemiological and biological features such as age, sex, smoking history, anatomic location, histological type and grade, stage, molecular subtype and biological pathways. We found connections between CGI methylation and hypermutability, microsatellite instability, IDH1 mutation, 19p gain and polycomb features, and backbone demethylation with chromosomal instability, NSD1 and TP53 mutations, 5q and 19p loss and long repressive domains. These broad epigenetic patterns add a new dimension to our understanding of tumor biology and its clinical implications. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ensembl comparative genomics resources.

PubMed

Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

2016-01-01

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.
Ensembl comparative genomics resources

PubMed Central

Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

2016-01-01

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847
Structure and polymorphism of the mouse myelin/oligodendrocyte glycoprotein gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daubas, P.; Pham-Dinh, D.; Dautigny, A.

1994-09-01

The authors have isolated and characterized genomic clones containing the mouse myelin/oligodendrocyte glycoprotein (MOG) gene. It spans a region of 12.5 kb and consists of eight exons. Its exon-intron structure differs from that of classical MHC-class I genes, with which it is linked in the mouse genome. Nucleotide sequencing of the 5′ flanking region reveals that it contains several putative protein-binding sites, some of them in common with other myelin gene promoters. One intragenic polymorphism has been identified: it consists of a GA repeat, defining at least three alleles in mouse inbred strains, and is easily detectable using the polymerase chain reaction method.
Bioprinting for stem cell research

PubMed Central

Tasoglu, Savas; Demirci, Utkan

2012-01-01

Recently, there has been growing interest in applying bioprinting techniques to stem cell research. Several bioprinting methods have been developed utilizing acoustics, piezoelectricity, and lasers to deposit living cells onto receiving substrates. Using these technologies, spatially defined gradients of immobilized proteins can be engineered to direct stem cell differentiation into multiple subpopulations of different lineages. Stem cells can also be patterned in a high-throughput manner onto flexible implementation patches for tissue regeneration or onto substrates with the goal of accessing encapsulated stem cells of interest for genomic analysis. Here, we review recent achievements with bioprinting technologies in stem cell research, and identify future challenges and potential applications including tissue engineering and regenerative medicine, wound healing, and genomics. PMID:23260439
COPS: Detecting Co-Occurrence and Spatial Arrangement of Transcription Factor Binding Motifs in Genome-Wide Datasets

PubMed Central

Lohmann, Ingrid

2012-01-01

In multi-cellular organisms, spatiotemporal activity of cis-regulatory DNA elements depends on their occupancy by different transcription factors (TFs). In recent years, genome-wide ChIP-on-Chip, ChIP-Seq and DamID assays have been extensively used to unravel the combinatorial interaction of TFs with cis-regulatory modules (CRMs) in the genome. Even though genome-wide binding profiles are increasingly becoming available for different TFs, single TF binding profiles are in most cases not sufficient for dissecting complex regulatory networks. Thus, potent computational tools detecting statistically significant and biologically relevant TF-motif co-occurrences in genome-wide datasets are essential for analyzing context-dependent transcriptional regulation. We have developed COPS (Co-Occurrence Pattern Search), a new bioinformatics tool based on a combination of association rules and Markov chain models, which detects co-occurring TF binding sites (BSs) on genomic regions of interest. COPS scans DNA sequences for frequent motif patterns using a Frequent-Pattern tree based data mining approach, which allows efficient performance of the software with respect to both data structure and implementation speed, in particular when mining large datasets. Since transcriptional gene regulation very often relies on the formation of regulatory protein complexes mediated by closely adjoining TF binding sites on CRMs, COPS additionally detects preferred short distance between co-occurring TF motifs. The performance of our software with respect to biological significance was evaluated using three published datasets containing genomic regions that are independently bound by several TFs involved in a defined biological process. In sum, COPS is a fast, efficient and user-friendly tool mining statistically and biologically significant TFBS co-occurrences and therefore allows the identification of TFs that combinatorially regulate gene expression. PMID:23272209

Independent evolution of the core and accessory gene sets in the genus Neisseria: insights gained from the genome of Neisseria lactamica isolate 020-06

PubMed Central

2010-01-01

Background The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally. Results Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendents of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent. Conclusion The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed. PMID:21092259
A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes)

PubMed Central

Utsunomia, Ricardo; Ruiz-Ruano, Francisco J.; Silva, Duílio M. Z. A.; Serrano, Érica A.; Rosa, Ivana F.; Scudeler, Patrícia E. S.; Hashimoto, Diogo T.; Oliveira, Claudio; Camacho, Juan Pedro M.; Foresti, Fausto

2017-01-01

Satellite DNA (satDNA) is an abundant fraction of repetitive DNA in eukaryotic genomes and plays an important role in genome organization and evolution. In general, satDNA sequences follow a concerted evolutionary pattern through the intragenomic homogenization of different repeat units. In addition, the satDNA library hypothesis predicts that related species share a series of satDNA variants descended from a common ancestor species, with differential amplification of different satDNA variants. The finding of the same satDNA family in species belonging to different genera within Characidae fish provided the opportunity to test both the concerted evolution and library hypotheses. For this purpose, we analyzed sequence variation and abundance of this satDNA family in ten species, by a combination of next generation sequencing (NGS), PCR and Sanger sequencing, and fluorescence in situ hybridization (FISH). We found extensive between-species variation in the number and size of pericentromeric FISH signals. At the genomic level, the analysis of thousands of DNA sequences obtained by Illumina sequencing and PCR amplification allowed us to define 150 haplotypes, which were linked in a common minimum spanning tree where different patterns of concerted evolution were apparent. This also provided a glimpse into the satDNA library of this group of species. Consistent with the library hypothesis, different variants of this satDNA showed large differences in abundance between species, from highly abundant to simply relictual variants. PMID:28855916
Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms

PubMed Central

Gasc, Cyrielle; Peyretaillade, Eric

2016-01-01

The recent expansion of next-generation sequencing has significantly improved biological research. Nevertheless, deep exploration of genomes or metagenomic samples remains difficult because of the sequencing depth and the associated costs required. Therefore, different partitioning strategies have been developed to sequence informative subsets of studied genomes. Among these strategies, hybridization capture has proven to be an innovative and efficient tool for targeting and enriching specific biomarkers in complex DNA mixtures. It has been successfully applied in numerous areas of biology, such as exome resequencing for the identification of mutations underlying Mendelian or complex diseases and cancers, and its usefulness has been demonstrated in the agronomic field through the linking of genetic variants to agricultural phenotypic traits of interest. Moreover, hybridization capture has provided access to underexplored, but relevant fractions of genomes through its ability to enrich defined targets and their flanking regions. Finally, on the basis of restricted genomic information, this method has also allowed the expansion of knowledge of nonreference species and ancient genomes and provided a better understanding of metagenomic samples. In this review, we present the major advances and discoveries permitted by hybridization capture and highlight the potency of this approach in all areas of biology. PMID:27105841
Sequence capture by hybridization to explore modern and ancient genomic diversity in model and nonmodel organisms.

PubMed

Gasc, Cyrielle; Peyretaillade, Eric; Peyret, Pierre

2016-06-02

The recent expansion of next-generation sequencing has significantly improved biological research. Nevertheless, deep exploration of genomes or metagenomic samples remains difficult because of the sequencing depth and the associated costs required. Therefore, different partitioning strategies have been developed to sequence informative subsets of studied genomes. Among these strategies, hybridization capture has proven to be an innovative and efficient tool for targeting and enriching specific biomarkers in complex DNA mixtures. It has been successfully applied in numerous areas of biology, such as exome resequencing for the identification of mutations underlying Mendelian or complex diseases and cancers, and its usefulness has been demonstrated in the agronomic field through the linking of genetic variants to agricultural phenotypic traits of interest. Moreover, hybridization capture has provided access to underexplored, but relevant fractions of genomes through its ability to enrich defined targets and their flanking regions. Finally, on the basis of restricted genomic information, this method has also allowed the expansion of knowledge of nonreference species and ancient genomes and provided a better understanding of metagenomic samples. In this review, we present the major advances and discoveries permitted by hybridization capture and highlight the potency of this approach in all areas of biology. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Is junk DNA bunk? A critique of ENCODE.

PubMed

Doolittle, W Ford

2013-04-02

Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE's ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value then, (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE's definition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propositions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed.
Is junk DNA bunk? A critique of ENCODE

PubMed Central

Doolittle, W. Ford

2013-01-01

Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE’s ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value then, (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE’s definition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propositions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed. PMID:23479647
Cytoplasmic male sterility-associated chimeric open reading frames identified by mitochondrial genome sequencing of four Cajanus genotypes.

PubMed

Tuteja, Reetu; Saxena, Rachit K; Davila, Jaime; Shah, Trushar; Chen, Wenbin; Xiao, Yong-Li; Fan, Guangyi; Saxena, K B; Alverson, Andrew J; Spillane, Charles; Town, Christopher; Varshney, Rajeev K

2013-10-01

The hybrid pigeonpea (Cajanus cajan) breeding technology based on cytoplasmic male sterility (CMS) is currently unique among legumes and displays major potential for yield increase. CMS is defined as a condition in which a plant is unable to produce functional pollen grains. The novel chimeric open reading frames (ORFs) produced as a result of mitochondrial genome rearrangements are considered to be the main cause of CMS. To identify these CMS-related ORFs in pigeonpea, we sequenced the mitochondrial genomes of three C. cajan lines (the male-sterile line ICPA 2039, the maintainer line ICPB 2039, and the hybrid line ICPH 2433) and of the wild relative (Cajanus cajanifolius ICPW 29). A single, circular-mapping molecule of length 545.7 kb was assembled and annotated for the ICPA 2039 line. Sequence annotation predicted 51 genes, including 34 protein-coding and 17 RNA genes. Comparison of the mitochondrial genomes from different Cajanus genotypes identified 31 ORFs, which differ between lines within which CMS is present or absent. Among these chimeric ORFs, 13 were identified by comparison of the related male-sterile and maintainer lines. These ORFs display features that are known to trigger CMS in other plant species and represent the most promising candidates for CMS-related mitochondrial rearrangements in pigeonpea.
Cytoplasmic Male Sterility-Associated Chimeric Open Reading Frames Identified by Mitochondrial Genome Sequencing of Four Cajanus Genotypes

PubMed Central

Tuteja, Reetu; Saxena, Rachit K.; Davila, Jaime; Shah, Trushar; Chen, Wenbin; Xiao, Yong-Li; Fan, Guangyi; Saxena, K. B.; Alverson, Andrew J.; Spillane, Charles; Town, Christopher; Varshney, Rajeev K.

2013-01-01

The hybrid pigeonpea (Cajanus cajan) breeding technology based on cytoplasmic male sterility (CMS) is currently unique among legumes and displays major potential for yield increase. CMS is defined as a condition in which a plant is unable to produce functional pollen grains. The novel chimeric open reading frames (ORFs) produced as a result of mitochondrial genome rearrangements are considered to be the main cause of CMS. To identify these CMS-related ORFs in pigeonpea, we sequenced the mitochondrial genomes of three C. cajan lines (the male-sterile line ICPA 2039, the maintainer line ICPB 2039, and the hybrid line ICPH 2433) and of the wild relative (Cajanus cajanifolius ICPW 29). A single, circular-mapping molecule of length 545.7 kb was assembled and annotated for the ICPA 2039 line. Sequence annotation predicted 51 genes, including 34 protein-coding and 17 RNA genes. Comparison of the mitochondrial genomes from different Cajanus genotypes identified 31 ORFs, which differ between lines within which CMS is present or absent. Among these chimeric ORFs, 13 were identified by comparison of the related male-sterile and maintainer lines. These ORFs display features that are known to trigger CMS in other plant species and represent the most promising candidates for CMS-related mitochondrial rearrangements in pigeonpea. PMID:23792890
Common structural and epigenetic changes in the genome of castration-resistant prostate cancer.

PubMed

Friedlander, Terence W; Roy, Ritu; Tomlins, Scott A; Ngo, Vy T; Kobayashi, Yasuko; Azameera, Aruna; Rubin, Mark A; Pienta, Kenneth J; Chinnaiyan, Arul; Ittmann, Michael M; Ryan, Charles J; Paris, Pamela L

2012-02-01

Progression of primary prostate cancer to castration-resistant prostate cancer (CRPC) is associated with numerous genetic and epigenetic alterations that are thought to promote survival at metastatic sites. In this study, we investigated gene copy number and CpG methylation status in CRPC to gain insight into specific pathophysiologic pathways that are active in this advanced form of prostate cancer. Our analysis defined and validated 495 genes exhibiting significant differences in CRPC in gene copy number, including gains in androgen receptor (AR) and losses of PTEN and retinoblastoma 1 (RB1). Significant copy number differences existed between tumors with or without AR gene amplification, including a common loss of AR repressors in AR-unamplified tumors. Simultaneous gene methylation and allelic deletion occurred frequently in RB1 and HSD17B2, the latter of which is involved in testosterone metabolism. Lastly, genomic DNA from most CRPC was hypermethylated compared with benign prostate tissue. Our findings establish a comprehensive methylation signature that couples epigenomic and structural analyses, thereby offering insights into the genomic alterations in CRPC that are associated with a circumvention of hormonal therapy. Genes identified in this integrated genomic study point to new drug targets in CRPC, an incurable disease state which remains the chief therapeutic challenge. ©2012 AACR.
Parallel or convergent evolution in human population genomic data revealed by genotype networks.

PubMed

R Vahdati, Ali; Wagner, Andreas

2016-08-02

Genotype networks are representations of genetic variation data that are complementary to phylogenetic trees. A genotype network is a graph whose nodes are genotypes (DNA sequences) with the same broadly defined phenotype. Two nodes are connected if they differ in some minimal way, e.g., in a single nucleotide. We analyze human genome variation data from the 1,000 genomes project, and construct haploid genotype (haplotype) networks for 12,235 protein coding genes. The structure of these networks varies widely among genes, indicating different patterns of variation despite a shared evolutionary history. We focus on those genes whose genotype networks show many cycles, which can indicate homoplasy, i.e., parallel or convergent evolution, on the sequence level. For 42 genes, the observed number of cycles is so large that it cannot be explained by either chance homoplasy or recombination. When analyzing possible explanations, we discovered evidence for positive selection in 21 of these genes and, in addition, a potential role for constrained variation and purifying selection. Balancing selection plays at most a small role. The 42 genes with excess cycles are enriched in functions related to immunity and response to pathogens. Genotype networks are representations of genetic variation data that can help understand unusual patterns of genomic variation.
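As an illustrative aside (not the authors' code), the network definition in this abstract can be sketched in Python: nodes are distinct haplotypes, and two nodes are joined when they differ at exactly one position. The function names, the toy haplotypes and the use of the cycle rank (edges minus nodes plus connected components) as a cycle count are all assumptions made for this sketch; the study itself may count cycles differently.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def genotype_network(haplotypes):
    """Return (nodes, edges); an edge joins haplotypes differing at one site."""
    nodes = sorted(set(haplotypes))
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if len(a) == len(b) and hamming(a, b) == 1
    ]
    return nodes, edges


def count_components(nodes, edges):
    """Count connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


def cycle_rank(nodes, edges):
    """Number of independent cycles: E - V + C (an assumed cycle measure)."""
    return len(edges) - len(nodes) + count_components(nodes, edges)


# Toy example: four two-site haplotypes forming a square (one cycle).
nodes, edges = genotype_network(["AA", "AC", "CA", "CC"])
print(len(edges), cycle_rank(nodes, edges))
```

In the toy square, the haplotype "CC" can be reached from "AA" by two different mutational paths, which is the kind of pattern the abstract associates with recombination or homoplasy.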
A high-resolution map of the three-dimensional chromatin interactome in human cells.

PubMed

Jin, Fulai; Li, Yan; Dixon, Jesse R; Selvaraj, Siddarth; Ye, Zhen; Lee, Ah Young; Yen, Chia-An; Schmitt, Anthony D; Espinoza, Celso A; Ren, Bing

2013-11-14

A large number of cis-regulatory sequences have been annotated in the human genome, but defining their target genes remains a challenge. One strategy is to identify the long-range looping interactions at these elements with the use of chromosome conformation capture (3C)-based techniques. However, previous studies lack either the resolution or coverage to permit a whole-genome, unbiased view of chromatin interactions. Here we report a comprehensive chromatin interaction map generated in human fibroblasts using a genome-wide 3C analysis method (Hi-C). We determined over one million long-range chromatin interactions at 5-10-kb resolution, and uncovered general principles of chromatin organization at different types of genomic features. We also characterized the dynamics of promoter-enhancer contacts after TNF-α signalling in these cells. Unexpectedly, we found that TNF-α-responsive enhancers are already in contact with their target promoters before signalling. Such pre-existing chromatin looping, which also exists in other cell types with different extracellular signalling, is a strong predictor of gene induction. Our observations suggest that the three-dimensional chromatin landscape, once established in a particular cell type, is relatively stable and could influence the selection or activation of target genes by a ubiquitous transcription activator in a cell-specific manner.
Ancient genomic architecture for mammalian olfactory receptor clusters

PubMed Central

Aloni, Ronny; Olender, Tsviya; Lancet, Doron

2006-01-01

Background Mammalian olfactory receptor (OR) genes reside in numerous genomic clusters of up to several dozen genes. Whole-genome sequence alignment nets of five mammals allow their comprehensive comparison, aimed at reconstructing the ancestral olfactory subgenome. Results We developed a new and general tool for genome-wide definition of genomic gene clusters conserved in multiple species. Syntenic orthologs, defined as gene pairs showing conservation of both genomic location and coding sequence, were subjected to a graph theory algorithm for discovering CLICs (clusters in conservation). When applied to ORs in five mammals, including the marsupial opossum, more than 90% of the OR genes were found within a framework of 48 multi-species CLICs, invoking a general conservation of gene order and composition. A detailed analysis of individual CLICs revealed multiple differences among species, interpretable through species-specific genomic rearrangements and reflecting complex mammalian evolutionary dynamics. One significant instance involves CLIC #1, which lacks a human member, implying the human-specific deletion of an OR cluster, whose mouse counterpart has been tentatively associated with isovaleric acid odorant detection. Conclusion The identified multi-species CLICs demonstrate that most of the mammalian OR clusters have a common ancestry, preceding the split between marsupials and placental mammals. However, only two of these CLICs were capable of incorporating chicken OR genes, parsimoniously implying that all other CLICs emerged subsequent to the avian-mammalian divergence. PMID:17010214
Rice functional genomics research in China.

PubMed

Han, Bin; Xue, Yongbiao; Li, Jiayang; Deng, Xing-Wang; Zhang, Qifa

2007-06-29

Rice functional genomics is a scientific approach that seeks to identify and define the function of rice genes, and uncover when and how genes work together to produce phenotypic traits. Rapid progress in rice genome sequencing has facilitated research in rice functional genomics in China. The Ministry of Science and Technology of China has funded two major rice functional genomics research programmes for building up the infrastructures of the functional genomics study such as developing rice functional genomics tools and resources. The programmes were also aimed at cloning and functional analyses of a number of genes controlling important agronomic traits from rice. National and international collaborations on rice functional genomics study are accelerating rice gene discovery and application.
Genomic sequence analysis of the United States infectious laryngotracheitis vaccine strains chicken embryo origin (CEO) and tissue culture origin (TCO)

USDA-ARS's Scientific Manuscript database

The genomic sequences of low and high passages of U.S. infectious laryngotracheitis (ILT) vaccine strains chicken embryo origin (CEO) and tissue culture origin (TCO) were determined using hybrid next generation sequencing in order to define relevant genomic changes associated with att...
Comprehensive analysis of RNA-seq data reveals the complexity of the transcriptome in Brassica rapa.

PubMed

Tong, Chaobo; Wang, Xiaowu; Yu, Jingyin; Wu, Jian; Li, Wanshun; Huang, Junyan; Dong, Caihua; Hua, Wei; Liu, Shengyi

2013-10-07

The species Brassica rapa (2n=20, AA) is an important vegetable and oilseed crop, and serves as an excellent model for genomic and evolutionary research in Brassica species. With the availability of the whole genome sequence of B. rapa, it is essential to further determine the activity of all functional elements of the B. rapa genome and explore the transcriptome on a genome-wide scale. Here, RNA-seq data were employed to provide a genome-wide transcriptional landscape and characterization of the annotated and novel transcripts and alternative splicing events across tissues. RNA-seq reads were generated using the Illumina platform from six different tissues (root, stem, leaf, flower, silique and callus) of the B. rapa accession Chiifu-401-42, the same line used for whole genome sequencing. First, these data detected the widespread transcription of the B. rapa genome, leading to the identification of numerous novel transcripts and definition of 5'/3' UTRs of known genes. Second, 78.8% of the total annotated genes were detected as expressed and 45.8% were constitutively expressed across all tissues. We further defined several groups of genes: housekeeping genes, tissue-specific expressed genes and co-expressed genes across tissues, which will serve as a valuable repository for future crop functional genomics research. Third, alternative splicing (AS) is estimated to occur in more than 29.4% of intron-containing B. rapa genes, and 65% of them were commonly detected in more than two tissues. Interestingly, genes with a high rate of AS were over-represented in GO categories relating to transcriptional regulation and signal transduction, suggesting a potentially important regulatory role for AS in these genes. Further, we observed that intron retention (IR) is predominant among the AS events and seems to occur preferentially in genes with short introns. The high-resolution RNA-seq analysis provides a global transcriptional landscape as a complement to the B. rapa genome sequence, which will advance our understanding of the dynamics and complexity of the B. rapa transcriptome. The atlas of gene expression in different tissues will be useful for accelerating research on functional genomics and genome evolution in Brassica species.
Common position of indels that cause deviations from canonical genome organization in different measles virus strains.

PubMed

Ivancic-Jelecki, Jelena; Slovic, Anamarija; Šantak, Maja; Tešović, Goran; Forcic, Dubravko

2016-07-29

The canonical genome organization of measles virus (MV) is characterized by total size of 15 894 nucleotides (nts) and defined length of every genomic region, both coding and non-coding. Only rarely have reports of strains possessing non-canonical genomic properties (possessing indels, with or without the change of total genome length) been published. The observed mutations are mutually compensatory in a sense that the total genome length remains polyhexameric. Although programmed and highly precise pseudo-templated nucleotide additions during transcription are inherent to polymerases of all viruses belonging to family Paramyxoviridae, a similar mechanism that would serve to non-randomly correct genome length, if an indel has occurred during replication, has so far not been described in the context of a complete virus genome. We compiled all complete MV genomic sequences (64 in total) available in open access sequence databases. Multiple sequence comparisons and phylogenetic analyses were performed with the aim of exploring whether non-recombinant and non-evolutionary linked measles strains that show deviations from canonical genome organization possess a common genetic characteristic. In 11 MV sequences we detected deviations from canonical genome organization due to short indels located within homopolymeric stretches or next to them. In nine out of 11 identified non-canonical MV sequences, a common feature was observed: one mutation, either an insertion or a deletion, was located in a 28 nts long region in F gene 5' untranslated region (positions 5051-5078 in genomic cDNA of canonical strains). This segment is composed of five tandemly linked homopolymeric stretches, its consensus sequence is G6-7C7-8A6-7G1-3C5-6. Although none of the mononucleotide repeats within this segment has fixed length, the total number of nts in canonical strains is always 28. 
These nine non-canonical strains, as well as the tenth (not mutated in the 5051-5078 segment), can be grouped in three clusters, based on their passage histories/epidemiological data/genetic similarities. There are no indications that the three clusters are evolutionarily linked, other than the fact that they all belong to clade D. A common narrow genomic region was found to be mutated in different, unrelated wild-type strains, suggesting that this region might have a function in non-random genome length corrections occurring during MV replication.
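The two numerical constraints in the abstract above (a polyhexameric total genome length and a 28-nt homopolymeric segment with consensus G6-7C7-8A6-7G1-3C5-6) can be checked with a few lines of Python. This is only an illustrative sketch, not the authors' analysis pipeline; the function names and the example segment are hypothetical.

```python
import re

CANONICAL_GENOME_LENGTH = 15894  # nts, as stated in the abstract
CANONICAL_SEGMENT_LENGTH = 28    # nts, positions 5051-5078 in canonical strains

# Consensus from the abstract: G6-7 C7-8 A6-7 G1-3 C5-6
SEGMENT_PATTERN = re.compile(r"G{6,7}C{7,8}A{6,7}G{1,3}C{5,6}")


def is_polyhexameric(genome_length: int) -> bool:
    """Return True if the genome length is a multiple of six."""
    return genome_length % 6 == 0


def is_canonical_segment(segment: str) -> bool:
    """Return True if the segment fits the consensus and is exactly 28 nts long."""
    segment = segment.upper()
    return (
        len(segment) == CANONICAL_SEGMENT_LENGTH
        and SEGMENT_PATTERN.fullmatch(segment) is not None
    )


# Hypothetical example segment: G6 C7 A6 G3 C6 = 28 nts
example = "G" * 6 + "C" * 7 + "A" * 6 + "G" * 3 + "C" * 6
print(is_polyhexameric(CANONICAL_GENOME_LENGTH), is_canonical_segment(example))
```

Note that the consensus alone allows segments of 25 to 31 nts, which is why the length check is kept separate from the pattern match.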
Next-Generation High-Throughput Functional Annotation of Microbial Genomes.

PubMed

Baric, Ralph S; Crosson, Sean; Damania, Blossom; Miller, Samuel I; Rubin, Eric J

2016-10-04

Host infection by microbial pathogens cues global changes in microbial and host cell biology that facilitate microbial replication and disease. The complete maps of thousands of bacterial and viral genomes have recently been defined; however, the rate at which physiological or biochemical functions have been assigned to genes has greatly lagged. The National Institute of Allergy and Infectious Diseases (NIAID) addressed this gap by creating functional genomics centers dedicated to developing high-throughput approaches to assign gene function. These centers require broad-based and collaborative research programs to generate and integrate diverse data to achieve a comprehensive understanding of microbial pathogenesis. High-throughput functional genomics can lead to new therapeutics and better understanding of the next generation of emerging pathogens by rapidly defining new general mechanisms by which organisms cause disease and replicate in host tissues and by facilitating the rate at which functional data reach the scientific community. Copyright © 2016 Baric et al.
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.

PubMed

Zhu, Lihua Julie; Lawrence, Michael; Gupta, Ankit; Pagès, Hervé; Kucukural, Alper; Garber, Manuel; Wolfe, Scot A

2017-05-15

Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. 
For each identified off-target site, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as the cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity. The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html.
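To make the idea of adjustable guide and PAM parameters concrete, the following Python sketch shows one simple way a candidate off-target site could be screened: count mismatches against the guide and test the adjacent PAM against a pattern. It does not use or reproduce the GUIDEseq R package; every function name, the default mismatch limit, and the sequences are assumptions chosen for illustration.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Count positions where the candidate site differs from the guide."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def pam_matches(pam: str, pattern: str = "NGG") -> bool:
    """Check a PAM against a pattern in which N stands for any base."""
    if len(pam) != len(pattern):
        return False
    return all(p == "N" or p == b for p, b in zip(pattern.upper(), pam.upper()))


def is_candidate_off_target(guide: str, site: str, pam: str,
                            max_mismatches: int = 6,
                            pam_pattern: str = "NGG") -> bool:
    """Hypothetical screen: acceptable PAM and few enough mismatches."""
    return (pam_matches(pam, pam_pattern)
            and count_mismatches(guide, site) <= max_mismatches)


# Made-up 20-nt guide and a site differing at two positions
guide = "GACGTTACCGGATTCAGTCA"
site = "GACGTTACCGCATTCAGTGA"
print(count_mismatches(guide, site), is_candidate_off_target(guide, site, "TGG"))
```

Exposing the PAM pattern and mismatch limit as arguments mirrors, in a very reduced form, how such an analysis can be adapted to different nuclease platforms.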
CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription.

PubMed

Tang, Zhonghui; Luo, Oscar Junhong; Li, Xingwang; Zheng, Meizhen; Zhu, Jacqueline Jufen; Szalaj, Przemyslaw; Trzaskoma, Pawel; Magalska, Adriana; Wlodarczyk, Jakub; Ruszczycki, Blazej; Michalski, Paul; Piecuch, Emaly; Wang, Ping; Wang, Danjuan; Tian, Simon Zhongyuan; Penrad-Mobayed, May; Sachs, Laurent M; Ruan, Xiaoan; Wei, Chia-Lin; Liu, Edison T; Wilczynski, Grzegorz M; Plewczynski, Dariusz; Li, Guoliang; Ruan, Yijun

2015-12-17

Spatial genome organization and its effect on transcription remains a fundamental question. We applied an advanced chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) strategy to comprehensively map higher-order chromosome folding and specific chromatin interactions mediated by CCCTC-binding factor (CTCF) and RNA polymerase II (RNAPII) with haplotype specificity and nucleotide resolution in different human cell lineages. We find that CTCF/cohesin-mediated interaction anchors serve as structural foci for spatial organization of constitutive genes concordant with CTCF-motif orientation, whereas RNAPII interacts within these structures by selectively drawing cell-type-specific genes toward CTCF foci for coordinated transcription. Furthermore, we show that haplotype variants and allelic interactions have differential effects on chromosome configuration, influencing gene expression, and may provide mechanistic insights into functions associated with disease susceptibility. 3D genome simulation suggests a model of chromatin folding around chromosomal axes, where CTCF is involved in defining the interface between condensed and open compartments for structural regulation. Our 3D genome strategy thus provides unique insights in the topological mechanism of human variations and diseases. Copyright © 2015 Elsevier Inc. All rights reserved.
Differential nuclear scaffold/matrix attachment marks expressed genes.

PubMed

Linnemann, Amelia K; Platts, Adrian E; Krawetz, Stephen A

2009-02-15

It is well established that nuclear architecture plays a key role in poising regions of the genome for transcription. This may be achieved using scaffold/matrix attachment regions (S/MARs) that establish loop domains. However, the relationship between changes in the physical structure of the genome as mediated by attachment to the nuclear scaffold/matrix and gene expression is not clearly understood. To define the role of S/MARs in organizing our genome and to resolve the often contradictory loci-specific studies, we have surveyed the S/MARs in HeLa S3 cells on human chromosomes 14-18 by array comparative genomic hybridization. Comparison of LIS (lithium 3,5-diiodosalicylate) extraction to identify SARs and 2 M NaCl extraction to identify MARs revealed that approximately one-half of the sites were in common. The results presented in this study suggest that SARs 5' of a gene are associated with transcript presence whereas MARs contained within a gene are associated with silenced genes. The varied functions of the S/MARs as revealed by the different extraction methods highlight their unique functional contribution.

Differential nuclear scaffold/matrix attachment marks expressed genes

PubMed Central

Linnemann, Amelia K.; Platts, Adrian E.; Krawetz, Stephen A.

2009-01-01

It is well established that nuclear architecture plays a key role in poising regions of the genome for transcription. This may be achieved using scaffold/matrix attachment regions (S/MARs) that establish loop domains. However, the relationship between changes in the physical structure of the genome as mediated by attachment to the nuclear scaffold/matrix and gene expression is not clearly understood. To define the role of S/MARs in organizing our genome and to resolve the often contradictory loci-specific studies, we have surveyed the S/MARs in HeLa S3 cells on human chromosomes 14–18 by array comparative genomic hybridization. Comparison of LIS (lithium 3,5-diiodosalicylate) extraction to identify SARs and 2 M NaCl extraction to identify MARs revealed that approximately one-half of the sites were in common. The results presented in this study suggest that SARs 5′ of a gene are associated with transcript presence whereas MARs contained within a gene are associated with silenced genes. The varied functions of the S/MARs as revealed by the different extraction methods highlight their unique functional contribution. PMID:19017725
Linking Bacillus cereus Genotypes and Carbohydrate Utilization Capacity.

PubMed

Warda, Alicja K; Siezen, Roland J; Boekhorst, Jos; Wells-Bennik, Marjon H J; de Jong, Anne; Kuipers, Oscar P; Nierop Groot, Masja N; Abee, Tjakko

2016-01-01

We characterised carbohydrate utilisation of 20 newly sequenced Bacillus cereus strains isolated from food products and food processing environments and two laboratory strains, B. cereus ATCC 10987 and B. cereus ATCC 14579. Subsequently, genome sequences of these strains were analysed together with 11 additional B. cereus reference genomes to provide an overview of the different types of carbohydrate transporters and utilization systems found in B. cereus strains. The combined application of API tests, defined growth media experiments and comparative genomics enabled us to link the carbohydrate utilisation capacity of 22 B. cereus strains with their genome content and in some cases to the panC phylogenetic grouping. A core set of carbohydrates including glucose, fructose, maltose, trehalose, N-acetyl-glucosamine, and ribose could be used by all strains, whereas utilisation of other carbohydrates like xylose, galactose, and lactose, and typical host-derived carbohydrates such as fucose, mannose, N-acetyl-galactosamine and inositol is limited to a subset of strains. Finally, the roles of selected carbohydrate transporters and utilisation systems in specific niches such as soil, foods and the human host are discussed.
VCF-Explorer: filtering and analysing whole genome VCF files.

PubMed

Akgün, Mete; Demirci, Hüseyin

2017-11-01

The decreasing cost of high-throughput technologies has led to a number of sequencing projects consisting of thousands of whole genomes. The paradigm shift from exome to whole genome brings a significant increase in the size of output files. Most existing tools, which were developed to analyse exome files, are not adequate for the larger VCF files produced by whole genome studies. In this work we present VCF-Explorer, a variant analysis software capable of handling large files. Its memory efficiency and avoidance of a computationally costly pre-processing step allow the analysis to be performed on ordinary computers. VCF-Explorer provides an easy-to-use environment where users can define various types of queries based on variant-level and sample genotype-level annotations. VCF-Explorer can be run in different environments and computational platforms ranging from a standard laptop to a high performance server. Availability: VCF-Explorer is freely available at http://vcfexplorer.sourceforge.net/. Contact: mete.akgun@tubitak.gov.tr. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
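The memory-efficiency point can be illustrated with a streaming filter that reads a VCF one line at a time instead of loading it whole. The Python sketch below is not VCF-Explorer's implementation or query language; the function name and the chosen query (chromosome plus minimum QUAL) are assumptions, and only the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) is relied on.

```python
from typing import Iterable, Iterator, List


def filter_vcf_lines(lines: Iterable[str], chrom: str,
                     min_qual: float) -> Iterator[List[str]]:
    """Yield records on `chrom` with QUAL >= min_qual, one line at a time."""
    for line in lines:
        if line.startswith("#"):       # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:            # not a complete VCF record
            continue
        qual = fields[5]
        if fields[0] == chrom and qual != "." and float(qual) >= min_qual:
            yield fields


# Tiny in-memory example; a real file object can be passed the same way.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "1\t200\t.\tC\tT\t10\tPASS\t.\n",
    "2\t300\t.\tG\tA\t99\tPASS\t.\n",
]
for record in filter_vcf_lines(sample, chrom="1", min_qual=30):
    print(record[0], record[1], record[5])
```

Because the function is a generator, memory use stays roughly constant regardless of file size, which is the property that makes ordinary computers sufficient.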
Linking Bacillus cereus Genotypes and Carbohydrate Utilization Capacity

PubMed Central

Warda, Alicja K.; Siezen, Roland J.; Boekhorst, Jos; Wells-Bennik, Marjon H. J.; de Jong, Anne; Kuipers, Oscar P.; Nierop Groot, Masja N.; Abee, Tjakko

2016-01-01

We characterised carbohydrate utilisation of 20 newly sequenced Bacillus cereus strains isolated from food products and food processing environments and two laboratory strains, B. cereus ATCC 10987 and B. cereus ATCC 14579. Subsequently, genome sequences of these strains were analysed together with 11 additional B. cereus reference genomes to provide an overview of the different types of carbohydrate transporters and utilization systems found in B. cereus strains. The combined application of API tests, defined growth media experiments and comparative genomics enabled us to link the carbohydrate utilisation capacity of 22 B. cereus strains with their genome content and in some cases to the panC phylogenetic grouping. A core set of carbohydrates including glucose, fructose, maltose, trehalose, N-acetyl-glucosamine, and ribose could be used by all strains, whereas utilisation of other carbohydrates like xylose, galactose, and lactose, and typical host-derived carbohydrates such as fucose, mannose, N-acetyl-galactosamine and inositol is limited to a subset of strains. Finally, the roles of selected carbohydrate transporters and utilisation systems in specific niches such as soil, foods and the human host are discussed. PMID:27272929
Computational Approaches to Identify Promoters and cis-Regulatory Elements in Plant Genomes

PubMed Central

Rombauts, Stephane; Florquin, Kobe; Lescot, Magali; Marchal, Kathleen; Rouzé, Pierre; Van de Peer, Yves

2003-01-01

The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called “search by signal” methods) and the delineation of promoters by considering both sequence content and structural features (“search by content” methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5′-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. 
Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of “putative” CpG and CpNpG islands in plants. PMID:12857799
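As a concrete view of what "parameters used to detect CpG islands" refers to, the sketch below computes the two quantities most island definitions are built on: GC content and the observed/expected CpG ratio, taken here as (number of CpG × sequence length) / (number of C × number of G). This is a generic Python illustration, not the procedure used in the review; the function names are hypothetical and no thresholds are assumed, since the abstract states that vertebrate values do not transfer to plants.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


# Made-up window; a real scan would slide such a window along the sequence.
window = "ACGCGTTACGGCGCATCGA"
print(round(gc_content(window), 3), round(cpg_obs_exp(window), 3))
```

A CpNpG variant would count C-any-G triplets instead of CG dinucleotides; it is left out here to keep the sketch short.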
Hypermutation In Pancreatic Cancer.

PubMed

Humphris, Jeremy L; Patch, Ann-Marie; Nones, Katia; Bailey, Peter J; Johns, Amber L; McKay, Skye; Chang, David K; Miller, David K; Pajic, Marina; Kassahn, Karin S; Quinn, Michael C J; Bruxner, Timothy J C; Christ, Angelika N; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Stone, Andrew; Wilson, Peter J; Anderson, Matthew; Fink, J Lynn; Holmes, Oliver; Kazakoff, Stephen; Leonard, Conrad; Newell, Felicity; Waddell, Nick; Wood, Scott; Mead, Ronald S; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J; Jones, Marc D; Nagrial, Adnan M; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Chou, Angela; Scarlett, Christopher J; Pinho, Andreia V; Rooman, Ilse; Giry-Laterriere, Marc; Samra, Jaswinder S; Kench, James G; Merrett, Neil D; Toon, Christopher W; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B; McKay, Colin J; Carter, C Ross; Dickson, Euan J; Graham, Janet S; Duthie, Fraser; Oien, Karin; Hair, Jane; Morton, Jennifer P; Sansom, Owen J; Grützmann, Robert; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Rusev, Borislav; Corbo, Vincenzo; Salvia, Roberto; Cataldo, Ivana; Tortora, Giampaolo; Tempero, Margaret A; Hofmann, Oliver; Eshleman, James R; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A; Gill, Anthony J; Pearson, John V; Grimmond, Sean M; Waddell, Nicola; Biankin, Andrew V

2017-01-01

Pancreatic cancer is molecularly diverse, with few effective therapies. Increased mutation burden and defective DNA repair are associated with response to immune checkpoint inhibitors in several other cancer types. We interrogated 385 pancreatic cancer genomes to define hypermutation and its causes. Mutational signatures inferring defects in DNA repair were enriched in those with the highest mutation burdens. Mismatch repair deficiency was identified in 1% of tumors harboring different mechanisms of somatic inactivation of MLH1 and MSH2. Defining mutation load in individual pancreatic cancers and the optimal assay for patient selection may inform clinical trial design for immunotherapy in pancreatic cancer. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.
A 1463 Gene Cattle–Human Comparative Map With Anchor Points Defined by Human Genome Sequence Coordinates

PubMed Central

Everts-van der Wind, Annelie; Kata, Srinivas R.; Band, Mark R.; Rebeiz, Mark; Larkin, Denis M.; Everts, Robin E.; Green, Cheryl A.; Liu, Lei; Natarajan, Shreedhar; Goldammer, Tom; Lee, Jun Heon; McKay, Stephanie; Womack, James E.; Lewin, Harris A.

2004-01-01

A second-generation 5000 rad radiation hybrid (RH) map of the cattle genome was constructed primarily using cattle ESTs that were targeted to gaps in the existing cattle–human comparative map, as well as to sparsely populated map intervals. A total of 870 targeted markers were added, bringing the number of markers mapped on the RH5000 panel to 1913. Of these, 1463 have significant BLASTN hits (E < e–5) against the human genome sequence. A cattle–human comparative map was created using human genome sequence coordinates of the paired orthologs. One-hundred and ninety-five conserved segments (defined by two or more genes) were identified between the cattle and human genomes, of which 31 are newly discovered and 34 were extended singletons on the first-generation map. The new map represents an improvement of 20% genome-wide comparative coverage compared with the first-generation map. Analysis of gene content within human genome regions where there are gaps in the comparative map revealed gaps with both significantly greater and significantly lower gene content. The new, more detailed cattle–human comparative map provides an improved resource for the analysis of mammalian chromosome evolution, the identification of candidate genes for economically important traits, and for proper alignment of sequence contigs on cattle chromosomes. PMID:15231756
Computing prokaryotic gene ubiquity: rescuing the core from extinction.

PubMed

Charlebois, Robert L; Doolittle, W Ford

2004-12-01

The genomic core concept has found several uses in comparative and evolutionary genomics. Defined as the set of all genes common to (ubiquitous among) all genomes in a phylogenetically coherent group, core size decreases as the number and phylogenetic diversity of the relevant group increases. Here, we focus on methods for defining the size and composition of the core of all genes shared by sequenced genomes of prokaryotes (Bacteria and Archaea). There are few (almost certainly less than 50) genes shared by all of the 147 genomes compared, surely insufficient to conduct all essential functions. Sequencing and annotation errors are responsible for the apparent absence of some genes, while very limited but genuine disappearances (from just one or a few genomes) can account for several others. Core size will continue to decrease as more genome sequences appear, unless the requirement for ubiquity is relaxed. Such relaxation seems consistent with any reasonable biological purpose for seeking a core, but it renders the problem of definition more problematic. We propose an alternative approach (the phylogenetically balanced core), which preserves some of the biological utility of the core concept. Cores, however delimited, preferentially contain informational rather than operational genes; we present a new hypothesis for why this might be so.
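The effect of relaxing the ubiquity requirement can be sketched in Python as follows. The code computes a strict core (genes present in every genome) and a relaxed core (genes present in at least a chosen fraction of genomes). It does not implement the phylogenetically balanced core proposed by the authors; the function name, the `min_fraction` parameter, and the toy gene sets are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Set


def core_genes(genomes: Dict[str, Set[str]], min_fraction: float = 1.0) -> Set[str]:
    """Genes found in at least `min_fraction` of the genomes (1.0 = strict core)."""
    if not genomes:
        return set()
    # Each genome contributes a gene at most once because gene sets are sets.
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    needed = min_fraction * len(genomes)
    return {gene for gene, n in counts.items() if n >= needed}


# Toy data: gene "c" is missing from one genome (e.g. an annotation error).
toy = {
    "genome1": {"a", "b", "c"},
    "genome2": {"a", "b", "c", "d"},
    "genome3": {"a", "b"},
}
print(sorted(core_genes(toy)))                    # strict core
print(sorted(core_genes(toy, min_fraction=0.6)))  # relaxed core
```

With strict ubiquity a single missing annotation removes a gene from the core, which is the shrinkage the abstract describes as more genomes are added.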
The consequences of chromosomal aneuploidy on the transcriptome of cancer cells

PubMed Central

Ried, Thomas; Hu, Yue; Difilippantonio, Michael J.; Ghadimi, B. Michael; Grade, Marian; Camps, Jordi

2016-01-01

Chromosomal aneuploidies are a defining feature of carcinomas, i.e., tumors of epithelial origin. Such aneuploidies result in tumor specific genomic copy number alterations. The patterns of genomic imbalances are tumor specific, and to a certain extent specific for defined stages of tumor development. Genomic imbalances occur already in premalignant precursor lesions, i.e., before the transition to invasive disease, and their distribution is maintained in metastases, and in cell lines derived from primary tumors. These observations are consistent with the interpretation that tumor specific genomic imbalances are drivers of malignant transformation. Naturally, this precipitates the question of how such imbalances influence the expression of resident genes. A number of laboratories have systematically integrated copy number alterations with gene expression changes in primary tumors and metastases, cell lines, and experimental models of aneuploidy to address the question as to whether genomic imbalances deregulate the expression of one or few key genes, or rather affect the cancer transcriptome more globally. The majority of these studies showed that gene expression levels follow genomic copy number. Therefore, gross genomic copy number changes, including aneuploidies of entire chromosome arms and chromosomes, result in a massive deregulation of the transcriptome of cancer cells. This article is part of a Special Issue entitled: Chromatin in time and space. PMID:22426433
The hunt for origins of DNA replication in multicellular eukaryotes

PubMed Central

Urban, John M.; Foulk, Michael S.; Casella, Cinzia

2015-01-01

Origins of DNA replication (ORIs) occur at defined regions in the genome. Although DNA sequence defines the position of ORIs in budding yeast, the factors for ORI specification remain elusive in metazoa. Several methods have been used recently to map ORIs in metazoan genomes with the hope that features for ORI specification might emerge. These methods are reviewed here with analysis of their advantages and shortcomings. The various factors that may influence ORI selection for initiation of DNA replication are discussed. PMID:25926981
The Physcomitrella patens gene atlas project: large-scale RNA-seq based expression data.

PubMed

Perroud, Pierre-François; Haas, Fabian B; Hiss, Manuel; Ullrich, Kristian K; Alboresi, Alessandro; Amirebrahimi, Mojgan; Barry, Kerrie; Bassi, Roberto; Bonhomme, Sandrine; Chen, Haodong; Coates, Juliet C; Fujita, Tomomichi; Guyon-Debast, Anouchka; Lang, Daniel; Lin, Junyan; Lipzen, Anna; Nogué, Fabien; Oliver, Melvin J; Ponce de León, Inés; Quatrano, Ralph S; Rameau, Catherine; Reiss, Bernd; Reski, Ralf; Ricca, Mariana; Saidi, Younousse; Sun, Ning; Szövényi, Péter; Sreedasyam, Avinash; Grimwood, Jane; Stacey, Gary; Schmutz, Jeremy; Rensing, Stefan A

2018-07-01

High-throughput RNA sequencing (RNA-seq) has recently become the method of choice to define and analyze transcriptomes. For the model moss Physcomitrella patens, although this method has been used to help analyze specific perturbations, no overall reference dataset has yet been established. In the framework of the Gene Atlas project, the Joint Genome Institute selected P. patens as a flagship genome, opening the way to generate the first comprehensive transcriptome dataset for this moss. The first round of sequencing described here is composed of 99 independent libraries spanning 34 different developmental stages and conditions. Upon dataset quality control and processing through read mapping, 28 509 of the 34 361 v3.3 gene models (83%) were detected to be expressed across the samples. Differentially expressed genes (DEGs) were calculated across the dataset to permit perturbation comparisons between conditions. The analysis of the three most distinct and abundant P. patens growth stages - protonema, gametophore and sporophyte - allowed us to define both general transcriptional patterns and stage-specific transcripts. As an example of variation of physico-chemical growth conditions, we detail here the impact of ammonium supplementation under standard growth conditions on the protonemal transcriptome. Finally, the cooperative nature of this project allowed us to analyze inter-laboratory variation, as 13 different laboratories around the world provided samples. We compare differences in the replication of experiments in a single laboratory and between different laboratories. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.
Congruence as a measurement of extended haplotype structure across the genome

PubMed Central

2012-01-01

Background: Historically, extended haplotypes have been defined using only a few data points, such as alleles for several HLA genes in the MHC. High-density SNP data and the increasing affordability of whole genome SNP typing create the opportunity to define higher resolution extended haplotypes. This drives the need for new tools that support quantification and visualization of extended haplotypes as defined by as many as 2000 SNPs. Confronted with high-density SNP data across the major histocompatibility complex (MHC) for 2,300 complete families, compiled by the Type 1 Diabetes Genetics Consortium (T1DGC), we developed software for studying extended haplotypes. Methods: The software, called ExHap (Extended Haplotype), uses a similarity measurement we term congruence to identify and quantify long-range allele identity. Using ExHap, we analyzed congruence in both the T1DGC data and family-phased data from the International HapMap Project. Results: Congruent chromosomes from the T1DGC data have between 96.5% and 99.9% allele identity over 1,818 SNPs spanning 2.64 megabases of the MHC (HLA-DRB1 to HLA-A). Thirty-three of 132 DQ-DR-B-A defined haplotype groups have > 50% congruent chromosomes in this region. For example, 92% of chromosomes within the DR3-B8-A1 haplotype are congruent from HLA-DRB1 to HLA-A (99.8% allele identity). We also applied ExHap to all 22 autosomes for both CEU and YRI cohorts from the International HapMap Project, identifying multiple candidate extended haplotypes. Conclusions: Long-range congruence is not unique to the MHC region. Patterns of allele identity on phased chromosomes provide a simple, straightforward approach to visually and quantitatively inspect complex long-range structural patterns in the genome. Such patterns aid the biologist in appreciating genetic similarities and differences across cohorts, and can lead to hypothesis generation for subsequent studies. PMID:22369243
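Allele identity between two phased chromosomes, the quantity reported in the Results above, can be computed as the fraction of compared SNP positions carrying the same allele. The Python sketch below illustrates only that calculation. It is not ExHap, and the abstract does not give the exact definition of congruence, so the threshold in `looks_congruent` (set to the lower bound of the reported range), the missing-data symbol, and the function names are all assumptions.

```python
def allele_identity(hap_a: str, hap_b: str, missing: str = "N") -> float:
    """Fraction of SNP positions with identical alleles, skipping missing calls."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same SNPs")
    compared = 0
    matches = 0
    for a, b in zip(hap_a, hap_b):
        if a == missing or b == missing:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def looks_congruent(hap_a: str, hap_b: str, threshold: float = 0.965) -> bool:
    """Hypothetical rule: identity at or above an assumed threshold."""
    return allele_identity(hap_a, hap_b) >= threshold


# Made-up haplotypes over eight SNPs, one allele per position
print(allele_identity("ACGTACGT", "ACGTACGA"))
```

In practice each haplotype would span on the order of the 1,818 SNPs mentioned above, and pairwise identities could be collected into a matrix for visual inspection.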
Prophage Integrase Typing Is a Useful Indicator of Genomic Diversity in Salmonella enterica

PubMed Central

Colavecchio, Anna; D’Souza, Yasmin; Tompkins, Elizabeth; Jeukens, Julie; Freschi, Luca; Emond-Rheault, Jean-Guillaume; Kukavica-Ibrulj, Irena; Boyle, Brian; Bekal, Sadjia; Tamber, Sandeep; Levesque, Roger C.; Goodridge, Lawrence D.

2017-01-01

Salmonella enterica is a bacterial species that is a major cause of illness in humans and food-producing animals. S. enterica exhibits considerable inter-serovar diversity, as evidenced by the large number of host adapted serovars that have been identified. The development of methods to assess genome diversity in S. enterica will help to further define the limits of diversity in this foodborne pathogen. Thus, we evaluated a PCR assay, which targets prophage integrase genes, as a rapid method to investigate S. enterica genome diversity. To evaluate the PCR prophage integrase assay, 49 isolates of S. enterica were selected, including 19 clinical isolates from clonal serovars (Enteritidis and Heidelberg) that commonly cause human illness, and 30 isolates from food-associated Salmonella serovars that rarely cause human illness. The number of integrase genes identified by the PCR assay was compared to the number of integrase genes within intact prophages identified by whole genome sequencing and phage finding program PHASTER. The PCR assay identified a total of 147 prophage integrase genes within the 49 S. enterica genomes (79 integrase genes in the food-associated Salmonella isolates, 50 integrase genes in S. Enteritidis, and 18 integrase genes in S. Heidelberg). In comparison, whole genome sequencing and PHASTER identified a total of 75 prophage integrase genes within 102 intact prophages in the 49 S. enterica genomes (44 integrase genes in the food-associated Salmonella isolates, 21 integrase genes in S. Enteritidis, and 9 integrase genes in S. Heidelberg). Collectively, both the PCR assay and PHASTER identified the presence of a large diversity of prophage integrase genes in the food-associated isolates compared to the clinical isolates, thus indicating a high degree of diversity in the food-associated isolates, and confirming the clonal nature of S. Enteritidis and S. Heidelberg. 
Moreover, PHASTER revealed a diversity of 29 different types of prophages and 23 different integrase genes within the food-associated isolates, but only identified four different phages and integrase genes within clonal isolates of S. Enteritidis and S. Heidelberg. These results demonstrate the potential usefulness of PCR based detection of prophage integrase genes as a rapid indicator of genome diversity in S. enterica. PMID:28740489
Prophage Integrase Typing Is a Useful Indicator of Genomic Diversity in Salmonella enterica.

PubMed

Colavecchio, Anna; D'Souza, Yasmin; Tompkins, Elizabeth; Jeukens, Julie; Freschi, Luca; Emond-Rheault, Jean-Guillaume; Kukavica-Ibrulj, Irena; Boyle, Brian; Bekal, Sadjia; Tamber, Sandeep; Levesque, Roger C; Goodridge, Lawrence D

2017-01-01

Salmonella enterica is a bacterial species that is a major cause of illness in humans and food-producing animals. S. enterica exhibits considerable inter-serovar diversity, as evidenced by the large number of host-adapted serovars that have been identified. The development of methods to assess genome diversity in S. enterica will help to further define the limits of diversity in this foodborne pathogen. Thus, we evaluated a PCR assay, which targets prophage integrase genes, as a rapid method to investigate S. enterica genome diversity. To evaluate the PCR prophage integrase assay, 49 isolates of S. enterica were selected, including 19 clinical isolates from clonal serovars (Enteritidis and Heidelberg) that commonly cause human illness, and 30 isolates from food-associated Salmonella serovars that rarely cause human illness. The number of integrase genes identified by the PCR assay was compared to the number of integrase genes within intact prophages identified by whole genome sequencing and the phage-finding program PHASTER. The PCR assay identified a total of 147 prophage integrase genes within the 49 S. enterica genomes (79 integrase genes in the food-associated Salmonella isolates, 50 integrase genes in S. Enteritidis, and 18 integrase genes in S. Heidelberg). In comparison, whole genome sequencing and PHASTER identified a total of 75 prophage integrase genes within 102 intact prophages in the 49 S. enterica genomes (44 integrase genes in the food-associated Salmonella isolates, 21 integrase genes in S. Enteritidis, and 9 integrase genes in S. Heidelberg). Collectively, both the PCR assay and PHASTER identified the presence of a large diversity of prophage integrase genes in the food-associated isolates compared to the clinical isolates, thus indicating a high degree of diversity in the food-associated isolates, and confirming the clonal nature of S. Enteritidis and S. Heidelberg. Moreover, PHASTER revealed a diversity of 29 different types of prophages and 23 different integrase genes within the food-associated isolates, but only identified four different phages and integrase genes within clonal isolates of S. Enteritidis and S. Heidelberg. These results demonstrate the potential usefulness of PCR-based detection of prophage integrase genes as a rapid indicator of genome diversity in S. enterica.
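The per-group integrase counts quoted in the abstract above can be tallied against the stated totals. The following is only a bookkeeping sketch: the dictionary layout and the helper name `check_total` are assumptions made for illustration, and the numbers are copied from the abstract as printed. The PCR groups sum to the stated 147, while the three PHASTER groups as quoted sum to 74 against a stated total of 75; the sketch reports that gap without trying to explain it.

```python
# Integrase-gene counts as quoted in the abstract, grouped by isolate set.
pcr = {"food-associated": 79, "Enteritidis": 50, "Heidelberg": 18}
phaster = {"food-associated": 44, "Enteritidis": 21, "Heidelberg": 9}
stated_totals = {"PCR": 147, "PHASTER": 75}


def check_total(counts, stated):
    """Return (sum of per-group counts, stated total minus that sum)."""
    total = sum(counts.values())
    return total, stated - total


pcr_total, pcr_gap = check_total(pcr, stated_totals["PCR"])
phaster_total, phaster_gap = check_total(phaster, stated_totals["PHASTER"])

print(f"PCR: groups sum to {pcr_total}, gap to stated total {pcr_gap}")
print(f"PHASTER: groups sum to {phaster_total}, gap to stated total {phaster_gap}")
```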
Complete Genome Sequence of Dehalobacterium formicoaceticum Strain DMC, a Strictly Anaerobic Dichloromethane-Degrading Bacterium

DOE PAGES

Chen, Gao; Murdoch, Robert W.; Mack, E. Erin; ...

2017-09-14

Dehalobacterium formicoaceticum utilizes dichloromethane as the sole energy source in defined anoxic bicarbonate-buffered mineral salt medium. The products are formate, acetate, inorganic chloride, and biomass. The bacterium’s genome was sequenced using PacBio, assembled, and annotated. The complete genome consists of one 3.77-Mb circular chromosome harboring 3,935 predicted protein-encoding genes.
Complete Genome Sequence of Dehalobacterium formicoaceticum Strain DMC, a Strictly Anaerobic Dichloromethane-Degrading Bacterium

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Gao; Murdoch, Robert W.; Mack, E. Erin

Dehalobacterium formicoaceticum utilizes dichloromethane as the sole energy source in defined anoxic bicarbonate-buffered mineral salt medium. The products are formate, acetate, inorganic chloride, and biomass. The bacterium’s genome was sequenced using PacBio, assembled, and annotated. The complete genome consists of one 3.77-Mb circular chromosome harboring 3,935 predicted protein-encoding genes.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Castelle, Cindy; Wrighton, Kelly C.; Thomas, Brian C.

Domain Archaea is currently represented by one phylum (Euryarchaeota) and two superphyla (TACK and DPANN). However, gene surveys indicate the existence of a vast diversity of uncultivated archaea for which metabolic information is lacking. We sequenced DNA from complex sediment- and groundwater-associated microbial communities sampled prior to and during an acetate biostimulation field experiment to investigate the diversity and physiology of uncultivated subsurface archaea. We sampled 15 genomes that improve resolution of a new phylum within the TACK superphylum and 119 DPANN genomes that highlight a major subdivision within the archaeal domain that separates DPANN from TACK/Euryarchaeota lineages. Within the DPANN superphylum, which lacks any isolated representatives, we defined two new phyla using sequences from 100 newly sampled genomes. The first new phylum, for which we propose the name Woesearchaeota, was defined using 54 new sequences. We reconstructed a complete (finished) genome for an archaeon from this phylum that is only 0.8 Mb in length and lacks almost all core biosynthetic pathways, but has genes encoding enzymes predicted to interact with bacterial cell walls, consistent with a symbiotic lifestyle. The second new phylum, for which we propose the name Pacearchaeota, was defined based on 46 newly sampled archaeal genomes. This phylum includes the first non-methanogen with an intermediate Type II/III RuBisCO. We also reconstructed a complete (1.24 Mb) genome for another DPANN archaeon, a member of the Diapherotrites phylum. Metabolic prediction and transcriptomic data indicate that this organism has a fermentation-based lifestyle. In fact, genomic analyses consistently indicate lack of recognizable pathways for sulfur, nitrogen, methane, oxygen, and metal cycling, and suggest that symbiotic and fermentation-based lifestyles are widespread across the DPANN superphylum. Thus, as for a recently identified superphylum of bacteria with small genomes and no cultivated representatives, the biogeochemical impacts of this major radiation of archaea are primarily through anaerobic carbon and hydrogen cycling.
Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma.

PubMed

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han; Lim, Jing Quan; Huang, Mi Ni; Padmanabhan, Nisha; Nellore, Vishwa; Kongpetch, Sarinya; Ng, Alvin Wei Tian; Ng, Ley Moy; Choo, Su Pin; Myint, Swe Swe; Thanan, Raynoo; Nagarajan, Sanjanaa; Lim, Weng Khong; Ng, Cedric Chuan Young; Boot, Arnoud; Liu, Mo; Ong, Choon Kiat; Rajasegaran, Vikneswari; Lie, Stefanus; Lim, Alvin Soon Tiong; Lim, Tse Hui; Tan, Jing; Loh, Jia Liang; McPherson, John R; Khuntikeo, Narong; Bhudhisawasdi, Vajaraphongsa; Yongvanit, Puangrat; Wongkham, Sopit; Totoki, Yasushi; Nakamura, Hiromi; Arai, Yasuhito; Yamasaki, Satoshi; Chow, Pierce Kah-Hoe; Chung, Alexander Yaw Fui; Ooi, London Lucien Peng Jin; Lim, Kiat Hon; Dima, Simona; Duda, Dan G; Popescu, Irinel; Broet, Philippe; Hsieh, Sen-Yung; Yu, Ming-Chin; Scarpa, Aldo; Lai, Jiaming; Luo, Di-Xian; Carvalho, André Lopes; Vettore, André Luiz; Rhee, Hyungjin; Park, Young Nyun; Alexandrov, Ludmil B; Gordân, Raluca; Rozen, Steven G; Shibata, Tatsuhiro; Pairojkul, Chawalit; Teh, Bin Tean; Tan, Patrick

2017-10-01

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analyzed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined 4 CCA clusters: fluke-positive CCAs (clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations; conversely, fluke-negative CCAs (clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3' untranslated region deletion as a mechanism of FGFR2 upregulation. Integration of noncoding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores; mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Our results exemplify how genetics, epigenetics, and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer. Significance: Integrated whole-genome and epigenomic analysis of CCA on an international scale identifies new CCA driver genes, noncoding promoter mutations, and structural variants. CCA molecular landscapes differ radically by etiology, underscoring how distinct cancer subtypes in the same organ may arise through different extrinsic and intrinsic carcinogenic processes. Cancer Discov; 7(10); 1116-35. ©2017 AACR. This article is highlighted in the In This Issue feature, p. 1047. ©2017 American Association for Cancer Research.
Genome-wide identification and evolutionary analysis of algal LPAT genes involved in TAG biosynthesis using bioinformatic approaches.

PubMed

Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar

2014-12-01

Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed LPAT1 and solLPAT genes to be ubiquitously present in nearly all genomes examined, whereas LPAT2 genes appear to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding an LPAT is not duplicated. Mapping of exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation.
Variability and Global Distribution of Subgenotypes of Bovine Viral Diarrhea Virus.

PubMed

Yeşilbağ, Kadir; Alpay, Gizem; Becher, Paul

2017-05-26

Bovine viral diarrhea virus (BVDV) is a globally-distributed agent responsible for numerous clinical syndromes that lead to major economic losses. Two species, BVDV-1 and BVDV-2, discriminated on the basis of genetic and antigenic differences, are classified in the genus Pestivirus within the Flaviviridae family and distributed on all of the continents. BVDV-1 can be segregated into at least twenty-one subgenotypes (1a-1u), while four subgenotypes have been described for BVDV-2 (2a-2d). With respect to published sequences, the number of virus isolates described for BVDV-1 (88.2%) is considerably higher than for BVDV-2 (11.8%). The most frequently-reported BVDV-1 subgenotype is 1b, followed by 1a and 1c. The highest number of various BVDV subgenotypes has been documented in European countries, indicating greater genetic diversity of the virus on this continent. Current segregation of BVDV field isolates and the designation of subgenotypes are not harmonized. While the species BVDV-1 and BVDV-2 can be clearly differentiated independently from the portion of the genome being compared, analysis of different genomic regions can result in inconsistent assignment of some BVDV isolates to defined subgenotypes. To avoid non-conformities, the authors recommend the development of a harmonized system for subdivision of BVDV isolates into defined subgenotypes.

Sequence Diversity, Intersubgroup Relationships, and Origins of the Mouse Leukemia Gammaretroviruses of Laboratory and Wild Mice.

PubMed

Bamunusinghe, Devinka; Naghashfar, Zohreh; Buckler-White, Alicia; Plishka, Ronald; Baliji, Surendranath; Liu, Qingping; Kassner, Joshua; Oler, Andrew J; Hartley, Janet; Kozak, Christine A

2016-04-01

Mouse leukemia viruses (MLVs) are found in the common inbred strains of laboratory mice and in the house mouse subspecies of Mus musculus. Receptor usage and envelope (env) sequence variation define three MLV host range subgroups in laboratory mice: ecotropic, polytropic, and xenotropic MLVs (E-, P-, and X-MLVs, respectively). These exogenous MLVs derive from endogenous retroviruses (ERVs) that were acquired by the wild mouse progenitors of laboratory mice about 1 million years ago. We analyzed the genomes of seven MLVs isolated from Eurasian and American wild mice and three previously sequenced MLVs to describe their relationships and identify their possible ERV progenitors. The phylogenetic tree based on the receptor-determining regions of env produced expected host range clusters, but these clusters are not maintained in trees generated from other virus regions. Colinear alignments of the viral genomes identified segmental homologies to ERVs of different host range subgroups. Six MLVs show close relationships to a small xenotropic ERV subgroup largely confined to the inbred mouse Y chromosome. env variations define three E-MLV subtypes, one of which carries duplications of various sizes, sequences, and locations in the proline-rich region of env. Outside the env region, all E-MLVs are related to different nonecotropic MLVs. These results document the diversity in gammaretroviruses isolated from globally distributed Mus subspecies, provide insight into their origins and relationships, and indicate that recombination has had an important role in the evolution of these mutagenic and pathogenic agents. Laboratory mice carry mouse leukemia viruses (MLVs) of three host range groups which were acquired from their wild mouse progenitors. We sequenced the complete genomes of seven infectious MLVs isolated from geographically separated Eurasian and American wild mice and compared them with endogenous germ line retroviruses (ERVs) acquired early in house mouse evolution. We did this because the laboratory mouse viruses derive directly from specific ERVs or arise by recombination between different ERVs. The six distinctively different wild mouse viruses appear to be recombinants, often involving different host range subgroups, and most are related to a distinctive, largely Y-chromosome-linked MLV ERV subtype. MLVs with ecotropic host ranges show the greatest variability with extensive inter- and intrasubtype envelope differences and with homologies to other host range subgroups outside the envelope. The sequence diversity among these wild mouse isolates helps define their relationships and origins and emphasizes the importance of recombination in their evolution. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction

PubMed Central

Kim, Dokyoon; Joung, Je-Gun; Sohn, Kyung-Ah; Shin, Hyunjung; Park, Yu Rang; Ritchie, Marylyn D; Kim, Ju Han

2015-01-01

Objective: Cancer can involve gene dysregulation via multiple mechanisms, so no single level of genomic data fully elucidates tumor behavior due to the presence of numerous genomic variations within or between levels in a biological system. We have previously proposed a graph-based integration approach that combines multi-omics data including copy number alteration, methylation, miRNA, and gene expression data for predicting clinical outcome in cancer. However, genomic features likely interact with other genomic features in complex signaling or regulatory networks, since cancer is caused by alterations in pathways or complete processes. Methods: Here we propose a new graph-based framework for integrating multi-omics data and genomic knowledge to improve power in predicting clinical outcomes and elucidate interplay between different levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from The Cancer Genome Atlas for predicting stage, grade, and survival outcomes. Results: Integrating multi-omics data with genomic knowledge to construct pre-defined features resulted in higher performance in clinical outcome prediction and higher stability. For the grade outcome, the model with gene expression data produced an area under the receiver operating characteristic curve (AUC) of 0.7866. However, models of the integration with pathway, Gene Ontology, chromosomal gene set, and motif gene set consistently outperformed the model with genomic data only, attaining AUCs of 0.7873, 0.8433, 0.8254, and 0.8179, respectively. Conclusions: Integrating multi-omics data and genomic knowledge to improve understanding of molecular pathogenesis and underlying biology in cancer should improve diagnostic and prognostic indicators and the effectiveness of therapies. PMID:25002459
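The AUC values compared in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the authors' evaluation pipeline; the function name `auc` and the toy scores are assumptions made purely for illustration.

```python
def auc(scores_pos, scores_neg):
    """Pairwise AUC: fraction of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count as half a win."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores (not taken from the study).
toy_pos = [0.9, 0.8, 0.4]
toy_neg = [0.7, 0.3, 0.2]
toy_auc = auc(toy_pos, toy_neg)  # 8 of the 9 pairs are ranked correctly
print(round(toy_auc, 4))
```

The quadratic loop is chosen for readability; for large samples a rank-based computation gives the same value more cheaply.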
Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction.

PubMed

Kim, Dokyoon; Joung, Je-Gun; Sohn, Kyung-Ah; Shin, Hyunjung; Park, Yu Rang; Ritchie, Marylyn D; Kim, Ju Han

2015-01-01

Cancer can involve gene dysregulation via multiple mechanisms, so no single level of genomic data fully elucidates tumor behavior due to the presence of numerous genomic variations within or between levels in a biological system. We have previously proposed a graph-based integration approach that combines multi-omics data including copy number alteration, methylation, miRNA, and gene expression data for predicting clinical outcome in cancer. However, genomic features likely interact with other genomic features in complex signaling or regulatory networks, since cancer is caused by alterations in pathways or complete processes. Here we propose a new graph-based framework for integrating multi-omics data and genomic knowledge to improve power in predicting clinical outcomes and elucidate interplay between different levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from The Cancer Genome Atlas for predicting stage, grade, and survival outcomes. Integrating multi-omics data with genomic knowledge to construct pre-defined features resulted in higher performance in clinical outcome prediction and higher stability. For the grade outcome, the model with gene expression data produced an area under the receiver operating characteristic curve (AUC) of 0.7866. However, models of the integration with pathway, Gene Ontology, chromosomal gene set, and motif gene set consistently outperformed the model with genomic data only, attaining AUCs of 0.7873, 0.8433, 0.8254, and 0.8179, respectively. Integrating multi-omics data and genomic knowledge to improve understanding of molecular pathogenesis and underlying biology in cancer should improve diagnostic and prognostic indicators and the effectiveness of therapies. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples

PubMed Central

Macpherson, J. Michael; Eriksson, Nick; Saxonov, Serge; Pe'er, Itsik; Mountain, Joanna L.

2012-01-01

Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies. PMID:22509285
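As background to the cousin degrees discussed in the abstract above, the textbook expectation for the fraction of the autosomal genome that full n-th cousins share IBD in an outbred pedigree is (1/2)^(2n+1): two shared ancestors, each reached through 2(n+1) meioses. The sketch below only evaluates that expectation; it is not the pedigree-based simulation used in the study, and the function name is an assumption for illustration. For distant degrees the expectation is an average over many pairs, and an individual pair may share no detectable segment at all.

```python
def expected_ibd_fraction(cousin_degree):
    """Expected fraction of the genome shared IBD by full n-th cousins,
    assuming an outbred pedigree with exactly two shared ancestors."""
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be a positive integer")
    return 0.5 ** (2 * cousin_degree + 1)


# Expected sharing for 1st to 9th cousins, printed as percentages.
for degree in range(1, 10):
    print(degree, f"{100 * expected_ibd_fraction(degree):.6f}%")
```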
Back to the Origin

PubMed Central

Evertts, Adam G.

2012-01-01

In bacteria, replication is a carefully orchestrated event that unfolds the same way for each bacterium and each cell division. The process of DNA replication in bacteria optimizes cell growth and coordinates high levels of simultaneous replication and transcription. In metazoans, the organization of replication is more enigmatic. The lack of a specific sequence that defines origins of replication has, until recently, severely limited our ability to define the organizing principles of DNA replication. This question is of particular importance as emerging data suggest that replication stress is an important contributor to inherited genetic damage and the genomic instability in tumors. We consider here the replication program in several different organisms including recent genome-wide analyses of replication origins in humans. We review recent studies on the role of cytosine methylation in replication origins, the role of transcriptional looping and gene gating in DNA replication, and the role of chromatin’s 3-dimensional structure in DNA replication. We use these new findings to consider several questions surrounding DNA replication in metazoans: How are origins selected? What is the relationship between replication and transcription? How do checkpoints inhibit origin firing? Why are there early and late firing origins? We then discuss whether oncogenes promote cancer through a role in DNA replication and whether errors in DNA replication are important contributors to the genomic alterations and gene fusion events observed in cancer. We conclude with some important areas for future experimentation. PMID:23634256
Meta-analysis of genome-wide linkage studies in BMI and obesity.

PubMed

Saunders, Catherine L; Chiodini, Benedetta D; Sham, Pak; Lewis, Cathryn M; Abkevich, Victor; Adeyemo, Adebowale A; de Andrade, Mariza; Arya, Rector; Berenson, Gerald S; Blangero, John; Boehnke, Michael; Borecki, Ingrid B; Chagnon, Yvon C; Chen, Wei; Comuzzie, Anthony G; Deng, Hong-Wen; Duggirala, Ravindranath; Feitosa, Mary F; Froguel, Philippe; Hanson, Robert L; Hebebrand, Johannes; Huezo-Dias, Patricia; Kissebah, Ahmed H; Li, Weidong; Luke, Amy; Martin, Lisa J; Nash, Matthew; Ohman, Miina; Palmer, Lyle J; Peltonen, Leena; Perola, Markus; Price, R Arlen; Redline, Susan; Srinivasan, Sathanur R; Stern, Michael P; Stone, Steven; Stringham, Heather; Turner, Stephen; Wijmenga, Cisca; Collier, David A

2007-09-01

The objective was to provide an overall assessment of genetic linkage data of BMI and BMI-defined obesity using a nonparametric genome scan meta-analysis. We identified 37 published studies containing data on over 31,000 individuals from more than 10,000 families and obtained genome-wide logarithm of the odds (LOD) scores, non-parametric linkage (NPL) scores, or maximum likelihood scores (MLS). BMI was analyzed in a pooled set of all studies, as a subgroup of 10 studies that used BMI-defined obesity, and for subgroups ascertained through type 2 diabetes, hypertension, or subjects of European ancestry. Bins at chromosome 13q13.2-q33.1, 12q23-q24.3 achieved suggestive evidence of linkage to BMI in the pooled analysis and samples ascertained for hypertension. Nominal evidence of linkage to these regions and suggestive evidence for 11q13.3-22.3 were also observed for BMI-defined obesity. The FTO obesity gene locus at 16q12.2 also showed nominal evidence for linkage. However, overall distribution of summed rank p values <0.05 is not different from that expected by chance. The strongest evidence was obtained in the families ascertained for hypertension at 9q31.1-qter and 12p11.21-q23 (p < 0.01). Despite having substantial statistical power, we did not unequivocally implicate specific loci for BMI or obesity. This may be because genes influencing adiposity are of very small effect, with substantial genetic heterogeneity and variable dependence on environmental factors. However, the observation that the FTO gene maps to one of the highest ranking bins for obesity is interesting and, while not a validation of this approach, indicates that other potential loci identified in this study should be investigated further.
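For readers unfamiliar with the LOD scores mentioned in the abstract above, the classic two-point definition compares the likelihood of the data at a recombination fraction θ with the likelihood under free recombination (θ = 0.5), on a log10 scale. The sketch below assumes the simplest textbook setting of phase-known, fully informative meioses; it does not reproduce the rank-based genome scan meta-analysis used in the study, and the function name and example counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known meioses:
    log10( theta^r * (1 - theta)^(n - r) / 0.5^n )."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    return (recombinants * math.log10(theta)
            + non_recombinants * math.log10(1 - theta)
            - meioses * math.log10(0.5))


# Hypothetical example: 1 recombinant among 10 informative meioses.
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)
print(round(example_lod, 3))
```

At θ = 0.5 the two likelihoods coincide, so the score is zero regardless of the counts.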
Genetic fine-mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci

PubMed Central

Mahajan, Anubha; Locke, Adam; Rayner, N William; Robertson, Neil; Scott, Robert A; Prokopenko, Inga; Scott, Laura J; Green, Todd; Sparso, Thomas; Thuillier, Dorothee; Yengo, Loic; Grallert, Harald; Wahl, Simone; Frånberg, Mattias; Strawbridge, Rona J; Kestler, Hans; Chheda, Himanshu; Eisele, Lewin; Gustafsson, Stefan; Steinthorsdottir, Valgerdur; Thorleifsson, Gudmar; Qi, Lu; Karssen, Lennart C; van Leeuwen, Elisabeth M; Willems, Sara M; Li, Man; Chen, Han; Fuchsberger, Christian; Kwan, Phoenix; Ma, Clement; Linderman, Michael; Lu, Yingchang; Thomsen, Soren K; Rundle, Jana K; Beer, Nicola L; van de Bunt, Martijn; Chalisey, Anil; Kang, Hyun Min; Voight, Benjamin F; Abecasis, Goncalo R; Almgren, Peter; Baldassarre, Damiano; Balkau, Beverley; Benediktsson, Rafn; Blüher, Matthias; Boeing, Heiner; Bonnycastle, Lori L; Borringer, Erwin P; Burtt, Noël P; Carey, Jason; Charpentier, Guillaume; Chines, Peter S; Cornelis, Marilyn C; Couper, David J; Crenshaw, Andrew T; van Dam, Rob M; Doney, Alex SF; Dorkhan, Mozhgan; Edkins, Sarah; Eriksson, Johan G; Esko, Tonu; Eury, Elodie; Fadista, João; Flannick, Jason; Fontanillas, Pierre; Fox, Caroline; Franks, Paul W; Gertow, Karl; Gieger, Christian; Gigante, Bruna; Gottesman, Omri; Grant, George B; Grarup, Niels; Groves, Christopher J; Hassinen, Maija; Have, Christian T; Herder, Christian; Holmen, Oddgeir L; Hreidarsson, Astradur B; Humphries, Steve E; Hunter, David J; Jackson, Anne U; Jonsson, Anna; Jørgensen, Marit E; Jørgensen, Torben; Kerrison, Nicola D; Kinnunen, Leena; Klopp, Norman; Kong, Augustine; Kovacs, Peter; Kraft, Peter; Kravic, Jasmina; Langford, Cordelia; Leander, Karin; Liang, Liming; Lichtner, Peter; Lindgren, Cecilia M; Lindholm, Eero; Linneberg, Allan; Liu, Ching-Ti; Lobbens, Stéphane; Luan, Jian’an; Lyssenko, Valeriya; Männistö, Satu; McLeod, Olga; Meyer, Julia; Mihailov, Evelin; Mirza, Ghazala; Mühleisen, Thomas W; Müller-Nurasyid, Martina; Navarro, Carmen; Nöthen, Markus M; Oskolkov, Nikolay N; Owen, Katharine 
R; Palli, Domenico; Pechlivanis, Sonali; Perry, John RB; Platou, Carl GP; Roden, Michael; Ruderfer, Douglas; Rybin, Denis; van der Schouw, Yvonne T; Sennblad, Bengt; Sigurðsson, Gunnar; Stančáková, Alena; Steinbach, Gerald; Storm, Petter; Strauch, Konstantin; Stringham, Heather M; Sun, Qi; Thorand, Barbara; Tikkanen, Emmi; Tonjes, Anke; Trakalo, Joseph; Tremoli, Elena; Tuomi, Tiinamaija; Wennauer, Roman; Wood, Andrew R; Zeggini, Eleftheria; Dunham, Ian; Birney, Ewan; Pasquali, Lorenzo; Ferrer, Jorge; Loos, Ruth JF; Dupuis, Josée; Florez, Jose C; Boerwinkle, Eric; Pankow, James S; van Duijn, Cornelia; Sijbrands, Eric; Meigs, James B; Hu, Frank B; Thorsteinsdottir, Unnur; Stefansson, Kari; Lakka, Timo A; Rauramaa, Rainer; Stumvoll, Michael; Pedersen, Nancy L; Lind, Lars; Keinanen-Kiukaanniemi, Sirkka M; Korpi-Hyövälti, Eeva; Saaristo, Timo E; Saltevo, Juha; Kuusisto, Johanna; Laakso, Markku; Metspalu, Andres; Erbel, Raimund; Jöckel, Karl-Heinz; Moebus, Susanne; Ripatti, Samuli; Salomaa, Veikko; Ingelsson, Erik; Boehm, Bernhard O; Bergman, Richard N; Collins, Francis S; Mohlke, Karen L; Koistinen, Heikki; Tuomilehto, Jaakko; Hveem, Kristian; Njølstad, Inger; Deloukas, Panagiotis; Donnelly, Peter J; Frayling, Timothy M; Hattersley, Andrew T; de Faire, Ulf; Hamsten, Anders; Illig, Thomas; Peters, Annette; Cauchi, Stephane; Sladek, Rob; Froguel, Philippe; Hansen, Torben; Pedersen, Oluf; Morris, Andrew D; Palmer, Collin NA; Kathiresan, Sekar; Melander, Olle; Nilsson, Peter M; Groop, Leif C; Barroso, Inês; Langenberg, Claudia; Wareham, Nicholas J; O’Callaghan, Christopher A; Gloyn, Anna L; Altshuler, David; Boehnke, Michael; Teslovich, Tanya M; McCarthy, Mark I; Morris, Andrew P

2015-01-01

We performed fine-mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in/near KCNQ1. “Credible sets” of variants most likely to drive each distinct signal mapped predominantly to non-coding sequence, implying that T2D association is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine-mapping implicated rs10830963 as driving T2D association. We confirmed that this T2D-risk allele increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D-risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease. PMID:26551672
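The "credible sets" mentioned in the abstract above are commonly built by ranking variants by their posterior probability of driving a signal and keeping the top-ranked variants until a chosen cumulative probability is reached. The sketch below illustrates only that ranking step under stated assumptions: the coverage level, the function name `credible_set`, and the variant labels and posterior values are all hypothetical and are not taken from the study.

```python
def credible_set(posteriors, coverage=0.99):
    """Smallest set of top-ranked variants whose posterior probabilities
    sum to at least `coverage`. `posteriors` maps variant id -> probability."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must lie in (0, 1]")
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posteriors for one association signal (illustrative only).
toy_posteriors = {"var_a": 0.70, "var_b": 0.20, "var_c": 0.06, "var_d": 0.04}
toy_set = credible_set(toy_posteriors, coverage=0.95)
print(toy_set)
```

A smaller credible set indicates that fine-mapping has localized the signal to fewer candidate variants.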
Whole genome sequences of two octogenarians with sustained cognitive abilities

PubMed Central

Nickles, Dorothee; Madireddy, Lohith; Patel, Nihar; Isobe, Noriko; Miller, Bruce L.; Baranzini, Sergio E.; Kramer, Joel H.; Oksenberg, Jorge R.

2014-01-01

Although numerous genetic variants affecting aging and mortality have been identified, e.g. APOE ε4, the genetic component influencing cognitive aging has not been fully defined yet. A better knowledge of the genetics of aging will prove helpful in understanding the underlying biological processes. Here, we describe the whole genome sequences of two female octogenarians. We provide the repertoire of genomic variants that the two octogenarians have in common. We also describe the overlap with the previously reported genomes of two supercentenarians - individuals aged ≥ 110 years. We assessed the genetic disease propensities of the octogenarians and non-aged control genomes and could not find support for the hypothesis that long-lived healthy individuals might exhibit greater genetic fitness than the general population. Furthermore, there is no evidence for an accumulation of previously described variants promoting longevity in the two octogenarians. These findings suggest that genetic fitness, as currently defined, is not the sole factor enabling an increased lifespan. We identified a number of healthy-cognitive-aging candidate genetic loci awaiting confirmation in larger studies. PMID:25618617
Whole genome sequences of 2 octogenarians with sustained cognitive abilities.

PubMed

Nickles, Dorothee; Madireddy, Lohith; Patel, Nihar; Isobe, Noriko; Miller, Bruce L; Baranzini, Sergio E; Kramer, Joel H; Oksenberg, Jorge R

2015-03-01

Although numerous genetic variants affecting aging and mortality have been identified, for example, apolipoprotein E ε4, the genetic component influencing cognitive aging has not been fully defined yet. A better knowledge of the genetics of aging will prove helpful in understanding the underlying biological processes. Here, we describe the whole genome sequences of 2 female octogenarians. We provide the repertoire of genomic variants that the 2 octogenarians have in common. We also describe the overlap with the previously reported genomes of 2 supercentenarians—individuals aged ≥110 years. We assessed the genetic disease propensities of the octogenarians and non-aged control genomes and could not find support for the hypothesis that long-lived healthy individuals might exhibit greater genetic fitness than the general population. Furthermore, there is no evidence for an accumulation of previously described variants promoting longevity in the 2 octogenarians. These findings suggest that genetic fitness, as currently defined, is not the sole factor enabling an increased life span. We identified a number of healthy-cognitive-aging candidate genetic loci awaiting confirmation in larger studies. Copyright © 2015 Elsevier Inc. All rights reserved.
The importance of detailed epigenomic profiling of different cell types within organs.

PubMed

Stueve, Theresa Ryan; Marconett, Crystal N; Zhou, Beiyun; Borok, Zea; Laird-Offringa, Ite A

2016-06-01

The human body consists of hundreds of kinds of cells specified from a single genome overlaid with cell type-specific epigenetic information. Comprehensively profiling the body's distinct epigenetic landscapes will allow researchers to verify cell types used in regenerative medicine and to determine the epigenetic effects of disease, environmental exposures and genetic variation. Key marks/factors that should be investigated include regions of nucleosome-free DNA accessible to regulatory factors, histone marks defining active enhancers and promoters, DNA methylation levels, regulatory RNAs, and factors controlling the three-dimensional conformation of the genome. Here we use the lung to illustrate the importance of investigating an organ's purified cell epigenomes, and outline the challenges and promise of realizing a comprehensive catalog of primary cell epigenomes.
Decoding the genome beyond sequencing: the new phase of genomic research.

PubMed

Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J

2011-10-01

While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory, which calls for a departure from gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Analysis of the Genome Structure of the Nonpathogenic Probiotic Escherichia coli Strain Nissle 1917

PubMed Central

Grozdanov, Lubomir; Raasch, Carsten; Schulze, Jürgen; Sonnenborn, Ulrich; Gottschalk, Gerhard; Hacker, Jörg; Dobrindt, Ulrich

2004-01-01

Nonpathogenic Escherichia coli strain Nissle 1917 (O6:K5:H1) is used as a probiotic agent in medicine, mainly for the treatment of various gastroenterological diseases. To gain insight on the genetic level into its properties of colonization and commensalism, this strain's genome structure has been analyzed by three approaches: (i) sequence context screening of tRNA genes as a potential indication of chromosomal integration of horizontally acquired DNA, (ii) sequence analysis of 280 kb of genomic islands (GEIs) coding for important fitness factors, and (iii) comparison of Nissle 1917 genome content with that of other E. coli strains by DNA-DNA hybridization. PCR-based screening of 324 nonpathogenic and pathogenic E. coli isolates of different origins revealed that some chromosomal regions are frequently detectable in nonpathogenic E. coli and also among extraintestinal and intestinal pathogenic strains. Many known fitness factor determinants of strain Nissle 1917 are localized on four GEIs which have been partially sequenced and analyzed. Comparison of these data with the available knowledge of the genome structure of E. coli K-12 strain MG1655 and of uropathogenic E. coli O6 strains CFT073 and 536 revealed structural similarities on the genomic level, especially between the E. coli O6 strains. The lack of defined virulence factors (i.e., alpha-hemolysin, P-fimbrial adhesins, and the semirough lipopolysaccharide phenotype) combined with the expression of fitness factors such as microcins, different iron uptake systems, adhesins, and proteases, which may support its survival and successful colonization of the human gut, most likely contributes to the probiotic character of E. coli strain Nissle 1917. PMID:15292145
Whole-genome relationships among Francisella bacteria of diverse origins define new species and provide specific regions for detection

DOE PAGES

Challacombe, Jean Faust; Petersen, Jeannine M.; Gallegos-Graves, La Verne A.; ...

2016-11-23

Francisella tularensis is a highly virulent zoonotic pathogen that causes tularemia and, because of weaponization efforts in past world wars, is considered a tier 1 biothreat agent. Detection and surveillance of F. tularensis may be confounded by the presence of uncharacterized, closely related organisms. Through DNA-based diagnostics and environmental surveys, novel clinical and environmental Francisella isolates have been obtained in recent years. Here we present 7 new Francisella genomes and a comparison of their characteristics to each other and to 24 publicly available genomes as well as a comparative analysis of 16S rRNA and sdhA genes from over 90 Francisella strains. Delineation of new species in bacteria is challenging, especially when isolates having very close genomic characteristics exhibit different physiological features—for example, when some are virulent pathogens in humans and animals while others are nonpathogenic or are opportunistic pathogens. Species resolution within Francisella varies with analyses of single genes, multiple gene or protein sets, or whole-genome comparisons of nucleic acid and amino acid sequences. Analyses focusing on single genes (16S rRNA, sdhA), multiple gene sets (virulence genes, lipopolysaccharide [LPS] biosynthesis genes, pathogenicity island), and whole-genome comparisons (nucleotide and protein) gave congruent results, but with different levels of discrimination confidence. We designate four new species within the genus: Francisella opportunistica sp. nov. (MA06-7296), Francisella salina sp. nov. (TX07-7308), Francisella uliginis sp. nov. (TX07-7310), and Francisella frigiditurris sp. nov. (CA97-1460). Lastly, this study provides a robust comparative framework to discern species and virulence features of newly detected Francisella bacteria.
Genomic diversity and adaptation of Salmonella enterica serovar Typhimurium from analysis of six genomes of different phage types

PubMed Central

2013-01-01

Background: Salmonella enterica serovar Typhimurium (or simply Typhimurium) is the most common serovar in both human infections and farm animals in Australia and many other countries. Typhimurium is a broad host range serovar but has also evolved into host-adapted variants (i.e. isolated from a particular host such as pigeons). Six Typhimurium strains of different phage types (defined by patterns of susceptibility to lysis by a set of bacteriophages) were analysed using Illumina high-throughput genome sequencing. Results: Variations between strains were mainly due to single nucleotide polymorphisms (SNPs) with an average of 611 SNPs per strain, ranging from 391 SNPs to 922 SNPs. There were seven insertions/deletions (indels) involving whole or partial gene deletions, four inactivation events due to IS200 insertion and 15 pseudogenes due to early termination. Four of these inactivated or deleted genes may be virulence related. Nine prophage or prophage remnants were identified in the six strains. Gifsy-1, Gifsy-2 and the sopE2 and sspH2 phage remnants were present in all six genomes while Fels-1, Fels-2, ST64B, ST104 and CP4-57 were variably present. Four strains carried the 90-kb plasmid pSLT which contains several known virulence genes. However, two strains were found to lack the plasmid. In addition, one strain had a novel plasmid similar to Typhi strain CT18 plasmid pHCM2. Conclusion: The genome data suggest that variations between strains were mainly due to accumulation of SNPs, some of which resulted in gene inactivation. Unique genetic elements that were common between host-adapted phage types were not found. This study advanced our understanding of the evolution and adaptation of Typhimurium at the genomic level. PMID:24138507
Whole-genome relationships among Francisella bacteria of diverse origins define new species and provide specific regions for detection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Challacombe, Jean Faust; Petersen, Jeannine M.; Gallegos-Graves, La Verne A.

Francisella tularensis is a highly virulent zoonotic pathogen that causes tularemia and, because of weaponization efforts in past world wars, is considered a tier 1 biothreat agent. Detection and surveillance of F. tularensis may be confounded by the presence of uncharacterized, closely related organisms. Through DNA-based diagnostics and environmental surveys, novel clinical and environmental Francisella isolates have been obtained in recent years. Here we present 7 new Francisella genomes and a comparison of their characteristics to each other and to 24 publicly available genomes as well as a comparative analysis of 16S rRNA and sdhA genes from over 90 Francisella strains. Delineation of new species in bacteria is challenging, especially when isolates having very close genomic characteristics exhibit different physiological features—for example, when some are virulent pathogens in humans and animals while others are nonpathogenic or are opportunistic pathogens. Species resolution within Francisella varies with analyses of single genes, multiple gene or protein sets, or whole-genome comparisons of nucleic acid and amino acid sequences. Analyses focusing on single genes (16S rRNA, sdhA), multiple gene sets (virulence genes, lipopolysaccharide [LPS] biosynthesis genes, pathogenicity island), and whole-genome comparisons (nucleotide and protein) gave congruent results, but with different levels of discrimination confidence. We designate four new species within the genus: Francisella opportunistica sp. nov. (MA06-7296), Francisella salina sp. nov. (TX07-7308), Francisella uliginis sp. nov. (TX07-7310), and Francisella frigiditurris sp. nov. (CA97-1460). Lastly, this study provides a robust comparative framework to discern species and virulence features of newly detected Francisella bacteria.
Draft Genome Sequences of Human Pathogenic Fungus Geomyces pannorum Sensu Lato and Bat White Nose Syndrome Pathogen Geomyces (Pseudogymnoascus) destructans.

PubMed

Chibucos, Marcus C; Crabtree, Jonathan; Nagaraj, Sushma; Chaturvedi, Sudha; Chaturvedi, Vishnu

2013-12-19

We report the draft genome sequences of Geomyces pannorum sensu lato and Geomyces (Pseudogymnoascus) destructans. G. pannorum has a larger proteome than G. destructans, containing more proteins with ascribed enzymatic functions. This dichotomy in the genomes of related psychrophilic fungi is a valuable target for defining their distinct saprobic and pathogenic attributes.
Defining Linkages between the GSC and NSF's LTER Program: How the Ecological Metadata Language (EML) Relates to GCDML and Other Outcomes

Treesearch

Inigo San Gil; Wade Sheldon; Tom Schmidt; Mark Servilla; Raul Aguilar; Corinna Gries; Tanya Gray; Dawn Field; James Cole; Jerry Yun Pan; Giri Palanisamy; Donald Henshaw; Margaret O' Brien; Linda Kinkel; Kathrine McMahon; Renzo Kottmann; Linda Amaral-Zettler; John Hobbie; Philip Goldstein; Robert P. Guralnick; James Brunt; William K. Michener

2008-01-01

The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML)....
CGDV: a webtool for circular visualization of genomics and transcriptomics data.

PubMed

Jha, Vineet; Singh, Gulzar; Kumar, Shiva; Sonawane, Amol; Jere, Abhay; Anamika, Krishanpal

2017-10-24

Interpretation of large-scale data is very challenging, and currently there is a scarcity of web tools that support automated visualization of a variety of high-throughput genomics and transcriptomics data for a wide variety of model organisms along with user-defined karyotypes. A circular plot provides holistic visualization of high-throughput, large-scale data, but it is complex and challenging to generate, as most of the available tools need informatics expertise to install and run. We have developed CGDV (Circos for Genomics and Transcriptomics Data Visualization), a webtool based on Circos, for seamless and automated visualization of a variety of large-scale genomics and transcriptomics data. CGDV takes the output of analyzed genomics or transcriptomics data in different formats, such as vcf, bed, xls, tab-delimited matrix text file, CNVnator raw output and gene fusion raw output, to plot a circular view of the sample data. CGDV takes care of generating the intermediate files required for Circos. CGDV is freely available at https://cgdv-upload.persistent.co.in/cgdv/ . The circular plot for each data type is tailored to gain the best biological insights into the data. The inter-relationship between data points, homologous sequences, genes involved in fusion events, differential expression pattern, sequencing depth, types and size of variations and enrichment of DNA binding proteins can be seen using CGDV. CGDV thus helps biologists and bioinformaticians to visualize a variety of genomics and transcriptomics data seamlessly.
The interaction between cytosine methylation and processes of DNA replication and repair shape the mutational landscape of cancer genomes

PubMed Central

Poulos, Rebecca C.

2017-01-01

Methylated cytosines (5mCs) are frequently mutated in the genome. However, no studies have yet comprehensively analysed mutation–methylation associations across cancer types. Here we analyse 916 cancer genomes, together with tissue type-specific methylation and replication timing data. We describe a strong mutation–methylation association across colorectal cancer subtypes, most interestingly in samples with microsatellite instability (MSI) or Polymerase epsilon (POLE) exonuclease domain mutations. By analysing genomic regions with differential mismatch repair (MMR) efficiency, we suggest a possible role for MMR in the correction of 5mC deamination events, potentially accounting for the high rate of 5mC mutation accumulation in MSI tumours. Additionally, we propose that mutant POLE asserts a mutator phenotype specifically at 5mCs, and we find coding mutation hotspots in POLE-mutant cancers at highly-methylated CpGs in the tumour-suppressor genes APC and TP53. Finally, using multivariable regression models, we demonstrate that different cancers exhibit distinct mutation–methylation associations, with DNA repair influencing such associations in certain cancer genomes. Taken together, we find differential associations with methylation that are vital for accurately predicting expected mutation loads across cancer types. Our findings reveal links between methylation and common mutation and repair processes, with these mechanisms defining a key part of the mutational landscape of cancer genomes. PMID:28531315
Three invariant Hi-C interaction patterns: Applications to genome assembly.

PubMed

Oddes, Sivan; Zelig, Aviv; Kaplan, Noam

2018-06-01

Assembly of reference-quality genomes from next-generation sequencing data is a key challenge in genomics. Recently, we and others have shown that Hi-C data can be used to address several outstanding challenges in the field of genome assembly. This principle has since been developed in academia and industry, and has been used in the assembly of several major genomes. In this paper, we explore the central principles underlying Hi-C-based assembly approaches, by quantitatively defining and characterizing three invariant Hi-C interaction patterns on which these approaches can build: Intrachromosomal interaction enrichment, distance-dependent interaction decay and local interaction smoothness. Specifically, we evaluate to what degree each invariant pattern holds on a single locus level in different species, cell types and Hi-C map resolutions. We find that these patterns are generally consistent across species and cell types but are affected by sequencing depth, and that matrix balancing improves consistency of loci with all three invariant patterns. Finally, we overview current Hi-C-based assembly approaches in light of these invariant patterns and demonstrate how local interaction smoothness can be used to easily detect scaffolding errors in extremely sparse Hi-C maps. We suggest that simultaneously considering all three invariant patterns may lead to better Hi-C-based genome assembly methods. Copyright © 2018 Elsevier Inc. All rights reserved.
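As an illustration of the second pattern named in this abstract, distance-dependent interaction decay, the following is a minimal sketch that averages contact counts along each diagonal of a contact matrix. The function name `mean_contacts_by_distance` and the toy 4-bin matrix are hypothetical and chosen only for illustration; this is not the authors' quantitative definition, and real Hi-C maps are far larger and are usually balanced first.

```python
def mean_contacts_by_distance(matrix):
    """Average contact count at each bin separation d = 0, 1, ..., n-1.

    `matrix` is a square, symmetric list of lists of contact counts
    for a single chromosome (an assumed input format).
    """
    n = len(matrix)
    means = []
    for d in range(n):
        # All bin pairs (i, i + d) lie on the d-th diagonal.
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        means.append(sum(diagonal) / len(diagonal))
    return means


# Hypothetical toy contact map: counts fall off away from the diagonal.
toy = [
    [10, 5, 2, 1],
    [5, 10, 5, 2],
    [2, 5, 10, 5],
    [1, 2, 5, 10],
]
decay = mean_contacts_by_distance(toy)
print(decay)
```

On a map that shows the decay pattern, the returned averages decrease as the separation grows.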

Genomes by design

PubMed Central

Haimovich, Adrian D.; Muir, Paul; Isaacs, Farren J.

2016-01-01

Next-generation DNA sequencing has revealed the complete genome sequences of numerous organisms, establishing a fundamental and growing understanding of genetic variation and phenotypic diversity. Engineering at the gene, network and whole-genome scale aims to introduce targeted genetic changes both to explore emergent phenotypes and to introduce new functionalities. Expansion of these approaches into massively parallel platforms establishes the ability to generate targeted genome modifications, elucidating causal links between genotype and phenotype, as well as the ability to design and reprogramme organisms. In this Review, we explore techniques and applications in genome engineering, outlining key advances and defining challenges. PMID:26260262
National human genome projects: an update and an agenda.

PubMed

An, Joon Yong

2017-01-01

Population genetic and human genetic studies are being accelerated with genome technology and data sharing. Accordingly, in the past 10 years, several countries have initiated genetic research using genome technology and identified the genetic architecture of the ethnic groups living in the corresponding country or suggested the genetic foundation of a social phenomenon. Genetic research has been conducted from epidemiological studies that previously described the health or disease conditions in defined populations. This perspective summarizes national genome projects conducted in the past 10 years and introduces case studies to utilize genomic data in genetic research.
Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes

PubMed Central

Voss, Stephen R.; Kump, D. Kevin; Putta, Srikrishna; Pauly, Nathan; Reynolds, Anna; Henry, Rema J.; Basa, Saritha; Walker, John A.; Smith, Jeramiah J.

2011-01-01

Amphibian genomes differ greatly in DNA content and chromosome size, morphology, and number. Investigations of this diversity are needed to identify mechanisms that have shaped the evolution of vertebrate genomes. We used comparative mapping to investigate the organization of genes in the Mexican axolotl (Ambystoma mexicanum), a species that presents relatively few chromosomes (n = 14) and a gigantic genome (>20 pg/N). We show extensive conservation of synteny between Ambystoma, chicken, and human, and a positive correlation between the length of conserved segments and genome size. Ambystoma segments are estimated to be four to 51 times longer than homologous human and chicken segments. Strikingly, genes demarking the structures of 28 chicken chromosomes are ordered among linkage groups defining the Ambystoma genome, and we show that these same chromosomal segments are also conserved in a distantly related anuran amphibian (Xenopus tropicalis). Using linkage relationships from the amphibian maps, we predict that three chicken chromosomes originated by fusion, nine to 14 originated by fission, and 12–17 evolved directly from ancestral tetrapod chromosomes. We further show that some ancestral segments were fused prior to the divergence of salamanders and anurans, while others fused independently and randomly as chromosome numbers were reduced in lineages leading to Ambystoma and Xenopus. The maintenance of gene order relationships between chromosomal segments that have greatly expanded and contracted in salamander and chicken genomes, respectively, suggests selection to maintain synteny relationships and/or extremely low rates of chromosomal rearrangement. Overall, the results demonstrate the value of data from diverse, amphibian genomes in studies of vertebrate genome evolution. PMID:21482624
Understanding the Broad Influence of Sex Hormones and Sex Differences in the Brain

PubMed Central

McEwen, Bruce S.; Milner, Teresa A.

2016-01-01

Sex hormones act throughout the entire brain of both males and females via both genomic and non-genomic receptors. Sex hormones can act through many cellular and molecular processes that alter structure and function of neural systems and influence behavior as well as providing neuroprotection. Within neurons, sex hormone receptors are found in nuclei and are also located near membranes where they are associated with presynaptic terminals, mitochondria, spine apparatus, and post-synaptic densities. Sex hormone receptors also are found in glial cells. Hormonal regulation of a variety of signaling pathways as well as direct and indirect effects upon gene expression induce spine synapses, up- or down-regulate and alter the distribution of neurotransmitter receptors, regulate neuropeptide expression and cholinergic and GABAergic activity as well as calcium sequestration and oxidative stress. Many neural and behavioral functions are affected, including mood, cognitive function, blood pressure regulation, motor coordination, pain and opioid sensitivity. Subtle sex differences exist for many of these functions that are developmentally programmed by hormones and by not-yet-precisely-defined genetic factors including the mitochondrial genome. These sex differences and responses to sex hormones in brain regions, and upon functions not previously regarded as subject to such differences, indicate that we are entering a new era of our ability to understand and appreciate the diversity of gender-related behaviors and brain functions. PMID:27870427
Karyotype and genome size of Iberochondrostoma almacai (Teleostei, Cyprinidae) and comparison with the sister-species I. lusitanicum

PubMed Central

2009-01-01

This study aimed to define the karyotype of the recently described Iberian endemic Iberochondrostoma almacai, to revisit the previously documented chromosome polymorphisms of its sister species I. lusitanicum using C-, Ag-/CMA3 and RE-banding, and to compare the two species' genome sizes. A 2n = 50 karyotype (with the exception of a triploid I. lusitanicum specimen) and a corresponding haploid chromosome formula of 7M:15SM:3A (FN = 94) were found. Multiple NORs were observed in both species (in two submetacentric chromosome pairs, one of them clearly homologous) and a higher intra- and interpopulational variability was evidenced in I. lusitanicum. Flow cytometry measurements of nuclear DNA content showed some significant differences in genome size both between and within species: the genome of I. almacai was smaller than that of I. lusitanicum (mean values 2.61 and 2.93 pg, respectively), which presented a clear interpopulational variability (mean values ranging from 2.72 to 3.00 pg). These data allowed the distinction of both taxa and confirmed the existence of two well differentiated groups within I. lusitanicum: one that includes the populations from the right bank of the Tejo and Samarra drainages, and another that reunites the southern populations. The peculiar differences between the two species, presently listed as “Critically Endangered”, reinforced the importance of this study for future conservation plans. PMID:21637679
Structure of the Rift Valley fever virus nucleocapsid protein reveals another architecture for RNA encapsidation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Raymond, Donald D.; Piper, Mary E.; Gerrard, Sonja R.

2010-07-13

Rift Valley fever virus (RVFV) is a negative-sense RNA virus (genus Phlebovirus, family Bunyaviridae) that infects livestock and humans and is endemic to sub-Saharan Africa. Like all negative-sense viruses, the segmented RNA genome of RVFV is encapsidated by a nucleocapsid protein (N). The 1.93-Å crystal structure of RVFV N and electron micrographs of ribonucleoprotein (RNP) reveal an encapsidated genome of substantially different organization than in other negative-sense RNA virus families. The RNP polymer, viewed in electron micrographs of both virus RNP and RNP reconstituted from purified N with a defined RNA, has an extended structure without helical symmetry. N-RNA species of ~100-kDa apparent molecular weight and heterogeneous composition were obtained by exhaustive ribonuclease treatment of virus RNP, by recombinant expression of N, and by reconstitution from purified N and an RNA oligomer. RNA-free N, obtained by denaturation and refolding, has a novel all-helical fold that is compact and well ordered at both the N and C termini. Unlike N of other negative-sense RNA viruses, RVFV N has no positively charged surface cleft for RNA binding and no protruding termini or loops to stabilize a defined N-RNA oligomer or RNP helix. A potential protein interaction site was identified in a conserved hydrophobic pocket. The nonhelical appearance of phlebovirus RNP, the heterogeneous ~100-kDa N-RNA multimer, and the N fold differ substantially from the RNP and N of other negative-sense RNA virus families and provide valuable insights into the structure of the encapsidated phlebovirus genome.
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF.

PubMed

Cong, Yingnan; Chan, Yao-Ban; Phillips, Charles A; Langston, Michael A; Ragan, Mark A

2017-01-01

Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k.
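To make the TF-IDF idea in this abstract concrete, here is a minimal sketch that treats each genome as a document and each overlapping word of length k as a term. It uses the generic textbook weighting (relative term frequency times log inverse document frequency), which is an assumption; the abstract does not specify the authors' exact weighting or how lateral regions are called from it. The names `kmers` and `tfidf` and the three toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(genomes, k):
    """Map genome name -> {k-mer: TF-IDF weight}.

    TF is the k-mer's relative frequency within one genome;
    IDF is log(N / number of genomes containing the k-mer).
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n_genomes = len(genomes)
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())  # count each k-mer once per genome
    weights = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        weights[name] = {
            word: (count / total) * math.log(n_genomes / doc_freq[word])
            for word, count in counter.items()
        }
    return weights


# Hypothetical toy "genomes"; real inputs would be full sequences.
toy = {"g1": "ACGTACGTAC", "g2": "ACGTTTTTAC", "g3": "GGGGACGTGG"}
w = tfidf(toy, k=4)
print(w["g1"]["ACGT"], w["g2"]["TTTT"])
```

Words shared by every genome receive zero weight, while words confined to a few genomes score higher, which is the property that makes the measure usable as a signal of atypical sequence composition.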
GenoQuery: a new querying module for functional annotation in a genomic warehouse

PubMed Central

Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

2008-01-01

Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731
Human centromere genomics: now it's personal.

PubMed

Hayden, Karen E

2012-07-01

Advances in human genomics have accelerated studies in evolution, disease, and cellular regulation. However, centromere sequences, defining the chromosomal interface with spindle microtubules, remain largely absent from ongoing genomic studies and disconnected from functional, genome-wide analyses. This disparity results from the challenge of predicting the linear order of multi-megabase-sized regions that are composed almost entirely of near-identical satellite DNA. Acknowledging these challenges, the field of human centromere genomics possesses the potential to rapidly advance given the availability of individual, or personalized, genome projects matched with the promise of long-read sequencing technologies. Here I review the current genomic model of human centromeres in consideration of those studies involving functional datasets that examine the role of sequence in centromere identity.
Overview: The Impact of Microbial Genomics on Food Safety

NASA Astrophysics Data System (ADS)

Milillo, Sara R.; Wiedmann, Martin; Hoelzer, Karin

The first use of the term "genome" is attributed to Hans Winkler in his 1920 publication Verbreitung und Ursache der Parthenogenesis im Pflanzen- und Tierreiche (Winkler, 1920). However, it was not until 1986 that the study of genomic concepts coalesced with the creation of a new journal by the same name (McKusick, 1997). The study of genomics was initially defined as the use or the application of "informatic tools" to study features of a sequenced genome (Strauss and Falkow, 1997). Today the field of genomics is typically considered to encompass efforts to determine the nucleic acid DNA sequence of an organism as well as the expression of genetic information using high-throughput, genome-wide methods, including transcriptomic, proteomic, and metabolomic analyses.
Microeconomic principles explain an optimal genome size in bacteria.

PubMed

Ranea, Juan A G; Grant, Alastair; Thornton, Janet M; Orengo, Christine A

2005-01-01

Bacteria can clearly enhance their survival by expanding their genetic repertoire. However, the tight packing of the bacterial genome and the fact that the most evolved species do not necessarily have the biggest genomes suggest there are other evolutionary factors limiting their genome expansion. To clarify these restrictions on size, we studied those protein families contributing most significantly to bacterial-genome complexity. We found that all bacteria apply the same basic and ancestral 'molecular technology' to optimize their reproductive efficiency. The same microeconomics principles that define the optimum size in a factory can also explain the existence of a statistical optimum in bacterial genome size. This optimum is reached when the bacterial genome obtains the maximum metabolic complexity (revenue) for minimal regulatory genes (logistic cost).
Translational Implications of Tumor Heterogeneity

PubMed Central

Jamal-Hanjani, Mariam; Quezada, Sergio A.; Larkin, James; Swanton, Charles

2015-01-01

Advances in next-generation sequencing and bioinformatics have led to an unprecedented view of the cancer genome and its evolution. Genomic studies have demonstrated the complex and heterogeneous clonal landscape of tumors of different origins, and the potential impact of intratumor heterogeneity on treatment response and resistance, cancer progression and the risk of disease relapse. However, the significance of subclonal mutations, in particular mutations in driver genes, and their evolution through time and their dynamics in response to cancer therapies, is yet to be determined. The necessary tools are now available to prospectively determine whether clonal heterogeneity can be used as a biomarker of clinical outcome, and to what extent subclonal somatic alterations might influence clinical outcome. Studies that employ longitudinal tissue sampling, integrating both genomic and clinical data, have the potential to reveal the subclonal composition and track the evolution of tumors in order to address these questions, and to begin to define the breadth of genetic diversity in different tumor types, and its relevance to patient outcome. Such studies may provide further evidence for novel drug resistance mechanisms informing novel combinatorial, adaptive and tumour immune-therapies placed within the context of tumor evolution. PMID:25770293
Epigenomics in cancer management

PubMed Central

Costa, Fabricio F

2010-01-01

The identification of all epigenetic modifications implicated in gene expression is the next step for a better understanding of human biology in both normal and pathological states. This field is referred to as epigenomics, and it is defined as epigenetic changes (ie, DNA methylation, histone modifications and regulation by noncoding RNAs such as microRNAs) on a genomic scale rather than a single gene. Epigenetics modulate the structure of the chromatin, thereby affecting the transcription of genes in the genome. Different studies have already identified changes in epigenetic modifications in a few genes in specific pathways in cancers. Based on these epigenetic changes, drugs against different types of tumors were developed, which mainly target epimutations in the genome. Examples include DNA methylation inhibitors, histone modification inhibitors, and small molecules that target chromatin-remodeling proteins. However, these drugs are not specific, and side effects are a major problem; therefore, new DNA sequencing technologies combined with epigenomic tools have the potential to identify novel biomarkers and better molecular targets to treat cancers. The purpose of this review is to discuss current and emerging epigenomic tools and to address how these new technologies may impact the future of cancer management. PMID:21188117
Co-occurring genomic alterations define major subsets of KRAS-mutant lung adenocarcinoma with distinct biology, immune profiles, and therapeutic vulnerabilities

PubMed Central

Skoulidis, Ferdinandos; Byers, Lauren A.; Diao, Lixia; Papadimitrakopoulou, Vassiliki A.; Tong, Pan; Izzo, Julie; Behrens, Carmen; Kadara, Humam; Parra, Edwin R.; Canales, Jaime Rodriguez; Zhang, Jianjun; Giri, Uma; Gudikote, Jayanthi; Cortez, Maria A.; Yang, Chao; Fan, You Hong; Peyton, Michael; Girard, Luc; Coombes, Kevin R.; Toniatti, Carlo; Heffernan, Timothy P.; Choi, Murim; Frampton, Garrett M.; Miller, Vincent; Weinstein, John N.; Herbst, Roy S.; Wong, Kwok-Kin; Zhang, Jianhua; Sharma, Padmanee; Mills, Gordon B.; Hong, Waun K.; Minna, John D.; Allison, James P.; Futreal, Andrew; Wang, Jing; Wistuba, Ignacio I.; Heymach, John V.

2015-01-01

The molecular underpinnings that drive the heterogeneity of KRAS-mutant lung adenocarcinoma (LUAC) are poorly characterized. We performed an integrative analysis of genomic, transcriptomic and proteomic data from early-stage and chemo-refractory LUAC and identified three robust subsets of KRAS-mutant LUAC dominated, respectively, by co-occurring genetic events in STK11/LKB1 (the KL subgroup), TP53 (KP) and CDKN2A/B inactivation coupled with low expression of the NKX2-1 (TTF1) transcription factor (KC). We further reveal biologically and therapeutically relevant differences between the subgroups. KC tumors frequently exhibited mucinous histology and suppressed mTORC1 signaling. KL tumors had high rates of KEAP1 mutational inactivation and expressed lower levels of immune markers, including PD-L1. KP tumors demonstrated higher levels of somatic mutations, inflammatory markers, immune checkpoint effector molecules and improved relapse-free survival. Differences in drug sensitivity patterns were also observed; notably, KL cells showed increased vulnerability to HSP90-inhibitor therapy. This work provides evidence that co-occurring genomic alterations identify subgroups of KRAS-mutant LUAC with distinct biology and therapeutic vulnerabilities. PMID:26069186
Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined four CCA clusters: Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations; conversely, Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3’UTR deletion as a mechanism of FGFR2 upregulation. Integration of non-coding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores; mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Lastly, our results exemplify how genetics, epigenetics and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer.
Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma

DOE PAGES

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han; ...

2017-06-30

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined four CCA clusters: Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations; conversely, Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3’UTR deletion as a mechanism of FGFR2 upregulation. Integration of non-coding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores; mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Lastly, our results exemplify how genetics, epigenetics and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer.
A decision tool to guide the ethics review of a challenging breed of emerging genomic projects.

PubMed

Joly, Yann; So, Derek; Osien, Gladys; Crimi, Laura; Bobrow, Martin; Chalmers, Don; Wallace, Susan E; Zeps, Nikolajs; Knoppers, Bartha

2016-08-01

Recent projects conducted by the International Cancer Genome Consortium (ICGC) have raised the important issue of distinguishing quality assurance (QA) activities from research in the context of genomics. Research was historically defined as a systematic effort to expand a shared body of knowledge, whereas QA was defined as an effort to ascertain whether a specific project met desired standards. However, the two categories increasingly overlap due to advances in bioinformatics and the shift toward open science. As few ethics review policies take these changes into account, it is often difficult to determine the appropriate level of review. Mislabeling can result in unnecessary burdens for the investigators or, conversely, in underestimation of the risks to participants. Therefore, it is important to develop a consistent method of selecting the review process for genomics and bioinformatics projects. This paper begins by discussing two case studies from the ICGC, followed by a literature review on the distinction between QA and research and a comparative analysis of ethics review policies from Canada, the United States, the United Kingdom, and Australia. These results are synthesized into a novel two-step decision tool for researchers and policymakers, which uses traditional criteria to sort clearly defined activities while requiring the use of actual risk levels to decide more complex cases.
The Mitochondrial Genome Sequence and Molecular Phylogeny of the Turkey, Meleagris gallopavo

PubMed Central

Guan, Xiaojing; Silva, Pradeepa; Gyenai, Kwaku B.; Xu, Jun; Geng, Tuoyu; Tu, Zhijian; Samuels, David C.; Smith, Edward J.

2009-01-01

Summary The mitochondrial genome (mtGenome) has been very little studied in the turkey (Meleagris gallopavo), for which there is no publicly available whole genome mitochondrial sequence. Here, we used PCR-based methods with 19 pairs of primers designed from the chicken and other species to develop a complete turkey mtGenome sequence. A total length of 16,717 bp of the whole turkey mtGenome was obtained, with 85% similarity to the chicken mtGenome. There were 13 genes and 24 RNAs (22 tRNA and 2 rRNA) annotated. The mtGenome-based phylogenetic analysis suggests that the turkey is most closely related to the chicken, Gallus gallus, and quail, Coturnix japonica. Given the importance of the mitochondrial genome, the present work adds to the growing genomic resources needed to define the genetic mechanisms that underlie some economic traits in the turkey. PMID:19067672
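As an illustration of how a similarity figure such as the 85% reported above can be derived, the following minimal Python sketch computes percent identity over a pairwise alignment. The abstract does not state which similarity measure or alignment tool was used, so the function name `percent_identity`, the gap handling, and the toy sequences are all assumptions for illustration, not the authors' method or data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch. This convention is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" and base_b == "-":
            continue  # uninformative column
        compared += 1
        if base_a == base_b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns in alignment")
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (not turkey or chicken sequence data).
toy_a = "ACGT-A"
toy_b = "ACCTTA"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Different tools define similarity differently (for example, whether gaps are counted), so a figure computed this way is only comparable to a published value when the same convention is used.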
Genomic characterization of a Helicobacter pylori isolate from a patient with gastric cancer in China

PubMed Central

2014-01-01

Background Helicobacter pylori is well known for its relationship with the occurrence of several severe gastric diseases. The mechanisms of pathogenesis triggered by H. pylori are less well known. In this study, we report the genome sequence and genomic characterization of H. pylori strain HLJ039, which was isolated from a patient with gastric cancer in the Chinese province of Heilongjiang, where there is a high incidence of gastric cancer. To investigate potential genomic features that may be involved in the pathogenesis of carcinoma, the genome was compared to three previously sequenced genomes from this area. Results We obtained 42 contigs with a total length of 1,611,192 bp and predicted 1,687 coding sequences. Compared to strains isolated from gastritis and ulcers in this area, 10 different regions were identified as being unique to HLJ039; they mainly encoded type II restriction-modification enzyme, type II m6A methylase, DNA-cytosine methyltransferase, DNA methylase, and hypothetical proteins. A unique 547-bp fragment sharing 93% identity with a hypothetical protein of Helicobacter cinaedi ATCC BAA-847 was not present in any other previous H. pylori strains. Phylogenetic analysis based on core genome single nucleotide polymorphisms shows that HLJ039 belongs to the hspEAsia subgroup, which is part of the hpEastAsia group. Conclusion DNA methylations, variations of the genomic regions involved in restriction and modification systems, are the “hot” regions that may be related to the mechanism of H. pylori-induced gastric cancer. The genome sequence will provide useful information for the deep mining of potential mechanisms related to East Asian gastric cancer. PMID:24565107
Comparative Metabolomics of Mycoplasma bovis and Mycoplasma gallisepticum Reveals Fundamental Differences in Active Metabolic Pathways and Suggests Novel Gene Annotations.

PubMed

Masukagami, Y; De Souza, D P; Dayalan, S; Bowen, C; O'Callaghan, S; Kouremenos, K; Nijagal, B; Tull, D; Tivendale, K A; Markham, P F; McConville, M J; Browning, G F; Sansom, F M

2017-01-01

Mycoplasmas are simple, but successful parasites that have the smallest genome of any free-living cell and are thought to have a highly streamlined cellular metabolism. Here, we have undertaken a detailed metabolomic analysis of two species, Mycoplasma bovis and Mycoplasma gallisepticum, which cause economically important diseases in cattle and poultry, respectively. Untargeted gas chromatography-mass spectrometry and liquid chromatography-mass spectrometry analyses of mycoplasma metabolite extracts revealed significant differences in the steady-state levels of many metabolites in central carbon metabolism, while 13C stable isotope labeling studies revealed marked differences in carbon source utilization. These data were mapped onto in silico metabolic networks predicted from genome-wide annotations. The analyses elucidated distinct differences, including a clear difference in glucose utilization, with a marked decrease in glucose uptake and glycolysis in M. bovis compared to M. gallisepticum, which may reflect differing host nutrient availabilities. The 13C-labeling patterns also revealed several functional metabolic pathways that were previously unannotated in these species, allowing us to assign putative enzyme functions to the products of a number of genes of unknown function, especially in M. bovis. This study demonstrates the considerable potential of metabolomic analyses to assist in characterizing significant differences in the metabolism of different bacterial species and in improving genome annotation. IMPORTANCE Mycoplasmas are pathogenic bacteria that cause serious chronic infections in production animals, resulting in considerable losses worldwide, as well as causing disease in humans. These bacteria have extremely reduced genomes and are thought to have limited metabolic flexibility, even though they are highly successful persistent parasites in a diverse number of species. The extent to which different Mycoplasma species are capable of catabolizing host carbon sources and nutrients, or synthesizing essential metabolites, remains poorly defined. We have used advanced metabolomic techniques to identify metabolic pathways that are active in two species of Mycoplasma that infect distinct hosts (poultry and cattle). We show that these species exhibit marked differences in metabolite steady-state levels and carbon source utilization. This information has been used to functionally characterize previously unknown genes in the genomes of these pathogens. These species-specific differences are likely to reflect important differences in host nutrient levels and pathogenic mechanisms.

Defining the role of common variation in the genomic and biological architecture of adult human height.

PubMed

Wood, Andrew R; Esko, Tonu; Yang, Jian; Vedantam, Sailaja; Pers, Tune H; Gustafsson, Stefan; Chu, Audrey Y; Estrada, Karol; Luan, Jian'an; Kutalik, Zoltán; Amin, Najaf; Buchkovich, Martin L; Croteau-Chonka, Damien C; Day, Felix R; Duan, Yanan; Fall, Tove; Fehrmann, Rudolf; Ferreira, Teresa; Jackson, Anne U; Karjalainen, Juha; Lo, Ken Sin; Locke, Adam E; Mägi, Reedik; Mihailov, Evelin; Porcu, Eleonora; Randall, Joshua C; Scherag, André; Vinkhuyzen, Anna A E; Westra, Harm-Jan; Winkler, Thomas W; Workalemahu, Tsegaselassie; Zhao, Jing Hua; Absher, Devin; Albrecht, Eva; Anderson, Denise; Baron, Jeffrey; Beekman, Marian; Demirkan, Ayse; Ehret, Georg B; Feenstra, Bjarke; Feitosa, Mary F; Fischer, Krista; Fraser, Ross M; Goel, Anuj; Gong, Jian; Justice, Anne E; Kanoni, Stavroula; Kleber, Marcus E; Kristiansson, Kati; Lim, Unhee; Lotay, Vaneet; Lui, Julian C; Mangino, Massimo; Mateo Leach, Irene; Medina-Gomez, Carolina; Nalls, Michael A; Nyholt, Dale R; Palmer, Cameron D; Pasko, Dorota; Pechlivanis, Sonali; Prokopenko, Inga; Ried, Janina S; Ripke, Stephan; Shungin, Dmitry; Stancáková, Alena; Strawbridge, Rona J; Sung, Yun Ju; Tanaka, Toshiko; Teumer, Alexander; Trompet, Stella; van der Laan, Sander W; van Setten, Jessica; Van Vliet-Ostaptchouk, Jana V; Wang, Zhaoming; Yengo, Loïc; Zhang, Weihua; Afzal, Uzma; Arnlöv, Johan; Arscott, Gillian M; Bandinelli, Stefania; Barrett, Amy; Bellis, Claire; Bennett, Amanda J; Berne, Christian; Blüher, Matthias; Bolton, Jennifer L; Böttcher, Yvonne; Boyd, Heather A; Bruinenberg, Marcel; Buckley, Brendan M; Buyske, Steven; Caspersen, Ida H; Chines, Peter S; Clarke, Robert; Claudi-Boehm, Simone; Cooper, Matthew; Daw, E Warwick; De Jong, Pim A; Deelen, Joris; Delgado, Graciela; Denny, Josh C; Dhonukshe-Rutten, Rosalie; Dimitriou, Maria; Doney, Alex S F; Dörr, Marcus; Eklund, Niina; Eury, Elodie; Folkersen, Lasse; Garcia, Melissa E; Geller, Frank; Giedraitis, Vilmantas; Go, Alan S; Grallert, Harald; Grammer, Tanja B; Gräßler, Jürgen; 
Grönberg, Henrik; de Groot, Lisette C P G M; Groves, Christopher J; Haessler, Jeffrey; Hall, Per; Haller, Toomas; Hallmans, Goran; Hannemann, Anke; Hartman, Catharina A; Hassinen, Maija; Hayward, Caroline; Heard-Costa, Nancy L; Helmer, Quinta; Hemani, Gibran; Henders, Anjali K; Hillege, Hans L; Hlatky, Mark A; Hoffmann, Wolfgang; Hoffmann, Per; Holmen, Oddgeir; Houwing-Duistermaat, Jeanine J; Illig, Thomas; Isaacs, Aaron; James, Alan L; Jeff, Janina; Johansen, Berit; Johansson, Åsa; Jolley, Jennifer; Juliusdottir, Thorhildur; Junttila, Juhani; Kho, Abel N; Kinnunen, Leena; Klopp, Norman; Kocher, Thomas; Kratzer, Wolfgang; Lichtner, Peter; Lind, Lars; Lindström, Jaana; Lobbens, Stéphane; Lorentzon, Mattias; Lu, Yingchang; Lyssenko, Valeriya; Magnusson, Patrik K E; Mahajan, Anubha; Maillard, Marc; McArdle, Wendy L; McKenzie, Colin A; McLachlan, Stela; McLaren, Paul J; Menni, Cristina; Merger, Sigrun; Milani, Lili; Moayyeri, Alireza; Monda, Keri L; Morken, Mario A; Müller, Gabriele; Müller-Nurasyid, Martina; Musk, Arthur W; Narisu, Narisu; Nauck, Matthias; Nolte, Ilja M; Nöthen, Markus M; Oozageer, Laticia; Pilz, Stefan; Rayner, Nigel W; Renstrom, Frida; Robertson, Neil R; Rose, Lynda M; Roussel, Ronan; Sanna, Serena; Scharnagl, Hubert; Scholtens, Salome; Schumacher, Fredrick R; Schunkert, Heribert; Scott, Robert A; Sehmi, Joban; Seufferlein, Thomas; Shi, Jianxin; Silventoinen, Karri; Smit, Johannes H; Smith, Albert Vernon; Smolonska, Joanna; Stanton, Alice V; Stirrups, Kathleen; Stott, David J; Stringham, Heather M; Sundström, Johan; Swertz, Morris A; Syvänen, Ann-Christine; Tayo, Bamidele O; Thorleifsson, Gudmar; Tyrer, Jonathan P; van Dijk, Suzanne; van Schoor, Natasja M; van der Velde, Nathalie; van Heemst, Diana; van Oort, Floor V A; Vermeulen, Sita H; Verweij, Niek; Vonk, Judith M; Waite, Lindsay L; Waldenberger, Melanie; Wennauer, Roman; Wilkens, Lynne R; Willenborg, Christina; Wilsgaard, Tom; Wojczynski, Mary K; Wong, Andrew; Wright, Alan F; Zhang, Qunyuan; 
Arveiler, Dominique; Bakker, Stephan J L; Beilby, John; Bergman, Richard N; Bergmann, Sven; Biffar, Reiner; Blangero, John; Boomsma, Dorret I; Bornstein, Stefan R; Bovet, Pascal; Brambilla, Paolo; Brown, Morris J; Campbell, Harry; Caulfield, Mark J; Chakravarti, Aravinda; Collins, Rory; Collins, Francis S; Crawford, Dana C; Cupples, L Adrienne; Danesh, John; de Faire, Ulf; den Ruijter, Hester M; Erbel, Raimund; Erdmann, Jeanette; Eriksson, Johan G; Farrall, Martin; Ferrannini, Ele; Ferrières, Jean; Ford, Ian; Forouhi, Nita G; Forrester, Terrence; Gansevoort, Ron T; Gejman, Pablo V; Gieger, Christian; Golay, Alain; Gottesman, Omri; Gudnason, Vilmundur; Gyllensten, Ulf; Haas, David W; Hall, Alistair S; Harris, Tamara B; Hattersley, Andrew T; Heath, Andrew C; Hengstenberg, Christian; Hicks, Andrew A; Hindorff, Lucia A; Hingorani, Aroon D; Hofman, Albert; Hovingh, G Kees; Humphries, Steve E; Hunt, Steven C; Hypponen, Elina; Jacobs, Kevin B; Jarvelin, Marjo-Riitta; Jousilahti, Pekka; Jula, Antti M; Kaprio, Jaakko; Kastelein, John J P; Kayser, Manfred; Kee, Frank; Keinanen-Kiukaanniemi, Sirkka M; Kiemeney, Lambertus A; Kooner, Jaspal S; Kooperberg, Charles; Koskinen, Seppo; Kovacs, Peter; Kraja, Aldi T; Kumari, Meena; Kuusisto, Johanna; Lakka, Timo A; Langenberg, Claudia; Le Marchand, Loic; Lehtimäki, Terho; Lupoli, Sara; Madden, Pamela A F; Männistö, Satu; Manunta, Paolo; Marette, André; Matise, Tara C; McKnight, Barbara; Meitinger, Thomas; Moll, Frans L; Montgomery, Grant W; Morris, Andrew D; Morris, Andrew P; Murray, Jeffrey C; Nelis, Mari; Ohlsson, Claes; Oldehinkel, Albertine J; Ong, Ken K; Ouwehand, Willem H; Pasterkamp, Gerard; Peters, Annette; Pramstaller, Peter P; Price, Jackie F; Qi, Lu; Raitakari, Olli T; Rankinen, Tuomo; Rao, D C; Rice, Treva K; Ritchie, Marylyn; Rudan, Igor; Salomaa, Veikko; Samani, Nilesh J; Saramies, Jouko; Sarzynski, Mark A; Schwarz, Peter E H; Sebert, Sylvain; Sever, Peter; Shuldiner, Alan R; Sinisalo, Juha; Steinthorsdottir, Valgerdur; 
Stolk, Ronald P; Tardif, Jean-Claude; Tönjes, Anke; Tremblay, Angelo; Tremoli, Elena; Virtamo, Jarmo; Vohl, Marie-Claude; Amouyel, Philippe; Asselbergs, Folkert W; Assimes, Themistocles L; Bochud, Murielle; Boehm, Bernhard O; Boerwinkle, Eric; Bottinger, Erwin P; Bouchard, Claude; Cauchi, Stéphane; Chambers, John C; Chanock, Stephen J; Cooper, Richard S; de Bakker, Paul I W; Dedoussis, George; Ferrucci, Luigi; Franks, Paul W; Froguel, Philippe; Groop, Leif C; Haiman, Christopher A; Hamsten, Anders; Hayes, M Geoffrey; Hui, Jennie; Hunter, David J; Hveem, Kristian; Jukema, J Wouter; Kaplan, Robert C; Kivimaki, Mika; Kuh, Diana; Laakso, Markku; Liu, Yongmei; Martin, Nicholas G; März, Winfried; Melbye, Mads; Moebus, Susanne; Munroe, Patricia B; Njølstad, Inger; Oostra, Ben A; Palmer, Colin N A; Pedersen, Nancy L; Perola, Markus; Pérusse, Louis; Peters, Ulrike; Powell, Joseph E; Power, Chris; Quertermous, Thomas; Rauramaa, Rainer; Reinmaa, Eva; Ridker, Paul M; Rivadeneira, Fernando; Rotter, Jerome I; Saaristo, Timo E; Saleheen, Danish; Schlessinger, David; Slagboom, P Eline; Snieder, Harold; Spector, Tim D; Strauch, Konstantin; Stumvoll, Michael; Tuomilehto, Jaakko; Uusitupa, Matti; van der Harst, Pim; Völzke, Henry; Walker, Mark; Wareham, Nicholas J; Watkins, Hugh; Wichmann, H-Erich; Wilson, James F; Zanen, Pieter; Deloukas, Panos; Heid, Iris M; Lindgren, Cecilia M; Mohlke, Karen L; Speliotes, Elizabeth K; Thorsteinsdottir, Unnur; Barroso, Inês; Fox, Caroline S; North, Kari E; Strachan, David P; Beckmann, Jacques S; Berndt, Sonja I; Boehnke, Michael; Borecki, Ingrid B; McCarthy, Mark I; Metspalu, Andres; Stefansson, Kari; Uitterlinden, André G; van Duijn, Cornelia M; Franke, Lude; Willer, Cristen J; Price, Alkes L; Lettre, Guillaume; Loos, Ruth J F; Weedon, Michael N; Ingelsson, Erik; O'Connell, Jeffrey R; Abecasis, Goncalo R; Chasman, Daniel I; Goddard, Michael E; Visscher, Peter M; Hirschhorn, Joel N; Frayling, Timothy M

2014-11-01

Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explained one-fifth of the heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ∼2,000, ∼3,700 and ∼9,500 SNPs explained ∼21%, ∼24% and ∼29% of phenotypic variance. Furthermore, all common variants together captured 60% of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways and tissue types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/β-catenin and chondroitin sulfate-related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants.
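To make the diminishing returns in these figures concrete, the short Python sketch below computes how much additional phenotypic variance is explained per 1,000 extra SNPs between the three reported thresholds. It uses only the approximate values quoted in the abstract (each marked ∼ there); the names `REPORTED` and `marginal_gain_per_1000` are illustrative assumptions and not part of the study.

```python
# Approximate (SNP count, % phenotypic variance explained) pairs quoted in the abstract.
REPORTED = [(2000, 21.0), (3700, 24.0), (9500, 29.0)]


def marginal_gain_per_1000(points):
    """Return the extra % variance explained per 1,000 additional SNPs
    between each pair of consecutive thresholds."""
    gains = []
    for (n_prev, v_prev), (n_next, v_next) in zip(points, points[1:]):
        extra_snps_thousands = (n_next - n_prev) / 1000
        gains.append((v_next - v_prev) / extra_snps_thousands)
    return gains


gains = marginal_gain_per_1000(REPORTED)
for (n_prev, _), (n_next, _), gain in zip(REPORTED, REPORTED[1:], gains):
    print(f"{n_prev} -> {n_next} SNPs: {gain:.2f} percentage points per 1,000 SNPs")
```

With these rounded inputs, the gain per 1,000 SNPs is smaller over the second interval than over the first, which fits the abstract's picture of many variants with individually small effects; the exact values should not be over-interpreted given the approximations.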
Defining the role of common variation in the genomic and biological architecture of adult human height

PubMed Central

Chu, Audrey Y; Estrada, Karol; Luan, Jian’an; Kutalik, Zoltán; Amin, Najaf; Buchkovich, Martin L; Croteau-Chonka, Damien C; Day, Felix R; Duan, Yanan; Fall, Tove; Fehrmann, Rudolf; Ferreira, Teresa; Jackson, Anne U; Karjalainen, Juha; Lo, Ken Sin; Locke, Adam E; Mägi, Reedik; Mihailov, Evelin; Porcu, Eleonora; Randall, Joshua C; Scherag, André; Vinkhuyzen, Anna AE; Westra, Harm-Jan; Winkler, Thomas W; Workalemahu, Tsegaselassie; Zhao, Jing Hua; Absher, Devin; Albrecht, Eva; Anderson, Denise; Baron, Jeffrey; Beekman, Marian; Demirkan, Ayse; Ehret, Georg B; Feenstra, Bjarke; Feitosa, Mary F; Fischer, Krista; Fraser, Ross M; Goel, Anuj; Gong, Jian; Justice, Anne E; Kanoni, Stavroula; Kleber, Marcus E; Kristiansson, Kati; Lim, Unhee; Lotay, Vaneet; Lui, Julian C; Mangino, Massimo; Leach, Irene Mateo; Medina-Gomez, Carolina; Nalls, Michael A; Nyholt, Dale R; Palmer, Cameron D; Pasko, Dorota; Pechlivanis, Sonali; Prokopenko, Inga; Ried, Janina S; Ripke, Stephan; Shungin, Dmitry; Stancáková, Alena; Strawbridge, Rona J; Sung, Yun Ju; Tanaka, Toshiko; Teumer, Alexander; Trompet, Stella; van der Laan, Sander W; van Setten, Jessica; Van Vliet-Ostaptchouk, Jana V; Wang, Zhaoming; Yengo, Loïc; Zhang, Weihua; Afzal, Uzma; Ärnlöv, Johan; Arscott, Gillian M; Bandinelli, Stefania; Barrett, Amy; Bellis, Claire; Bennett, Amanda J; Berne, Christian; Blüher, Matthias; Bolton, Jennifer L; Böttcher, Yvonne; Boyd, Heather A; Bruinenberg, Marcel; Buckley, Brendan M; Buyske, Steven; Caspersen, Ida H; Chines, Peter S; Clarke, Robert; Claudi-Boehm, Simone; Cooper, Matthew; Daw, E Warwick; De Jong, Pim A; Deelen, Joris; Delgado, Graciela; Denny, Josh C; Dhonukshe-Rutten, Rosalie; Dimitriou, Maria; Doney, Alex SF; Dörr, Marcus; Eklund, Niina; Eury, Elodie; Folkersen, Lasse; Garcia, Melissa E; Geller, Frank; Giedraitis, Vilmantas; Go, Alan S; Grallert, Harald; Grammer, Tanja B; Gräßler, Jürgen; Grönberg, Henrik; de Groot, Lisette C.P.G.M.; Groves, Christopher J; Haessler, Jeffrey; Hall, Per; 
Haller, Toomas; Hallmans, Goran; Hannemann, Anke; Hartman, Catharina A; Hassinen, Maija; Hayward, Caroline; Heard-Costa, Nancy L; Helmer, Quinta; Hemani, Gibran; Henders, Anjali K; Hillege, Hans L; Hlatky, Mark A; Hoffmann, Wolfgang; Hoffmann, Per; Holmen, Oddgeir; Houwing-Duistermaat, Jeanine J; Illig, Thomas; Isaacs, Aaron; James, Alan L; Jeff, Janina; Johansen, Berit; Johansson, Åsa; Jolley, Jennifer; Juliusdottir, Thorhildur; Junttila, Juhani; Kho, Abel N; Kinnunen, Leena; Klopp, Norman; Kocher, Thomas; Kratzer, Wolfgang; Lichtner, Peter; Lind, Lars; Lindström, Jaana; Lobbens, Stéphane; Lorentzon, Mattias; Lu, Yingchang; Lyssenko, Valeriya; Magnusson, Patrik KE; Mahajan, Anubha; Maillard, Marc; McArdle, Wendy L; McKenzie, Colin A; McLachlan, Stela; McLaren, Paul J; Menni, Cristina; Merger, Sigrun; Milani, Lili; Moayyeri, Alireza; Monda, Keri L; Morken, Mario A; Müller, Gabriele; Müller-Nurasyid, Martina; Musk, Arthur W; Narisu, Narisu; Nauck, Matthias; Nolte, Ilja M; Nöthen, Markus M; Oozageer, Laticia; Pilz, Stefan; Rayner, Nigel W; Renstrom, Frida; Robertson, Neil R; Rose, Lynda M; Roussel, Ronan; Sanna, Serena; Scharnagl, Hubert; Scholtens, Salome; Schumacher, Fredrick R; Schunkert, Heribert; Scott, Robert A; Sehmi, Joban; Seufferlein, Thomas; Shi, Jianxin; Silventoinen, Karri; Smit, Johannes H; Smith, Albert Vernon; Smolonska, Joanna; Stanton, Alice V; Stirrups, Kathleen; Stott, David J; Stringham, Heather M; Sundström, Johan; Swertz, Morris A; Syvänen, Ann-Christine; Tayo, Bamidele O; Thorleifsson, Gudmar; Tyrer, Jonathan P; van Dijk, Suzanne; van Schoor, Natasja M; van der Velde, Nathalie; van Heemst, Diana; van Oort, Floor VA; Vermeulen, Sita H; Verweij, Niek; Vonk, Judith M; Waite, Lindsay L; Waldenberger, Melanie; Wennauer, Roman; Wilkens, Lynne R; Willenborg, Christina; Wilsgaard, Tom; Wojczynski, Mary K; Wong, Andrew; Wright, Alan F; Zhang, Qunyuan; Arveiler, Dominique; Bakker, Stephan JL; Beilby, John; Bergman, Richard N; Bergmann, Sven; Biffar, 
Reiner; Blangero, John; Boomsma, Dorret I; Bornstein, Stefan R; Bovet, Pascal; Brambilla, Paolo; Brown, Morris J; Campbell, Harry; Caulfield, Mark J; Chakravarti, Aravinda; Collins, Rory; Collins, Francis S; Crawford, Dana C; Cupples, L Adrienne; Danesh, John; de Faire, Ulf; den Ruijter, Hester M; Erbel, Raimund; Erdmann, Jeanette; Eriksson, Johan G; Farrall, Martin; Ferrannini, Ele; Ferrières, Jean; Ford, Ian; Forouhi, Nita G; Forrester, Terrence; Gansevoort, Ron T; Gejman, Pablo V; Gieger, Christian; Golay, Alain; Gottesman, Omri; Gudnason, Vilmundur; Gyllensten, Ulf; Haas, David W; Hall, Alistair S; Harris, Tamara B; Hattersley, Andrew T; Heath, Andrew C; Hengstenberg, Christian; Hicks, Andrew A; Hindorff, Lucia A; Hingorani, Aroon D; Hofman, Albert; Hovingh, G Kees; Humphries, Steve E; Hunt, Steven C; Hypponen, Elina; Jacobs, Kevin B; Jarvelin, Marjo-Riitta; Jousilahti, Pekka; Jula, Antti M; Kaprio, Jaakko; Kastelein, John JP; Kayser, Manfred; Kee, Frank; Keinanen-Kiukaanniemi, Sirkka M; Kiemeney, Lambertus A; Kooner, Jaspal S; Kooperberg, Charles; Koskinen, Seppo; Kovacs, Peter; Kraja, Aldi T; Kumari, Meena; Kuusisto, Johanna; Lakka, Timo A; Langenberg, Claudia; Le Marchand, Loic; Lehtimäki, Terho; Lupoli, Sara; Madden, Pamela AF; Männistö, Satu; Manunta, Paolo; Marette, André; Matise, Tara C; McKnight, Barbara; Meitinger, Thomas; Moll, Frans L; Montgomery, Grant W; Morris, Andrew D; Morris, Andrew P; Murray, Jeffrey C; Nelis, Mari; Ohlsson, Claes; Oldehinkel, Albertine J; Ong, Ken K; Ouwehand, Willem H; Pasterkamp, Gerard; Peters, Annette; Pramstaller, Peter P; Price, Jackie F; Qi, Lu; Raitakari, Olli T; Rankinen, Tuomo; Rao, DC; Rice, Treva K; Ritchie, Marylyn; Rudan, Igor; Salomaa, Veikko; Samani, Nilesh J; Saramies, Jouko; Sarzynski, Mark A; Schwarz, Peter EH; Sebert, Sylvain; Sever, Peter; Shuldiner, Alan R; Sinisalo, Juha; Steinthorsdottir, Valgerdur; Stolk, Ronald P; Tardif, Jean-Claude; Tönjes, Anke; Tremblay, Angelo; Tremoli, Elena; Virtamo, Jarmo; 
Vohl, Marie-Claude; Amouyel, Philippe; Asselbergs, Folkert W; Assimes, Themistocles L; Bochud, Murielle; Boehm, Bernhard O; Boerwinkle, Eric; Bottinger, Erwin P; Bouchard, Claude; Cauchi, Stéphane; Chambers, John C; Chanock, Stephen J; Cooper, Richard S; de Bakker, Paul IW; Dedoussis, George; Ferrucci, Luigi; Franks, Paul W; Froguel, Philippe; Groop, Leif C; Haiman, Christopher A; Hamsten, Anders; Hayes, M Geoffrey; Hui, Jennie; Hunter, David J.; Hveem, Kristian; Jukema, J Wouter; Kaplan, Robert C; Kivimaki, Mika; Kuh, Diana; Laakso, Markku; Liu, Yongmei; Martin, Nicholas G; März, Winfried; Melbye, Mads; Moebus, Susanne; Munroe, Patricia B; Njølstad, Inger; Oostra, Ben A; Palmer, Colin NA; Pedersen, Nancy L; Perola, Markus; Pérusse, Louis; Peters, Ulrike; Powell, Joseph E; Power, Chris; Quertermous, Thomas; Rauramaa, Rainer; Reinmaa, Eva; Ridker, Paul M; Rivadeneira, Fernando; Rotter, Jerome I; Saaristo, Timo E; Saleheen, Danish; Schlessinger, David; Slagboom, P Eline; Snieder, Harold; Spector, Tim D; Strauch, Konstantin; Stumvoll, Michael; Tuomilehto, Jaakko; Uusitupa, Matti; van der Harst, Pim; Völzke, Henry; Walker, Mark; Wareham, Nicholas J; Watkins, Hugh; Wichmann, H-Erich; Wilson, James F; Zanen, Pieter; Deloukas, Panos; Heid, Iris M; Lindgren, Cecilia M; Mohlke, Karen L; Speliotes, Elizabeth K; Thorsteinsdottir, Unnur; Barroso, Inês; Fox, Caroline S; North, Kari E; Strachan, David P; Beckmann, Jacques S.; Berndt, Sonja I; Boehnke, Michael; Borecki, Ingrid B; McCarthy, Mark I; Metspalu, Andres; Stefansson, Kari; Uitterlinden, André G; van Duijn, Cornelia M; Franke, Lude; Willer, Cristen J; Price, Alkes L.; Lettre, Guillaume; Loos, Ruth JF; Weedon, Michael N; Ingelsson, Erik; O’Connell, Jeffrey R; Abecasis, Goncalo R; Chasman, Daniel I; Goddard, Michael E

2014-01-01

Using genome-wide data from 253,288 individuals, we identified 697 variants at genome-wide significance that together explain one-fifth of heritability for adult height. By testing different numbers of variants in independent studies, we show that the most strongly associated ~2,000, ~3,700 and ~9,500 SNPs explained ~21%, ~24% and ~29% of phenotypic variance. Furthermore, all common variants together captured the majority (60%) of heritability. The 697 variants clustered in 423 loci were enriched for genes, pathways, and tissue-types known to be involved in growth and together implicated genes and pathways not highlighted in earlier efforts, such as signaling by fibroblast growth factors, WNT/beta-catenin, and chondroitin sulfate-related genes. We identified several genes and pathways not previously connected with human skeletal growth, including mTOR, osteoglycin and binding of hyaluronic acid. Our results indicate a genetic architecture for human height that is characterized by a very large but finite number (thousands) of causal variants. PMID:25282103
Exploring the Yeast Acetylome Using Functional Genomics

PubMed Central

Duffy, Supipi Kaluarachchi; Friesen, Helena; Baryshnikova, Anastasia; Lambert, Jean-Philippe; Chong, Yolanda T.; Figeys, Daniel; Andrews, Brenda

2014-01-01

SUMMARY Lysine acetylation is a dynamic posttranslational modification with a well-defined role in regulating histones. The impact of acetylation on other cellular functions remains relatively uncharacterized. We explored the budding yeast acetylome with a functional genomics approach, assessing the effects of gene overexpression in the absence of lysine deacetylases (KDACs). We generated a network of 463 synthetic dosage lethal (SDL) interactions involving class I and II KDACs, revealing many cellular pathways regulated by different KDACs. A biochemical survey of genes interacting with the KDAC RPD3 identified 72 proteins acetylated in vivo. In-depth analysis of one of these proteins, Swi4, revealed a role for acetylation in G1-specific gene expression. Acetylation of Swi4 regulates interaction with its partner Swi6, both components of the SBF transcription factor. This study expands our view of the yeast acetylome, demonstrates the utility of functional genomic screens for exploring enzymatic pathways, and provides functional information that can be mined for future studies. PMID:22579291
Detecting Signatures of Positive Selection along Defined Branches of a Population Tree Using LSD.

PubMed

Librado, Pablo; Orlando, Ludovic

2018-06-01

Identifying the genomic basis underlying local adaptation is paramount to evolutionary biology, and bears many applications in the fields of conservation biology, crop, and animal breeding, as well as personalized medicine. Although many approaches have been developed to detect signatures of positive selection within single populations and population pairs, the increasing wealth of high-throughput sequencing data requires improved methods capable of handling multiple, and ideally large number of, populations in a single analysis. In this study, we introduce LSD (levels of exclusively shared differences), a fast and flexible framework to perform genome-wide selection scans, along the internal and external branches of a given population tree. We use forward simulations to demonstrate that LSD can identify branches targeted by positive selection with remarkable sensitivity and specificity. We illustrate a range of potential applications by analyzing data from the 1000 Genomes Project and uncover a list of adaptive candidates accompanying the expansion of anatomically modern humans out of Africa and their spread to Europe.
Whole-Genome Characterization of Epidemic Neisseria meningitidis Serogroup C and Resurgence of Serogroup W, Niger, 2015.

PubMed

Kretz, Cecilia B; Retchless, Adam C; Sidikou, Fati; Issaka, Bassira; Ousmane, Sani; Schwartz, Stephanie; Tate, Ashley H; Pana, Assimawè; Njanpop-Lafourcade, Berthe-Marie; Nzeyimana, Innocent; Nse, Ricardo Obama; Deghmane, Ala-Eddine; Hong, Eva; Brynildsrud, Ola Brønstad; Novak, Ryan T; Meyer, Sarah A; Oukem-Boyer, Odile Ouwe Missi; Ronveaux, Olivier; Caugant, Dominique A; Taha, Muhamed-Kheir; Wang, Xin

2016-10-01

In 2015, Niger reported the largest epidemic of Neisseria meningitidis serogroup C (NmC) meningitis in sub-Saharan Africa. The NmC epidemic coincided with serogroup W (NmW) cases during the epidemic season, resulting in a total of 9,367 meningococcal cases through June 2015. To clarify the phylogenetic association, genetic evolution, and antibiotic determinants of the meningococcal strains in Niger, we sequenced the genomes of 102 isolates from this epidemic, comprising 81 NmC and 21 NmW isolates. The genomes of 82 isolates were completed, and all 102 were included in the analysis. All NmC isolates had sequence type 10217, which caused the outbreaks in Nigeria during 2013-2014 and for which a clonal complex has not yet been defined. The NmC isolates from Niger were substantially different from other NmC isolates collected globally. All NmW isolates belonged to clonal complex 11 and were closely related to the isolates causing recent outbreaks in Africa.
Comprehensive meta-analysis of Signal Transducers and Activators of Transcription (STAT) genomic binding patterns discerns cell-specific cis-regulatory modules

PubMed Central

2013-01-01

Background Cytokine-activated transcription factors from the STAT (Signal Transducers and Activators of Transcription) family control common and context-specific genetic programs. It is not clear to what extent cell-specific features determine the binding capacity of seven STAT members and to what degree they share genetic targets. Molecular insight into the biology of STATs was gained from a meta-analysis of 29 available ChIP-seq data sets covering genome-wide occupancy of STATs 1, 3, 4, 5A, 5B and 6 in several cell types. Results We determined that the genomic binding capacity of STATs is primarily defined by the cell type and to a lesser extent by individual family members. For example, the overlap of shared binding sites between STATs 3 and 5 in T cells is greater than that between STAT5 in T cells and non-T cells. Even for the top 1,000 highly enriched STAT binding sites, ~15% of STAT5 binding sites in mouse female liver are shared by other STATs in different cell types while in T cells ~90% of STAT5 binding sites are co-occupied by STAT3, STAT4 and STAT6. In addition, we identified 116 cis-regulatory modules (CRM), which are recognized by all STAT members across cell types defining a common JAK-STAT signature. Lastly, in liver STAT5 binding significantly coincides with binding of the cell-specific transcription factors HNF4A, FOXA1 and FOXA2 and is associated with cell-type specific gene transcription. Conclusions Our results suggest that genomic binding of STATs is primarily determined by the cell type and further specificity is achieved in part by juxtaposed binding of cell-specific transcription factors. PMID:23324445
Genome sequence of the button mushroom Agaricus bisporus reveals mechanisms governing adaptation to a humic-rich ecological niche

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morin, Emmanuelle; Kohler, Annegret; Baker, Adam R.

Agaricus bisporus is the model fungus for the adaptation, persistence, and growth in the humic-rich leaf-litter environment. Aside from its ecological role, A. bisporus has been an important component of the human diet for over 200 y and worldwide cultivation of the button mushroom forms a multibillion dollar industry. We present two A. bisporus genomes, their gene repertoires and transcript profiles on compost and during mushroom formation. The genomes encode a full repertoire of polysaccharide-degrading enzymes similar to that of wood-decayers. Comparative transcriptomics of mycelium grown on defined medium, casing-soil, and compost revealed genes encoding enzymes involved in xylan, cellulose, pectin, and protein degradation are more highly expressed in compost. The striking expansion of heme-thiolate peroxidases and β-etherases is distinctive from Agaricomycotina wood-decayers and suggests a broad attack on decaying lignin and related metabolites found in humic acid-rich environment. Similarly, up-regulation of these genes together with a lignolytic manganese peroxidase, multiple copper radical oxidases, and cytochrome P450s is consistent with challenges posed by complex humic-rich substrates. The gene repertoire and expression of hydrolytic enzymes in A. bisporus is substantially different from the taxonomically related ectomycorrhizal symbiont Laccaria bicolor. A common promoter motif was also identified in genes very highly expressed in humic-rich substrates. These observations reveal genetic and enzymatic mechanisms governing adaptation to the humic-rich ecological niche formed during plant degradation, further defining the critical role such fungi contribute to soil structure and carbon sequestration in terrestrial ecosystems. Genome sequence will expedite mushroom breeding for improved agronomic characteristics.
Genome sequence of the button mushroom Agaricus bisporus reveals mechanisms governing adaptation to a humic-rich ecological niche

PubMed Central

Morin, Emmanuelle; Kohler, Annegret; Baker, Adam R.; Foulongne-Oriol, Marie; Lombard, Vincent; Nagye, Laszlo G.; Ohm, Robin A.; Patyshakuliyeva, Aleksandrina; Brun, Annick; Aerts, Andrea L.; Bailey, Andrew M.; Billette, Christophe; Coutinho, Pedro M.; Deakin, Greg; Doddapaneni, Harshavardhan; Floudas, Dimitrios; Grimwood, Jane; Hildén, Kristiina; Kües, Ursula; LaButti, Kurt M.; Lapidus, Alla; Lindquist, Erika A.; Lucas, Susan M.; Murat, Claude; Riley, Robert W.; Salamov, Asaf A.; Schmutz, Jeremy; Subramanian, Venkataramanan; Wösten, Han A. B.; Xu, Jianping; Eastwood, Daniel C.; Foster, Gary D.; Sonnenberg, Anton S. M.; Cullen, Dan; de Vries, Ronald P.; Lundell, Taina; Hibbett, David S.; Henrissat, Bernard; Burton, Kerry S.; Kerrigan, Richard W.; Challen, Michael P.; Grigoriev, Igor V.; Martin, Francis

2012-01-01

Agaricus bisporus is the model fungus for the adaptation, persistence, and growth in the humic-rich leaf-litter environment. Aside from its ecological role, A. bisporus has been an important component of the human diet for over 200 y and worldwide cultivation of the “button mushroom” forms a multibillion dollar industry. We present two A. bisporus genomes, their gene repertoires and transcript profiles on compost and during mushroom formation. The genomes encode a full repertoire of polysaccharide-degrading enzymes similar to that of wood-decayers. Comparative transcriptomics of mycelium grown on defined medium, casing-soil, and compost revealed genes encoding enzymes involved in xylan, cellulose, pectin, and protein degradation are more highly expressed in compost. The striking expansion of heme-thiolate peroxidases and β-etherases is distinctive from Agaricomycotina wood-decayers and suggests a broad attack on decaying lignin and related metabolites found in humic acid-rich environment. Similarly, up-regulation of these genes together with a lignolytic manganese peroxidase, multiple copper radical oxidases, and cytochrome P450s is consistent with challenges posed by complex humic-rich substrates. The gene repertoire and expression of hydrolytic enzymes in A. bisporus is substantially different from the taxonomically related ectomycorrhizal symbiont Laccaria bicolor. A common promoter motif was also identified in genes very highly expressed in humic-rich substrates. These observations reveal genetic and enzymatic mechanisms governing adaptation to the humic-rich ecological niche formed during plant degradation, further defining the critical role such fungi contribute to soil structure and carbon sequestration in terrestrial ecosystems. Genome sequence will expedite mushroom breeding for improved agronomic characteristics. PMID:23045686
Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites

PubMed Central

2010-01-01

Background Classical and quantitative linkage analyses of genetic crosses have traditionally been used to map genes of interest, such as those conferring chloroquine or quinine resistance in malaria parasites. Next-generation sequencing technologies now present the possibility of determining genome-wide genetic variation at single base-pair resolution. Here, we combine in vivo experimental evolution, a rapid genetic strategy and whole genome re-sequencing to identify the precise genetic basis of artemisinin resistance in a lineage of the rodent malaria parasite, Plasmodium chabaudi. Such genetic markers will further the investigation of resistance and its control in natural infections of the human malaria, P. falciparum. Results A lineage of isogenic in vivo drug-selected mutant P. chabaudi parasites was investigated. By measuring the artemisinin responses of these clones, the appearance of an in vivo artemisinin resistance phenotype within the lineage was defined. The underlying genetic locus was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (Illumina® Solexa) defined the point mutations, insertions, deletions and copy-number variations arising in the lineage. Eight point mutations arise within the mutant lineage, only one of which appears on chromosome 2. This missense mutation arises contemporaneously with artemisinin resistance and maps to a gene encoding a de-ubiquitinating enzyme. Conclusions This integrated approach facilitates the rapid identification of mutations conferring selectable phenotypes, without prior knowledge of biological and molecular mechanisms. For malaria, this model can identify candidate genes before resistant parasites are commonly observed in natural human malaria populations. PMID:20846421
Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers.

PubMed

Da, Yang

2015-12-18

The amount of functional genomic information has been growing rapidly but remains largely unused in genomic selection. Genomic prediction and estimation using haplotypes in genome regions with functional elements such as all genes of the genome can be an approach to integrate functional and structural genomic information for genomic selection. Towards this goal, this article develops a new haplotype approach for genomic prediction and estimation. A multi-allelic haplotype model treating each haplotype as an 'allele' was developed for genomic prediction and estimation based on the partition of a multi-allelic genotypic value into additive and dominance values. Each additive value is expressed as a function of h - 1 additive effects, where h = number of alleles or haplotypes, and each dominance value is expressed as a function of h(h - 1)/2 dominance effects. For a sample of q individuals, the limit number of effects is 2q - 1 for additive effects and is the number of heterozygous genotypes for dominance effects. Additive values are factorized as a product between the additive model matrix and the h - 1 additive effects, and dominance values are factorized as a product between the dominance model matrix and the h(h - 1)/2 dominance effects. Genomic additive relationship matrix is defined as a function of the haplotype model matrix for additive effects, and genomic dominance relationship matrix is defined as a function of the haplotype model matrix for dominance effects. Based on these results, a mixed model implementation for genomic prediction and variance component estimation that jointly use haplotypes and single markers is established, including two computing strategies for genomic prediction and variance component estimation with identical results. 
The multi-allelic genetic partition fills a theoretical gap in genetic partition by providing general formulations for partitioning multi-allelic genotypic values and provides a haplotype method based on the quantitative genetics model towards the utilization of functional and structural genomic information for genomic prediction and estimation.
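The effect counts stated in this abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the author's implementation: the function names `effect_counts` and `max_additive_effects` are hypothetical, and the only formulas used are the ones quoted above (h - 1 additive effects, h(h - 1)/2 dominance effects, and the 2q - 1 limit for a sample of q individuals).

```python
def effect_counts(h: int) -> tuple[int, int]:
    """Return (additive, dominance) effect counts for h alleles or haplotypes.

    Uses the counts quoted in the abstract: h - 1 additive effects and
    h(h - 1)/2 dominance effects.
    """
    if h < 1:
        raise ValueError("h must be at least 1")
    additive = h - 1
    dominance = h * (h - 1) // 2  # number of distinct heterozygous pairs
    return additive, dominance


def max_additive_effects(q: int) -> int:
    """Limit on the number of additive effects for q individuals (2q - 1)."""
    if q < 1:
        raise ValueError("q must be at least 1")
    return 2 * q - 1


if __name__ == "__main__":
    for h in (2, 4, 10):
        add, dom = effect_counts(h)
        print(f"h={h}: additive={add}, dominance={dom}")
    print("limit for q=10:", max_additive_effects(10))
```

With h = 2 the counts reduce to one additive and one dominance effect, which matches the familiar biallelic single-marker case.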
Actomyosin drives cancer cell nuclear dysmorphia and threatens genome stability.

PubMed

Takaki, Tohru; Montagner, Marco; Serres, Murielle P; Le Berre, Maël; Russell, Matt; Collinson, Lucy; Szuhai, Karoly; Howell, Michael; Boulton, Simon J; Sahai, Erik; Petronczki, Mark

2017-07-24

Altered nuclear shape is a defining feature of cancer cells. The mechanisms underlying nuclear dysmorphia in cancer remain poorly understood. Here we identify PPP1R12A and PPP1CB, two subunits of the myosin phosphatase complex that antagonizes actomyosin contractility, as proteins safeguarding nuclear integrity. Loss of PPP1R12A or PPP1CB causes nuclear fragmentation, nuclear envelope rupture, nuclear compartment breakdown and genome instability. Pharmacological or genetic inhibition of actomyosin contractility restores nuclear architecture and genome integrity in cells lacking PPP1R12A or PPP1CB. We detect actin filaments at nuclear envelope rupture sites and define the Rho-ROCK pathway as the driver of nuclear damage. Lamin A protects nuclei from the impact of actomyosin activity. Blocking contractility increases nuclear circularity in cultured cancer cells and suppresses deformations of xenograft nuclei in vivo. We conclude that actomyosin contractility is a major determinant of nuclear shape and that unrestrained contractility causes nuclear dysmorphia, nuclear envelope rupture and genome instability.
DNA replication origins—where do we begin?

PubMed Central

Prioleau, Marie-Noëlle; MacAlpine, David M.

2016-01-01

For more than three decades, investigators have sought to identify the precise locations where DNA replication initiates in mammalian genomes. The development of molecular and biochemical approaches to identify start sites of DNA replication (origins) based on the presence of defining and characteristic replication intermediates at specific loci led to the identification of only a handful of mammalian replication origins. The limited number of identified origins prevented a comprehensive and exhaustive search for conserved genomic features that were capable of specifying origins of DNA replication. More recently, the adaptation of origin-mapping assays to genome-wide approaches has led to the identification of tens of thousands of replication origins throughout mammalian genomes, providing an unprecedented opportunity to identify both genetic and epigenetic features that define and regulate their distribution and utilization. Here we summarize recent advances in our understanding of how primary sequence, chromatin environment, and nuclear architecture contribute to the dynamic selection and activation of replication origins across diverse cell types and developmental stages. PMID:27542827
Actomyosin drives cancer cell nuclear dysmorphia and threatens genome stability

PubMed Central

Takaki, Tohru; Montagner, Marco; Serres, Murielle P.; Le Berre, Maël; Russell, Matt; Collinson, Lucy; Szuhai, Karoly; Howell, Michael; Boulton, Simon J.; Sahai, Erik; Petronczki, Mark

2017-01-01

Altered nuclear shape is a defining feature of cancer cells. The mechanisms underlying nuclear dysmorphia in cancer remain poorly understood. Here we identify PPP1R12A and PPP1CB, two subunits of the myosin phosphatase complex that antagonizes actomyosin contractility, as proteins safeguarding nuclear integrity. Loss of PPP1R12A or PPP1CB causes nuclear fragmentation, nuclear envelope rupture, nuclear compartment breakdown and genome instability. Pharmacological or genetic inhibition of actomyosin contractility restores nuclear architecture and genome integrity in cells lacking PPP1R12A or PPP1CB. We detect actin filaments at nuclear envelope rupture sites and define the Rho-ROCK pathway as the driver of nuclear damage. Lamin A protects nuclei from the impact of actomyosin activity. Blocking contractility increases nuclear circularity in cultured cancer cells and suppresses deformations of xenograft nuclei in vivo. We conclude that actomyosin contractility is a major determinant of nuclear shape and that unrestrained contractility causes nuclear dysmorphia, nuclear envelope rupture and genome instability. PMID:28737169
Draft Genome Sequences of Human Pathogenic Fungus Geomyces pannorum Sensu Lato and Bat White Nose Syndrome Pathogen Geomyces (Pseudogymnoascus) destructans

PubMed Central

Crabtree, Jonathan; Nagaraj, Sushma; Chaturvedi, Sudha

2013-01-01

We report the draft genome sequences of Geomyces pannorum sensu lato and Geomyces (Pseudogymnoascus) destructans. G. pannorum has a larger proteome than G. destructans, containing more proteins with ascribed enzymatic functions. This dichotomy in the genomes of related psychrophilic fungi is a valuable target for defining their distinct saprobic and pathogenic attributes. PMID:24356829
Baculovirus-based genome editing in primary cells.

PubMed

Mansouri, Maysam; Ehsaei, Zahra; Taylor, Verdon; Berger, Philipp

2017-03-01

Genome editing in eukaryotes has become easier in recent years with the development of nucleases that induce double-strand breaks in DNA at user-defined sites. CRISPR/Cas9-based genome editing is currently one of the most powerful strategies. In the simplest case, a nuclease (e.g. Cas9) and a target-defining guide RNA (gRNA) are transferred into a target cell. Non-homologous end joining (NHEJ) repair of the DNA break following Cas9 cleavage can lead to inactivation of the target gene. Specific repair or insertion of DNA with Homology Directed Repair (HDR) needs the simultaneous delivery of a repair template. Recombinant Lentivirus or Adenovirus genomes have enough capacity for a nuclease coding sequence and the gRNA but are usually too small to also carry large targeting constructs. We recently showed that a baculovirus-based multigene expression system (MultiPrime) can be used for genome editing in primary cells since it possesses the necessary capacity to carry the nuclease and gRNA expression constructs and the HDR targeting sequences. Here we present new Acceptor plasmids for MultiPrime that allow simplified cloning of baculoviruses for genome editing and we show their functionality in primary cells with limited life span and induced pluripotent stem cells (iPS). Copyright © 2017 Elsevier Inc. All rights reserved.
Defining the phylogenomics of Shigella species: a pathway to diagnostics.

PubMed

Sahl, Jason W; Morris, Carolyn R; Emberger, Jennifer; Fraser, Claire M; Ochieng, John Benjamin; Juma, Jane; Fields, Barry; Breiman, Robert F; Gilmour, Matthew; Nataro, James P; Rasko, David A

2015-03-01

Shigellae cause significant diarrheal disease and mortality in humans, as there are approximately 163 million episodes of shigellosis and 1.1 million deaths annually. While significant strides have been made in the understanding of the pathogenesis, few studies on the genomic content of the Shigella species have been completed. The goal of this study was to characterize the genomic diversity of Shigella species through sequencing of 55 isolates representing members of each of the four Shigella species: S. flexneri, S. sonnei, S. boydii, and S. dysenteriae. Phylogeny inferred from 336 available Shigella and Escherichia coli genomes defined exclusive clades of Shigella; conserved genomic markers that can identify each clade were then identified. PCR assays were developed for each clade-specific marker, which was combined with an amplicon for the conserved Shigella invasion antigen, IpaH3, into a multiplex PCR assay. This assay demonstrated high specificity, correctly identifying 218 of 221 presumptive Shigella isolates, and sensitivity, by not identifying any of 151 diverse E. coli isolates incorrectly as Shigella. This new phylogenomics-based PCR assay represents a valuable tool for rapid typing of uncharacterized Shigella isolates and provides a framework that can be utilized for the identification of novel genomic markers from genomic data. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
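The counts reported for the multiplex PCR assay can be turned into proportions. This is a small sketch using only the numbers in the abstract (218 of 221 presumptive Shigella isolates identified; 0 of 151 E. coli isolates flagged as Shigella); the helper name `proportion` is hypothetical. Under the conventional diagnostic definitions, the first ratio is usually called sensitivity and the second specificity, so the labels in the abstract may be worth reading with that in mind.

```python
def proportion(count: int, total: int) -> float:
    """Return count / total, guarding against an empty denominator."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return count / total


# Presumptive Shigella isolates correctly identified by the assay.
shigella_identified = proportion(218, 221)

# E. coli isolates NOT called Shigella (none of the 151 were misidentified).
ecoli_correctly_negative = proportion(151 - 0, 151)

if __name__ == "__main__":
    print(f"Shigella correctly identified: {shigella_identified:.1%}")
    print(f"E. coli correctly negative:    {ecoli_correctly_negative:.1%}")
```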
Defining the Phylogenomics of Shigella Species: a Pathway to Diagnostics

PubMed Central

Sahl, Jason W.; Morris, Carolyn R.; Emberger, Jennifer; Fraser, Claire M.; Ochieng, John Benjamin; Juma, Jane; Fields, Barry; Breiman, Robert F.; Gilmour, Matthew; Nataro, James P.

2015-01-01

Shigellae cause significant diarrheal disease and mortality in humans, as there are approximately 163 million episodes of shigellosis and 1.1 million deaths annually. While significant strides have been made in the understanding of the pathogenesis, few studies on the genomic content of the Shigella species have been completed. The goal of this study was to characterize the genomic diversity of Shigella species through sequencing of 55 isolates representing members of each of the four Shigella species: S. flexneri, S. sonnei, S. boydii, and S. dysenteriae. Phylogeny inferred from 336 available Shigella and Escherichia coli genomes defined exclusive clades of Shigella; conserved genomic markers that can identify each clade were then identified. PCR assays were developed for each clade-specific marker, which was combined with an amplicon for the conserved Shigella invasion antigen, IpaH3, into a multiplex PCR assay. This assay demonstrated high specificity, correctly identifying 218 of 221 presumptive Shigella isolates, and sensitivity, by not identifying any of 151 diverse E. coli isolates incorrectly as Shigella. This new phylogenomics-based PCR assay represents a valuable tool for rapid typing of uncharacterized Shigella isolates and provides a framework that can be utilized for the identification of novel genomic markers from genomic data. PMID:25588655
Characterizing the genetic differences between two distinct migrant groups from Indo-European and Dravidian speaking populations in India.

PubMed

Ali, Mohammad; Liu, Xuanyao; Pillai, Esakimuthu Nisha; Chen, Peng; Khor, Chiea-Chuen; Ong, Rick Twee-Hee; Teo, Yik-Ying

2014-07-22

India is home to many ethnically and linguistically diverse populations. It is hypothesized that a history of invasions by people from Persia and Central Asia, who are referred to as Aryans in Hindu Holy Scriptures, had a defining role in shaping the Indian population canvas. A shift in spoken languages from Dravidian languages to Indo-European languages around 1500 B.C. is central to the Aryan Invasion Theory. Here we investigate the genetic differences between two sub-populations of India consisting of: (1) the Indo-European language speaking Gujarati Indians with genome-wide data from the International HapMap Project; and (2) the Dravidian language speaking Tamil Indians with genome-wide data from the Singapore Genome Variation Project. We implemented three population genetics measures to identify genomic regions that are significantly differentiated between the two Indian populations originating from the north and south of India. These measures singled out genomic regions with: (i) SNPs exhibiting significant variation in allele frequencies in the two Indian populations; and (ii) differential signals of positive natural selection as quantified by the integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH). One of the regions that emerged spans the SLC24A5 gene, which has been functionally shown to affect skin pigmentation, with a higher degree of genetic sharing between Gujarati Indians and Europeans. Our finding points to gene flow from Europe to north India that provides an explanation for the lighter skin tones present in North Indians in comparison to South Indians.
Protein domain assignment from the recurrence of locally similar structures

PubMed Central

Tai, Chin-Hsien; Sam, Vichetra; Gibrat, Jean-Francois; Garnier, Jean; Munson, Peter J.

2010-01-01

Domains are basic units of protein structure and essential for exploring protein fold space and structure evolution. With the structural genomics initiative, the number of protein structures in the Protein Databank (PDB) is increasing dramatically and domain assignments need to be done automatically. Most existing structural domain assignment programs define domains using the compactness of the domains and/or the number and strength of intra-domain versus inter-domain contacts. Here we present a different approach based on the recurrence of locally similar structural pieces (LSSPs) found by one-against-all structure comparisons with a dataset of 6,373 protein chains from the PDB. Residues of the query protein are clustered using LSSPs via three different procedures to define domains. This approach gives results that are comparable to several existing programs that use geometrical and other structural information explicitly. Remarkably, most of the proteins that contribute the LSSPs defining a domain do not themselves contain the domain of interest. This study shows that domains can be defined by a collection of relatively small locally similar structural pieces containing, on average, four secondary structure elements. In addition, it indicates that domains are indeed made of recurrent small structural pieces that are used to build protein structures of many different folds as suggested by recent studies. PMID:21287617
Economic evaluation of genomic selection in small ruminants: a sheep meat breeding program.

PubMed

Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Lemarié, S; Fugeray-Scarbel, A; Elsen, J M

2016-06-01

Recent genomic evaluation studies using real data and predicting genetic gain by modeling breeding programs have reported moderate expected benefits from the replacement of classic selection schemes by genomic selection (GS) in small ruminants. The objectives of this study were to compare the cost, monetary genetic gain and economic efficiency of classic selection and GS schemes in the meat sheep industry. Deterministic methods were used to model selection based on multi-trait indices from a sheep meat breeding program. Decisional variables related to male selection candidates and progeny testing were optimized to maximize the annual monetary genetic gain (AMGG), that is, a weighted sum of meat and maternal traits annual genetic gains. For GS, a reference population of 2000 individuals was assumed and genomic information was available for evaluation of male candidates only. In the classic selection scheme, males breeding values were estimated from own and offspring phenotypes. In GS, different scenarios were considered, differing by the information used to select males (genomic only, genomic+own performance, genomic+offspring phenotypes). The results showed that all GS scenarios were associated with higher total variable costs than classic selection (if the cost of genotyping was 123 euros/animal). In terms of AMGG and economic returns, GS scenarios were found to be superior to classic selection only if genomic information was combined with their own meat phenotypes (GS-Pheno) or with their progeny test information. The predicted economic efficiency, defined as returns (proportional to number of expressions of AMGG in the nucleus and commercial flocks) minus total variable costs, showed that the best GS scenario (GS-Pheno) was up to 15% more efficient than classic selection. For all selection scenarios, optimization increased the overall AMGG, returns and economic efficiency. 
As a conclusion, our study shows that some forms of GS strategies are more advantageous than classic selection, provided that GS is already initiated (i.e. the initial reference population is available). Optimizing decisional variables of the classic selection scheme could be of greater benefit than including genomic information in optimized designs.

Microbial genomic taxonomy

PubMed Central

2013-01-01

A need for a genomic species definition is emerging from several independent studies worldwide. In this commentary paper, we discuss recent studies on the genomic taxonomy of diverse microbial groups and a unified species definition based on genomics. Accordingly, strains from the same microbial species share >95% Average Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI), >95% identity based on multiple alignment genes, <10 in Karlin genomic signature, and >70% in silico Genome-to-Genome Hybridization similarity (GGDH). Species of the same genus will form monophyletic groups on the basis of 16S rRNA gene sequences, Multilocus Sequence Analysis (MLSA) and supertree analysis. In addition to the established requirements for species descriptions, we propose that new taxa descriptions should also include at least a draft genome sequence of the type strain in order to obtain a clear outlook on the genomic landscape of the novel microbe. The application of the new genomic species definition put forward here will allow researchers to use genome sequences to define simultaneously coherent phenotypic and genomic groups. PMID:24365132
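The numeric thresholds listed in this abstract can be expressed as a simple rule check. The sketch below is illustrative only: the class `GenomePairMetrics`, its field names, and the function `same_species` are hypothetical, the example values are made up, and the code assumes the five metrics have already been computed by other tools. It encodes only the cutoffs quoted above and requires all of them to hold at once.

```python
from dataclasses import dataclass


@dataclass
class GenomePairMetrics:
    """Pairwise genome comparison values (hypothetical container)."""
    aai: float               # Average Amino Acid Identity, percent
    ani: float               # Average Nucleotide Identity, percent
    mla_identity: float      # identity based on multiple alignment genes, percent
    karlin_signature: float  # Karlin genomic signature value
    ggdh: float              # in silico genome-to-genome hybridization, percent


def same_species(m: GenomePairMetrics) -> bool:
    """Apply the thresholds quoted in the abstract; all must be satisfied."""
    return (
        m.aai > 95.0
        and m.ani > 95.0
        and m.mla_identity > 95.0
        and m.karlin_signature < 10.0
        and m.ggdh > 70.0
    )


if __name__ == "__main__":
    # Made-up values for illustration, not taken from any study.
    close_pair = GenomePairMetrics(98.2, 97.5, 98.9, 4.0, 82.0)
    distant_pair = GenomePairMetrics(91.0, 89.4, 93.0, 14.0, 45.0)
    print(same_species(close_pair))
    print(same_species(distant_pair))
```

Requiring every criterion to pass is one reading of the definition; a pair that meets only some thresholds would need case-by-case judgment that this sketch does not attempt.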
The Chthonomonas calidirosea Genome Is Highly Conserved across Geographic Locations and Distinct Chemical and Microbial Environments in New Zealand's Taupō Volcanic Zone.

PubMed

Lee, Kevin C; Stott, Matthew B; Dunfield, Peter F; Huttenhower, Curtis; McDonald, Ian R; Morgan, Xochitl C

2016-06-15

Chthonomonas calidirosea T49(T) is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. 
This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes. It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. Copyright © 2016 Lee et al.
The Chthonomonas calidirosea Genome Is Highly Conserved across Geographic Locations and Distinct Chemical and Microbial Environments in New Zealand's Taupō Volcanic Zone

PubMed Central

Lee, Kevin C.; Stott, Matthew B.; Dunfield, Peter F.; Huttenhower, Curtis; McDonald, Ian R.

2016-01-01

ABSTRACT Chthonomonas calidirosea T49T is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. 
IMPORTANCE This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes. It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. PMID:27060125
In silico genomic analyses reveal three distinct lineages of Escherichia coli O157:H7, one of which is associated with hyper-virulence.

PubMed

Laing, Chad R; Buchanan, Cody; Taboada, Eduardo N; Zhang, Yongxiang; Karmali, Mohamed A; Thomas, James E; Gannon, Victor Pj

2009-06-29

Many approaches have been used to study the evolution, population structure and genetic diversity of Escherichia coli O157:H7; however, observations made with different genotyping systems are not easily relatable to each other. Three genetic lineages of E. coli O157:H7 designated I, II and I/II have been identified using octamer-based genome scanning and microarray comparative genomic hybridization (mCGH). Each lineage contains significant phenotypic differences, with lineage I strains being the most commonly associated with human infections. Similarly, a clade of hyper-virulent O157:H7 strains implicated in the 2006 spinach and lettuce outbreaks has been defined using single-nucleotide polymorphism (SNP) typing. In this study an in silico comparison of six different genotyping approaches was performed on 19 E. coli genome sequences from 17 O157:H7 strains and single O145:NM and K12 MG1655 strains to provide an overall picture of diversity of the E. coli O157:H7 population, and to compare genotyping methods for O157:H7 strains. In silico determination of lineage, Shiga-toxin bacteriophage integration site, comparative genomic fingerprint, mCGH profile, novel region distribution profile, SNP type and multi-locus variable number tandem repeat analysis type was performed and a supernetwork based on the combination of these methods was produced. This supernetwork showed three distinct clusters of strains that were O157:H7 lineage-specific, with the SNP-based hyper-virulent clade 8 synonymous with O157:H7 lineage I/II. Lineage I/II/clade 8 strains clustered closest on the supernetwork to E. coli K12 and E. coli O55:H7, O145:NM and sorbitol-fermenting O157 strains. The results of this study highlight the similarities in relationships derived from multi-locus genome sampling methods and suggest a "common genotyping language" may be devised for population genetics and epidemiological studies. 
Future genotyping methods should provide data that can be stored centrally and accessed locally in an easily transferable, informative and extensible format based on comparative genomic analyses.
Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas. | Office of Cancer Genomics

Cancer.gov

Although the MYC oncogene has been implicated in cancer, a systematic assessment of alterations of MYC, related transcription factors, and co-regulatory proteins, forming the proximal MYC network (PMN), across human cancers is lacking. Using computational approaches, we define genomic and proteomic features associated with MYC and the PMN across the 33 cancers of The Cancer Genome Atlas. Pan-cancer, 28% of all samples had at least one of the MYC paralogs amplified.
Evolution of biological complexity

PubMed Central

Adami, Christoph; Ofria, Charles; Collier, Travis C.

2000-01-01

To make a case for or against a trend in the evolution of complexity in biological evolution, complexity needs to be both rigorously defined and measurable. A recent information-theoretic (but intuitively evident) definition identifies genomic complexity with the amount of information a sequence stores about its environment. We investigate the evolution of genomic complexity in populations of digital organisms and monitor in detail the evolutionary transitions that increase complexity. We show that, because natural selection forces genomes to behave as a natural “Maxwell Demon,” within a fixed environment, genomic complexity is forced to increase. PMID:10781045
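One way to make the information-theoretic definition above concrete is to approximate the information stored in a genome as sequence length minus the summed per-site entropy across an aligned population, with entropy measured in units of the alphabet size. The abstract does not give this formula, so the sketch below is an assumption about how the measure could be estimated; it ignores correlations between sites, and the function names are illustrative.

```python
import math
from collections import Counter


def per_site_entropy(column, alphabet_size=4):
    """Shannon entropy of one alignment column, in units of log(alphabet_size)."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log(c / n, alphabet_size) for c in counts.values())


def genomic_complexity(aligned_sequences, alphabet_size=4):
    """Approximate complexity as length minus the sum of per-site entropies.

    Assumes equal-length, pre-aligned sequences sampled from one population.
    """
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all sequences must have the same length")
    total_entropy = sum(
        per_site_entropy([seq[i] for seq in aligned_sequences], alphabet_size)
        for i in range(length)
    )
    return length - total_entropy
```

Under this approximation, a fully conserved site contributes one unit of information and a site that varies uniformly over the whole alphabet contributes none.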
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons

PubMed Central

2011-01-01

Background Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. Results BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. 
Conclusions There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/. PMID:21824423
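The "sliding scale" colouring described in the Results can be illustrated with a linear blend between white and a ring's base colour. This is a hypothetical sketch of the idea only, not BRIG's code: the function name, the default lower cut-off of 70% and the white-to-base blend are assumptions.

```python
def identity_to_colour(identity, base_rgb=(0, 0, 255), lower=70.0, upper=100.0):
    """Map a BLAST percent identity to an RGB colour.

    Matches below `lower` are not drawn (None). Between `lower` and `upper`
    the colour is blended linearly from white to `base_rgb`.
    """
    if identity < lower:
        return None
    fraction = (min(identity, upper) - lower) / (upper - lower)
    return tuple(round(255 + (channel - 255) * fraction) for channel in base_rgb)
```

With the defaults, a 100% identity match is drawn in the full base colour, while a match exactly at the cut-off is drawn in white.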
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons.

PubMed

Alikhan, Nabil-Fareed; Petty, Nicola K; Ben Zakour, Nouri L; Beatson, Scott A

2011-08-08

Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. 
There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/.
The Rules of Variation Expanded, Implications for the Research on Compatible Genomics.

PubMed

Castro-Chavez, Fernando

2011-05-12

The main focus of this article is to present the practical aspect of the code rules of variation and the search for a second set of genomic rules, including comparison of sequences to understand how to preserve compatible organisms in danger of extinction and how to generate biodiversity. Three new rules of variation are introduced: 1) homologous recombination, 2) a healthy fertile offspring, and 3) comparison of compatible genomes. The novel search in the natural world for fully compatible genomes capable of homologous recombination is explored by using examples of human polymorphisms in the LDLRAP1 gene, and by the production of fertile offspring by crossbreeding. Examples of dogs, llamas and finches will be presented by a rational control of: natural crossbreeding of organisms with compatible genomes (something already happening in nature), the current work focuses on the generation of new varieties after a careful plan. This study is presented within the context of biosemiotics, which studies the processing of information, signaling and signs by living systems. I define a group of organisms having compatible genomes as a single theme: the genomic species or population, able to speak the same molecular language through different accents, with each variety within a theme being a different version of the same book. These studies have a molecular, compatible genetics context. Population and ecosystem biosemiotics will be exemplified by a possible genetic damage capable of causing mutations by breaking the rules of variation through the coordinated patterns of atoms present in the 9/11 World Trade Center contaminated dust (U, Ba, La, Ce, Sr, Rb, K, Mn, Mg, etc.), combination that may be able to overload the molecular quality control mechanisms of the human body. I introduce here the balance of codons in the circular genetic code: 2[1(1)+1(3)+1(4)+4(2)]=2[2(2)+3(4)].
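The codon balance quoted at the end of this abstract can be checked term by term. The working below assumes that each term of the form i(j) denotes the product of i and j, which is a reading of the author's notation rather than something the abstract states.

```latex
\begin{align*}
2\,[\,1(1) + 1(3) + 1(4) + 4(2)\,] &= 2\,[\,1 + 3 + 4 + 8\,] = 2 \times 16 = 32,\\
2\,[\,2(2) + 3(4)\,] &= 2\,[\,4 + 12\,] = 2 \times 16 = 32.
\end{align*}
```

Both sides equal 32, so the identity holds under this reading; the two sides together give 64, the number of codons in the genetic code.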
The Rules of Variation Expanded, Implications for the Research on Compatible Genomics

PubMed Central

Castro-Chavez, Fernando

2011-01-01

The main focus of this article is to present the practical aspect of the code rules of variation and the search for a second set of genomic rules, including comparison of sequences to understand how to preserve compatible organisms in danger of extinction and how to generate biodiversity. Three new rules of variation are introduced: 1) homologous recombination, 2) a healthy fertile offspring, and 3) comparison of compatible genomes. The novel search in the natural world for fully compatible genomes capable of homologous recombination is explored by using examples of human polymorphisms in the LDLRAP1 gene, and by the production of fertile offspring by crossbreeding. Examples of dogs, llamas and finches will be presented by a rational control of: natural crossbreeding of organisms with compatible genomes (something already happening in nature), the current work focuses on the generation of new varieties after a careful plan. This study is presented within the context of biosemiotics, which studies the processing of information, signaling and signs by living systems. I define a group of organisms having compatible genomes as a single theme: the genomic species or population, able to speak the same molecular language through different accents, with each variety within a theme being a different version of the same book. These studies have a molecular, compatible genetics context. Population and ecosystem biosemiotics will be exemplified by a possible genetic damage capable of causing mutations by breaking the rules of variation through the coordinated patterns of atoms present in the 9/11 World Trade Center contaminated dust (U, Ba, La, Ce, Sr, Rb, K, Mn, Mg, etc.), combination that may be able to overload the molecular quality control mechanisms of the human body. I introduce here the balance of codons in the circular genetic code: 2[1(1)+1(3)+1(4)+4(2)]=2[2(2)+3(4)]. PMID:21743816
Exploring metabolic pathway reconstruction and genome-wide expression profiling in Lactobacillus reuteri to define functional probiotic features.

PubMed

Saulnier, Delphine M; Santos, Filipe; Roos, Stefan; Mistretta, Toni-Ann; Spinler, Jennifer K; Molenaar, Douwe; Teusink, Bas; Versalovic, James

2011-04-29

The genomes of four Lactobacillus reuteri strains isolated from human breast milk and the gastrointestinal tract have been recently sequenced as part of the Human Microbiome Project. Preliminary genome comparisons suggested that these strains belong to two different clades, previously shown to differ with respect to antimicrobial production, biofilm formation, and immunomodulation. To explain possible mechanisms of survival in the host and probiosis, we completed a detailed genomic comparison of two breast milk-derived isolates representative of each group: an established probiotic strain (L. reuteri ATCC 55730) and a strain with promising probiotic features (L. reuteri ATCC PTA 6475). Transcriptomes of L. reuteri strains in different growth phases were monitored using strain-specific microarrays, and compared using a pan-metabolic model representing all known metabolic reactions present in these strains. Both strains contained candidate genes involved in the survival and persistence in the gut such as mucus-binding proteins and enzymes scavenging reactive oxygen species. A large operon predicted to encode the synthesis of an exopolysaccharide was identified in strain 55730. Both strains were predicted to produce health-promoting factors, including antimicrobial agents and vitamins (folate, vitamin B(12)). Additionally, a complete pathway for thiamine biosynthesis was predicted in strain 55730 for the first time in this species. Candidate genes responsible for immunomodulatory properties of each strain were identified by transcriptomic comparisons. The production of bioactive metabolites by human-derived probiotics may be predicted using metabolic modeling and transcriptomics. Such strategies may facilitate selection and optimization of probiotics for health promotion, disease prevention and amelioration.
DEFINING THE MANDATE OF PROTEOMICS IN THE POST-GENOMIC ERA: WORKSHOP REPORT

EPA Science Inventory

Research in proteomics is the next step after genomics in understanding life processes at the molecular level. In the largest sense proteomics encompasses knowledge of the structure, function and expression of all proteins in the biochemical or biological contexts of all organism...
An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values.

PubMed

Alberti, Claudio; Daniels, Noah; Hernaez, Mikel; Voges, Jan; Goldfeder, Rachel L; Hernandez-Lopez, Ana A; Mattavelli, Marco; Berger, Bonnie

2016-01-01

This paper provides the specification and an initial validation of an evaluation framework for the comparison of lossy compressors of genome sequencing quality values. The goal is to define reference data, test sets, tools and metrics that shall be used to evaluate the impact of lossy compression of quality values on human genome variant calling. The functionality of the framework is validated referring to two state-of-the-art genomic compressors. This work has been spurred by the current activity within the ISO/IEC SC29/WG11 technical committee (a.k.a. MPEG), which is investigating the possibility of starting a standardization activity for genomic information representation.
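As an illustration of what such a framework measures, the sketch below pairs a simple lossy transform of quality values (coarse binning) with a comparison of variant calls against a reference call set. It is not the framework specified in the paper: the bin edges, the representative values, the function names and the use of precision and recall are all assumptions made for the example.

```python
from bisect import bisect_right


def quantize_qualities(qualities, bin_edges=(10, 20, 30), representatives=(5, 15, 25, 35)):
    """Lossy compression step: replace each Phred score by its bin's representative.

    A score q falls in bin k, where k is the number of edges <= q.
    """
    return [representatives[bisect_right(bin_edges, q)] for q in qualities]


def variant_call_metrics(reference_calls, test_calls):
    """Precision and recall of test_calls measured against reference_calls.

    Each call is any hashable record, e.g. (chrom, pos, ref, alt).
    """
    reference, test = set(reference_calls), set(test_calls)
    true_positives = len(reference & test)
    precision = true_positives / len(test) if test else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall
```

In use, variants called from the quantized data would be passed as `test_calls` and variants called from the original data, or a curated truth set, as `reference_calls`.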
Early detection: the impact of genomics.

PubMed

van Lanschot, M C J; Bosch, L J W; de Wit, M; Carvalho, B; Meijer, G A

2017-08-01

The field of genomics has shifted our view on disease development by providing insights into the molecular and functional processes encoded in the genome. In the case of cancer, many alterations in the DNA accumulate that enable tumor growth or even metastatic dissemination. Identification of molecular signatures that define different stages of progression towards cancer can enable early tumor detection. In this review, the impact of genomics will be addressed using early detection of colorectal cancer (CRC) as an example. Increased understanding of the adenoma-to-carcinoma progression has led to the discovery of several diagnostic biomarkers. This, combined with technical advancements, has facilitated the development of molecular tests for non-invasive early CRC detection in stool and blood samples. Even though several tests have already made it to clinical practice, sensitivity and specificity for the detection of precancerous lesions still need improvement. Besides the diagnostic qualities, the accuracy of the intermediate endpoint also strongly affects how the effectiveness of a novel test is perceived. Here, progression biomarkers may provide a more precise measure than the currently used morphologically based features. Similar developments in biomarker use for early detection have taken place in other cancer types.
A short introduction to cytogenetic studies in mammals with reference to the present volume.

PubMed

Graphodatsky, A; Ferguson-Smith, M A; Stanyon, R

2012-01-01

Genome diversity has long been studied from the comparative cytogenetic perspective. Early workers documented differences between species in diploid chromosome number and fundamental number. Banding methods allowed more detailed descriptions of between-species rearrangements and classes of differentially staining chromosome material. The infusion of molecular methods into cytogenetics provided a third revolution, which is still not exhausted. Chromosome painting has provided a global view of the translocation history of mammalian genome evolution, well summarized in the contributions to this special volume. More recently, FISH of cloned DNA has provided details on defining breakpoint and intrachromosomal marker order, which have helped to document inversions and centromere repositioning. The most recent trend in comparative molecular cytogenetics is to integrate sequencing information in order to formulate and test reconstructions of ancestral genomes and phylogenomic hypotheses derived from comparative cytogenetics. The integration of comparative cytogenetics and sequencing promises to provide an understanding of what drives chromosome rearrangements and genome evolution in general. We believe that the contributions in this volume, in no small way, point the way to the next phase in cytogenetic studies. Copyright © 2012 S. Karger AG, Basel.
Pseudomonas caspiana sp. nov., a citrus pathogen in the Pseudomonas syringae phylogenetic group.

PubMed

Busquets, Antonio; Gomila, Margarita; Beiki, Farid; Mulet, Magdalena; Rahimian, Heshmat; García-Valdés, Elena; Lalucat, Jorge

2017-07-01

In a screening by multilocus sequence analysis of Pseudomonas strains isolated from diverse origins, 4 phylogenetically closely related strains (FBF58, FBF102(T), FBF103, and FBF122) formed a well-defined cluster in the Pseudomonas syringae phylogenetic group. The strains were isolated from citrus orchards in northern Iran with disease symptoms in the leaves and stems, and their pathogenicity against citrus plants was demonstrated. The whole genome of the type strain of the proposed new species (FBF102(T) = CECT 9164(T) = CCUG 69273(T)) was sequenced and characterized. Comparative genomics with the 14 known Pseudomonas species type strains of the P. syringae phylogenetic group demonstrated that this strain belonged to a new genomic species, different from the species described thus far. Genome analysis detected genes predicted to be involved in pathogenesis, such as an atypical type 3 secretion system and two type 6 secretion systems, together with effectors and virulence factors. A polyphasic taxonomic characterization demonstrated that the 4 plant pathogenic strains represented a new species, for which the name Pseudomonas caspiana sp. nov. is proposed. Copyright © 2017 Elsevier GmbH. All rights reserved.
Molecular cloning of chitinase 33 (chit33) gene from Trichoderma atroviride

PubMed Central

Matroudi, S.; Zamani, M.R.; Motallebi, M.

2008-01-01

In this study, Trichoderma atroviride was selected as an overproducer of chitinase among 30 different isolates of Trichoderma sp. on the basis of chitinase specific activity. From this isolate, the genomic and cDNA clones encoding chit33 were isolated and sequenced. Comparison of the genomic and cDNA sequences to define the gene structure indicates that this gene contains three short introns and an open reading frame coding for a protein of 321 amino acids. The deduced amino acid sequence includes a 19 aa putative signal peptide. Homology between this sequence and other reported Trichoderma Chit33 proteins is discussed. The coding sequence of the chit33 gene was cloned into the pET26b(+) expression vector and expressed in E. coli. PMID:24031242
Genome sequencing of disease and carriage isolates of nontypeable Haemophilus influenzae identifies discrete population structure

PubMed Central

De Chiara, Matteo; Hood, Derek; Muzzi, Alessandro; Pickard, Derek J.; Perkins, Tim; Pizza, Mariagrazia; Dougan, Gordon; Rappuoli, Rino; Moxon, E. Richard; Soriani, Marco; Donati, Claudio

2014-01-01

One of the main hurdles for the development of an effective and broadly protective vaccine against nonencapsulated isolates of Haemophilus influenzae (NTHi) lies in the genetic diversity of the species, which makes the identification of cross-protective candidate antigens extremely difficult. To assess whether a population structure of NTHi could be defined, we performed genome sequencing of a collection of diverse clinical isolates representative of both carriage and disease and of the diversity of the natural population. Analysis of the distribution of polymorphic sites in the core genome and of the composition of the accessory genome defined distinct evolutionary clades and supported a predominantly clonal evolution of NTHi, with the majority of genetic information transmitted vertically within lineages. A correlation between the population structure and the presence of selected surface-associated proteins and lipooligosaccharide structure, known to contribute to virulence, was found. This high-resolution, genome-based population structure of NTHi provides the foundation for a better understanding of NTHi adaptation to the host, as well as its commensal and virulence behavior, which could facilitate intervention strategies against disease caused by this important human pathogen. PMID:24706866
Genome sequencing of disease and carriage isolates of nontypeable Haemophilus influenzae identifies discrete population structure.

PubMed

De Chiara, Matteo; Hood, Derek; Muzzi, Alessandro; Pickard, Derek J; Perkins, Tim; Pizza, Mariagrazia; Dougan, Gordon; Rappuoli, Rino; Moxon, E Richard; Soriani, Marco; Donati, Claudio

2014-04-08

One of the main hurdles for the development of an effective and broadly protective vaccine against nonencapsulated isolates of Haemophilus influenzae (NTHi) lies in the genetic diversity of the species, which makes the identification of cross-protective candidate antigens extremely difficult. To assess whether a population structure of NTHi could be defined, we performed genome sequencing of a collection of diverse clinical isolates representative of both carriage and disease and of the diversity of the natural population. Analysis of the distribution of polymorphic sites in the core genome and of the composition of the accessory genome defined distinct evolutionary clades and supported a predominantly clonal evolution of NTHi, with the majority of genetic information transmitted vertically within lineages. A correlation between the population structure and the presence of selected surface-associated proteins and lipooligosaccharide structure, known to contribute to virulence, was found. This high-resolution, genome-based population structure of NTHi provides the foundation for a better understanding of NTHi adaptation to the host, as well as its commensal and virulence behavior, which could facilitate intervention strategies against disease caused by this important human pathogen.
A mitochondrial genome phylogeny of termites (Blattodea: Termitoidae): robust support for interfamilial relationships and molecular synapomorphies define major clades.

PubMed

Cameron, Stephen L; Lo, Nathan; Bourguignon, Thomas; Svenson, Gavin J; Evans, Theodore A

2012-10-01

Despite their ecological significance as decomposers and their evolutionary significance as the most speciose eusocial insect group outside the Hymenoptera, termite (Blattodea: Termitoidae or Isoptera) evolutionary relationships have yet to be well resolved. Previous morphological and molecular analyses strongly conflict at the family level and are marked by poor support for backbone nodes. A mitochondrial (mt) genome phylogeny of termites was produced to test relationships between the recognised termite families, improve nodal support and test the phylogenetic utility of rare genomic changes found in the termite mt genome. Complete mt genomes were sequenced for 7 of the 9 extant termite families with additional representatives of each of the two most speciose families Rhinotermitidae (3 of 7 subfamilies) and Termitidae (3 of 8 subfamilies). The mt genome of the well supported sister-group of termites, the subsocial cockroach Cryptocercus, was also sequenced. A highly supported tree of termite relationships was produced by all analytical methods and data treatment approaches; however, the relationship of the termites+Cryptocercus clade to other cockroach lineages was highly affected by the strong nucleotide compositional bias found in termites relative to other dictyopterans. The phylogeny supports previously proposed suprafamilial termite lineages, the Euisoptera and Neoisoptera, a later derived Kalotermitidae as sister group of the Neoisoptera and a monophyletic clade of dampwood (Stolotermitidae, Archotermopsidae) and harvester termites (Hodotermitidae). In contrast to previous termite phylogenetic studies, nodal supports were very high for family-level relationships within termites. Two rare genomic changes in the mt genome control region were found to be molecular synapomorphies for major clades. An elongated stem-loop structure defined the clade Polyphagidae + (Cryptocercus+termites), and a further series of compensatory base changes in this stem-loop is synapomorphic for the Neoisoptera. The complicated repeat structures first identified in Reticulitermes, composed of short (A-type) and long (B-type) repeats, define the clade Heterotermitinae+Termitidae, while the secondary loss of A-type repeats is synapomorphic for the non-macrotermitine Termitidae. Copyright © 2012 Elsevier Inc. All rights reserved.

Patient-derived organoid models help define personalized management of gastrointestinal cancer.

Cancer.gov

Background: The prognosis of patients with different gastrointestinal cancers varies widely. Despite advances in treatment strategies, such as extensive resections and the addition of new drugs to chemotherapy regimens, conventional treatment strategies have failed to improve survival for many tumours. Although promising, the clinical application of molecularly guided personalized treatment has proven to be challenging. This narrative review focuses on the personalization of cancer therapy using patient-derived three-dimensional 'organoid' models.
Constrained release of lamina-associated enhancers and genes from the nuclear envelope during T-cell activation facilitates their association in chromosome compartments

PubMed Central

de las Heras, Jose I.; Czapiewski, Rafal; Sivakumar, Aishwarya; Kerr, Alastair R.W.; Schirmer, Eric C.

2017-01-01

The 3D organization of the genome changes concomitantly with expression changes during hematopoiesis and immune activation. Studies have focused either on lamina-associated domains (LADs) or on topologically associated domains (TADs), defined by preferential local chromatin interactions, and chromosome compartments, defined as higher-order interactions between TADs sharing functionally similar states. However, few studies have investigated how these affect one another. To address this, we mapped LADs using Lamin B1–DamID during Jurkat T-cell activation, finding significant genome reorganization at the nuclear periphery dominated by release of loci frequently important for T-cell function. To assess how these changes at the nuclear periphery influence wider genome organization, our DamID data sets were contrasted with TADs and compartments. Features of specific repositioning events were then tested by fluorescence in situ hybridization during T-cell activation. First, considerable overlap between TADs and LADs was observed with the TAD repositioning as a unit. Second, A1 and A2 subcompartments are segregated in 3D space through differences in proximity to LADs along chromosomes. Third, genes and a putative enhancer in LADs that were released from the periphery during T-cell activation became preferentially associated with A2 subcompartments and were constrained to the relative proximity of the lamina. Thus, lamina associations influence internal nuclear organization, and changes in LADs during T-cell activation may provide an important additional mode of gene regulation. PMID:28424353
Metabolic pathways for the whole community.

PubMed

Hanson, Niels W; Konwar, Kishori M; Hawley, Alyse K; Altman, Tomer; Karp, Peter D; Hallam, Steven J

2014-07-22

A convergence of high-throughput sequencing and computational power is transforming biology into information science. Despite these technological advances, converting bits and bytes of sequence information into meaningful insights remains a challenging enterprise. Biological systems operate on multiple hierarchical levels from genomes to biomes. Holistic understanding of biological systems requires agile software tools that permit comparative analyses across multiple information levels (DNA, RNA, protein, and metabolites) to identify emergent properties, diagnose system states, or predict responses to environmental change. Here we adopt the MetaPathways annotation and analysis pipeline and Pathway Tools to construct environmental pathway/genome databases (ePGDBs) that describe microbial community metabolism using MetaCyc, a highly curated database of metabolic pathways and components covering all domains of life. We evaluate Pathway Tools' performance on three datasets with different complexity and coding potential, including simulated metagenomes, a symbiotic system, and the Hawaii Ocean Time-series. We define accuracy and sensitivity relationships between read length, coverage and pathway recovery and evaluate the impact of taxonomic pruning on ePGDB construction and interpretation. Resulting ePGDBs provide interactive metabolic maps, predict emergent metabolic pathways associated with biosynthesis and energy production and differentiate between genomic potential and phenotypic expression across defined environmental gradients. This multi-tiered analysis provides the user community with specific operating guidelines, performance metrics and prediction hazards for more reliable ePGDB construction and interpretation. Moreover, it demonstrates the power of Pathway Tools in predicting metabolic interactions in natural and engineered ecosystems.
Identification and bioinformatics comparison of two novel phosphatases in monoecious and gynoecious cucumber lines

NASA Astrophysics Data System (ADS)

Pawełkowicz, Magdalena E.; Wojcieszek, Michał; Osipowski, Paweł; Krzywkowski, Tomasz; Pląder, Wojciech; Przybecki, Zbigniew

2016-09-01

Two Arabidopsis thaliana genes from the PP2C family of protein phosphatases (AtABI1 and AtABI2) were used to find orthologous genes in the Cucumis sativus L. cv. Borszczagowski (cucumber) genome. Cucumber has been used as a model plant for sex expression studies because although it has been defined as a monoecious species, numerous genotypes are known to produce only female, only male, or hermaphroditic flowers. We identified two new orthologous genes of AtABI1 and AtABI2 in the cucumber genome and named them CsABI1 and CsABI2. To determine the relationships between the regulation of CsABI1 and CsABI2 and flower morphogenesis in cucumber, we performed various computational analyses to define the structure of the genes, and to predict regulatory elements and protein motifs in their sequences. We also performed an expression analysis to identify differences in the expression levels of CsABI1 and CsABI2 in vegetative and generative tissues (leaf, shoot apex, and flower buds) of monoecious (B10) and gynoecious (2gg) cucumber lines. We found that the expressions of CsABI1 and CsABI2 differed in male and female floral buds, and correlated these findings with the abscisic acid signaling pathways in male and female flowers.
Transcriptome analyses of rhesus monkey preimplantation embryos reveal a reduced capacity for DNA double-strand break repair in primate oocytes and early embryos

PubMed Central

Wang, Xinyi; Liu, Denghui; He, Dajian; Suo, Shengbao; Xia, Xian; He, Xiechao; Han, Jing-Dong J.; Zheng, Ping

2017-01-01

Preimplantation embryogenesis encompasses several critical events including genome reprogramming, zygotic genome activation (ZGA), and cell-fate commitment. The molecular basis of these processes remains obscure in primates in which there is a high rate of embryo wastage. Thus, understanding the factors involved in genome reprogramming and ZGA might help reproductive success during this susceptible period of early development and generate induced pluripotent stem cells with greater efficiency. Moreover, explaining the molecular basis responsible for embryo wastage in primates will greatly expand our knowledge of species evolution. By using RNA-seq in single and pooled oocytes and embryos, we defined the transcriptome throughout preimplantation development in rhesus monkey. In comparison to archival human and mouse data, we found that the transcriptome dynamics of monkey oocytes and embryos were very similar to those of human but very different from those of mouse. We identified several classes of maternal and zygotic genes, whose expression peaks were highly correlated with the time frames of genome reprogramming, ZGA, and cell-fate commitment, respectively. Importantly, comparison of the ZGA-related network modules among the three species revealed less robust surveillance of genomic instability in primate oocytes and embryos than in rodents, particularly in the pathways of DNA damage signaling and homology-directed DNA double-strand break repair. This study highlights the utility of monkey models to better understand the molecular basis for genome reprogramming, ZGA, and genomic stability surveillance in human early embryogenesis and may provide insights for improved homologous recombination-mediated gene editing in monkey. PMID:28223401
Segmental Duplications and Copy-Number Variation in the Human Genome

PubMed Central

Sharp, Andrew J.; Locke, Devin P.; McGrath, Sean D.; Cheng, Ze; Bailey, Jeffrey A.; Vallente, Rhea U.; Pertz, Lisa M.; Clark, Royden A.; Schwartz, Stuart; Segraves, Rick; Oseroff, Vanessa V.; Albertson, Donna G.; Pinkel, Daniel; Eichler, Evan E.

2005-01-01

The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, an assessment of the role of segmental duplications in normal variation has not yet been made. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and we identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions, compared with control BACs (P < .000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched >4-fold within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that these either are recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that the consideration of genomic architecture can significantly improve the ascertainment of large-scale rearrangements. 
Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic disorders. PMID:15918152
Fatty acid-binding protein genes of the ancient, air-breathing, ray-finned fish, spotted gar (Lepisosteus oculatus).

PubMed

Venkatachalam, Ananda B; Fontenot, Quenton; Farrara, Allyse; Wright, Jonathan M

2018-03-01

With the advent of high-throughput DNA sequencing technology, the genomic sequence of many disparate species has led to the relatively new discipline of genomics, the study of genome structure, function and evolution. Much work has been focused on the role of whole genome duplications (WGD) in the architecture of extant vertebrate genomes, particularly those of teleost fishes, which underwent a WGD early in the teleost radiation >230 million years ago (mya). Our past work has focused on the fate of duplicated copies of a multigene family coding for the intracellular lipid-binding protein (iLBP) genes in the teleost fishes. Defining the evolutionary processes that determined the fate of duplicated genes and generated the structure of extant fish genomes, however, requires comparative genomic analysis with a fish lineage that diverged before the teleost WGD, such as the spotted gar (Lepisosteus oculatus), an ancient, air-breathing, ray-finned fish. Here, we describe the genomic organization, chromosomal location and tissue-specific expression of a subfamily of the iLBP genes that code for fatty acid-binding proteins (Fabps) in spotted gar. Based on this work, we have defined the minimum suite of fabp genes prior to their duplication in the teleost lineages ~230-400 mya. Spotted gar, therefore, serves as an appropriate outgroup, or ancestral/ancient fish, that did not undergo the teleost-specific WGD. As such, analyses of the spatio-temporal regulation of spotted gar genes provide a foundation to determine whether the duplicated fabp genes have been retained in teleost genomes owing to either sub- or neofunctionalization. Copyright © 2017 Elsevier Inc. All rights reserved.
Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome.

PubMed

Collins, Ryan L; Brand, Harrison; Redin, Claire E; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon-Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A; Lucente, Diane; Levy, Brynn; Sanders, Stephan J; Wapner, Ronald J; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E

2017-03-06

Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.
Technological advances and genomics in metazoan parasites.

PubMed

Knox, D P

2004-02-01

Molecular biology has provided the means to identify parasite proteins, to define their function and patterns of expression, and to produce them in quantity for subsequent functional analyses. Whole genome and expressed sequence tag programmes, and the parallel development of powerful bioinformatics tools, allow the execution of genome-wide between-stage or between-species comparisons and meaningful gene-expression profiling. The latter can be undertaken with several new technologies such as DNA microarray and serial analysis of gene expression. Proteome analysis has come to the fore in recent years, providing a crucial link between the gene and its protein product. RNA interference and ballistic gene transfer are exciting developments which can provide the means to precisely define the function of individual genes and, of importance in devising novel parasite control strategies, the effect that gene knockdown will have on parasite survival.
Genomic Predictions and Genome-Wide Association Study of Resistance Against Piscirickettsia salmonis in Coho Salmon (Oncorhynchus kisutch) Using ddRAD Sequencing

PubMed Central

Barría, Agustín; Christensen, Kris A.; Yoshida, Grazyella M.; Correa, Katharina; Jedlicki, Ana; Lhorente, Jean P.; Davidson, William S.; Yáñez, José M.

2018-01-01

Piscirickettsia salmonis causes one of the main infectious diseases affecting coho salmon (Oncorhynchus kisutch) farming, and current treatments have been ineffective for the control of this disease. Genetic improvement for P. salmonis resistance has been proposed as a feasible alternative for the control of this infectious disease in farmed fish. Genotyping by sequencing (GBS) strategies allow genotyping of hundreds of individuals with thousands of single nucleotide polymorphisms (SNPs), which can be used to perform genome-wide association studies (GWAS) and predict genetic values using genome-wide information. We used double-digest restriction-site associated DNA (ddRAD) sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and to identify molecular markers associated with the trait. We also evaluated genomic selection (GS) models to determine the potential to accelerate the genetic improvement of this trait using genome-wide molecular information. A total of 764 individuals from 33 full-sib families (17 highly resistant and 16 highly susceptible) were experimentally challenged with P. salmonis and their genotypes were assayed using ddRAD sequencing. A total of 9,389 SNP markers were identified in the population. These markers were used to test genomic selection models and compare different GWAS methodologies for resistance measured as day of death (DD) and binary survival (BIN). Genomic selection models showed higher accuracies than the traditional pedigree-based best linear unbiased prediction (PBLUP) method, for both DD and BIN. The models showed an improvement of up to 95% and 155%, respectively, over PBLUP. One SNP related to B-cell development was identified as a potential functional candidate associated with resistance to P. salmonis defined as DD. PMID:29440129
SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data.

PubMed

Chi, Bryan; DeLeeuw, Ronald J; Coe, Bradley P; MacAulay, Calum; Lam, Wan L

2004-02-09

Array comparative genomic hybridization (CGH) is a technique which detects copy number differences in DNA segments. Complete sequencing of the human genome and the development of an array representing a tiling set of tens of thousands of DNA segments spanning the entire human genome has made high resolution copy number analysis throughout the genome possible. Since array CGH provides signal ratio for each DNA segment, visualization would require the reassembly of individual data points into chromosome profiles. We have developed a visualization tool for displaying whole genome array CGH data in the context of chromosomal location. SeeGH is an application that translates spot signal ratio data from array CGH experiments to displays of high resolution chromosome profiles. Data is imported from a simple tab delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and graphically displays it in a conventional CGH karyotype diagram with the added features of magnification and DNA segment annotation. In this process, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions for graphical display. Once the data is displayed, users have the option of hiding or flagging DNA segments based on user defined criteria, and retrieve annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation. SeeGH represents a novel software tool used to view and analyze array CGH data. The software gives users the ability to view the data in an overall genomic view as well as magnify specific chromosomal regions facilitating the precise localization of genetic alterations. SeeGH is easily installed and runs on Microsoft Windows 2000 or later environments.
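The replicate-averaging step described in this abstract (mean ratio and standard deviation per replicate spot, linked to a chromosome position) can be illustrated with a minimal sketch. This is not SeeGH's own code: the column names (`clone`, `chrom`, `pos`, `ratio`), the function name `summarize_replicates`, and the example rows are all assumptions made for illustration.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_replicates(tsv_text):
    """Return {clone: (chrom, pos, mean_ratio, sd_ratio)}.

    Assumes a hypothetical tab-delimited layout with the header columns
    clone, chrom, pos, ratio; replicate spots share the same clone name.
    """
    ratios_by_clone = defaultdict(list)
    location = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        ratios_by_clone[row["clone"]].append(float(row["ratio"]))
        location[row["clone"]] = (row["chrom"], int(row["pos"]))

    summary = {}
    for clone, ratios in ratios_by_clone.items():
        mean = statistics.mean(ratios)
        # The sample standard deviation needs at least two replicates.
        sd = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
        chrom, pos = location[clone]
        summary[clone] = (chrom, pos, mean, sd)
    return summary


# Made-up example data, not taken from any real experiment.
EXAMPLE = (
    "clone\tchrom\tpos\tratio\n"
    "cloneA\t1\t1000\t1.0\n"
    "cloneA\t1\t1000\t1.2\n"
    "cloneB\t2\t5000\t0.5\n"
)

if __name__ == "__main__":
    for clone, stats in sorted(summarize_replicates(EXAMPLE).items()):
        print(clone, stats)
```

Grouping by clone name before computing statistics keeps the per-segment summary independent of the row order in the input file.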
Genetic Diversity of the Ordinary Strain of Potato virus Y (PVY) and Origin of Recombinant PVY Strains

PubMed Central

Karasev, Alexander V.; Hu, Xiaojun; Brown, Celeste J.; Kerlan, Camille; Nikolaeva, Olga V.; Crosslin, James M.; Gray, Stewart M.

2011-01-01

The ordinary strain of Potato virus Y (PVY), PVYO, causes mild mosaic in tobacco and induces necrosis and severe stunting in potato cultivars carrying the Ny gene. A novel substrain of PVYO was recently reported, PVYO-O5, which is spreading in the United States and is distinguished from other PVYO isolates serologically (i.e., reacting to the otherwise PVYN-specific monoclonal antibody 1F5). To characterize this new PVYO-O5 subgroup and address possible reasons for its continued spread, we conducted a molecular study of PVYO and PVYO-O5 isolates from a North American collection of PVY through whole-genome sequencing and phylogenetic analysis. In all, 44 PVYO isolates were sequenced, including 31 from the previously defined PVYO-O5 group, and subjected to whole-genome analysis. PVYO-O5 isolates formed a separate lineage within the PVYO genome cluster in the whole-genome phylogenetic tree and represented a novel evolutionary lineage of PVY from potato. On the other hand, the PVYO sequences separated into at least two distinct lineages on the whole-genome phylogenetic tree. To shed light on the origin of the three most common PVY recombinants, a more detailed phylogenetic analysis of a sequence fragment, nucleotides 2,406 to 5,821, that is present in all recombinant and nonrecombinant PVYO genomes was conducted. The analysis revealed that PVYN:O and PVYN-Wi recombinants acquired their PVYO segments from two separate PVYO lineages, whereas the PVYNTN recombinant acquired its PVYO segment from the same lineage as PVYN:O. These data suggest that PVYN:O and PVYN-Wi recombinants originated from two separate recombination events involving two different PVYO parental genomes, whereas the PVYNTN recombinants likely originated from the PVYN:O genome via additional recombination events. PMID:21675922
Identification and nucleotide sequence analysis of the repetitive DNA element in the genome of fish lymphocystis disease virus.

PubMed

Schnitzler, P; Delius, H; Scholz, J; Touray, M; Orth, E; Darai, G

1987-12-01

The genome of the fish lymphocystis disease virus (FLDV) was screened for the existence of repetitive DNA sequences using a defined and complete gene library of the viral genome (98 kbp) by DNA-DNA hybridization, heteroduplex analysis, and restriction fine mapping. A repetitive DNA sequence was detected at the coordinates 0.034 to 0.057 and 0.718 to 0.736 map units (m.u.) of the FLDV genome. The first region (0.034 to 0.057 m.u.) corresponds to the 5' terminus of the EcoRI FLDV DNA fragment B (0.034 to 0.165 m.u.) and the second region (0.718 to 0.736 m.u.) is identical to the EcoRI DNA fragment M of the viral genome. The DNA nucleotide sequence of the EcoRI FLDV DNA fragment M was determined. This analysis revealed the presence of many short direct and inverted repetitions, e.g., an 18-mer direct repetition (TTTAAAATTTAATTAA) that started at nucleotide positions 812 and 942 and a 14-mer inverted repeat (TTAAATTTAAATTT) at nucleotide positions 820 and 959. Only short open reading frames were detected within this region. The DNA repetitions are discussed as sequences that play a possible regulatory role for virus replication. Furthermore, hybridization experiments revealed that the repetitive DNA sequences are conserved in the genome of different strains of fish lymphocystis disease virus isolated from two species of Pleuronectidae (flounder and dab).
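Once a fragment has been sequenced, short direct and inverted repeats of the kind reported above can be located computationally. The sketch below is an illustration only, not the authors' method (they used hybridization, heteroduplex analysis and restriction mapping). It assumes that "inverted repeat" means a k-mer whose reverse complement occurs further downstream, and the function names and toy sequences are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_positions(seq, k):
    """Map every k-mer to the list of its 1-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return positions


def direct_repeats(seq, k):
    """Return {kmer: [starts]} for k-mers that occur more than once."""
    return {kmer: pos for kmer, pos in kmer_positions(seq, k).items()
            if len(pos) > 1}


def inverted_repeats(seq, k):
    """Return (start, partner_start, kmer) triples where the reverse
    complement of the k-mer at start occurs downstream at partner_start."""
    index = kmer_positions(seq, k)
    hits = []
    for kmer, starts in index.items():
        partners = index.get(reverse_complement(kmer), [])
        for i in starts:
            hits.extend((i, j, kmer) for j in partners if j > i)
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequences made up for illustration; not FLDV data.
    print(direct_repeats("AAGCTTCCCCAAGCTT", 6))
    print(inverted_repeats("GATTACATTTTGTAATC", 7))
```

Indexing all k-mers once keeps the scan linear in sequence length for a fixed k, which is sufficient for fragments of a few kilobases.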
The interaction between cytosine methylation and processes of DNA replication and repair shape the mutational landscape of cancer genomes.

PubMed

Poulos, Rebecca C; Olivier, Jake; Wong, Jason W H

2017-07-27

Methylated cytosines (5mCs) are frequently mutated in the genome. However, no studies have yet comprehensively analysed mutation-methylation associations across cancer types. Here we analyse 916 cancer genomes, together with tissue type-specific methylation and replication timing data. We describe a strong mutation-methylation association across colorectal cancer subtypes, most interestingly in samples with microsatellite instability (MSI) or Polymerase epsilon (POLE) exonuclease domain mutations. By analysing genomic regions with differential mismatch repair (MMR) efficiency, we suggest a possible role for MMR in the correction of 5mC deamination events, potentially accounting for the high rate of 5mC mutation accumulation in MSI tumours. Additionally, we propose that mutant POLE asserts a mutator phenotype specifically at 5mCs, and we find coding mutation hotspots in POLE-mutant cancers at highly-methylated CpGs in the tumour-suppressor genes APC and TP53. Finally, using multivariable regression models, we demonstrate that different cancers exhibit distinct mutation-methylation associations, with DNA repair influencing such associations in certain cancer genomes. Taken together, we find differential associations with methylation that are vital for accurately predicting expected mutation loads across cancer types. Our findings reveal links between methylation and common mutation and repair processes, with these mechanisms defining a key part of the mutational landscape of cancer genomes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Construction of an ultra-high density consensus genetic map, and enhancement of the physical map from genome sequencing in Lupinus angustifolius.

PubMed

Zhou, Gaofeng; Jian, Jianbo; Wang, Penghao; Li, Chengdao; Tao, Ye; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark; Yang, Huaan

2018-01-01

An ultra-high density genetic map containing 34,574 sequence-defined markers was developed in Lupinus angustifolius. Markers closely linked to nine genes of agronomic traits were identified. A physical map was improved to cover 560.5 Mb genome sequence. Lupin (Lupinus angustifolius L.) is a recently domesticated legume grain crop. In this study, we applied the restriction-site associated DNA sequencing (RADseq) method to genotype an F9 recombinant inbred line population derived from a wild type × domesticated cultivar (W × D) cross. A high density linkage map was developed based on the W × D population. By integrating sequence-defined DNA markers reported in previous mapping studies, we established an ultra-high density consensus genetic map, which contains 34,574 markers consisting of 3508 loci covering 2399 cM on 20 linkage groups. The largest gap in the entire consensus map was 4.73 cM. The high density W × D map and the consensus map were used to develop an improved physical map, which covered 560.5 Mb of genome sequence data. The ultra-high density consensus linkage map, the improved physical map and the markers linked to genes of breeding interest reported in this study provide a common tool for genome sequence assembly, structural genomics, comparative genomics, functional genomics, QTL mapping, and molecular plant breeding in lupin.
Gene transfer agents: phage-like elements of genetic exchange

PubMed Central

Lang, Andrew S.; Zhaxybayeva, Olga; Beatty, J. Thomas

2013-01-01

Horizontal gene transfer is important in the evolution of bacterial and archaeal genomes. An interesting genetic exchange process is carried out by diverse phage-like gene transfer agents (GTAs) that are found in a wide range of prokaryotes. Although GTAs resemble phages, they lack the hallmark capabilities that define typical phages, and they package random pieces of the producing cell’s genome. In this Review, we discuss the defining characteristics of the GTAs that have been identified to date, along with potential functions for these agents and the possible evolutionary forces that act on the genes involved in their production. PMID:22683880
Search for sarcoidosis candidate genes by integration of data from genomic, transcriptomic and proteomic studies.

PubMed

Maver, Ales; Medica, Igor; Peterlin, Borut

2009-12-01

The search for gene candidates in multifactorial diseases such as sarcoidosis can be based on the integration of linkage association data, gene expression data, and protein profile data from genomic, transcriptomic and proteomic studies, respectively. In this study we performed a literature-based search for studies reporting such data, followed by integration of the collected information. Several databases were examined: Medline, HuGE Navigator, ArrayExpress and Gene Expression Omnibus (GEO). Candidate genes were defined as genes which were reported in at least 2 different types of omics studies. Genes previously investigated in sarcoidosis were excluded from further analyses. We identified 177 genes associated with sarcoidosis as potential new candidate genes. Subsequently, 9 gene candidates identified to overlap in 2 different types of studies (genomic, transcriptomic and/or proteomic) were consistently reported in at least 3 studies: SERPINB1, FABP4, S100A8, HBEGF, IL7R, LRIG1, PTPN23, DPM2 and NUP214. These genes are involved in regulation of immune response, cellular proliferation, apoptosis, inhibition of protease activity, and lipid metabolism. Exact biological functions of HBEGF, LRIG1, PTPN23, DPM2 and NUP214 remain to be completely elucidated. We propose 9 candidate genes: SERPINB1, FABP4, S100A8, HBEGF, IL7R, LRIG1, PTPN23, DPM2 and NUP214, as genes with high potential for association with sarcoidosis.
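The selection rule stated above (a gene counts as a candidate when it is reported in at least 2 different types of omics studies, and previously investigated genes are excluded) can be sketched as a small set-based filter. The function name `candidate_genes`, the input shape, and the placeholder gene labels are assumptions for illustration and do not reproduce the authors' pipeline or data.

```python
from collections import defaultdict


def candidate_genes(reports, min_types=2, exclude=()):
    """Return a sorted list of genes reported in >= min_types study types.

    reports: iterable of (gene, study_type) pairs, e.g. ("GENE_A", "genomic").
    exclude: genes already investigated in the disease, dropped from output.
    """
    types_by_gene = defaultdict(set)
    for gene, study_type in reports:
        # A set ignores repeated reports from the same type of study.
        types_by_gene[gene].add(study_type)

    excluded = set(exclude)
    return sorted(
        gene
        for gene, study_types in types_by_gene.items()
        if len(study_types) >= min_types and gene not in excluded
    )


if __name__ == "__main__":
    # Placeholder records; the gene labels are hypothetical.
    reports = [
        ("GENE_A", "genomic"),
        ("GENE_A", "transcriptomic"),
        ("GENE_B", "transcriptomic"),
        ("GENE_B", "transcriptomic"),
        ("GENE_C", "proteomic"),
        ("GENE_C", "genomic"),
    ]
    print(candidate_genes(reports, exclude=["GENE_C"]))
```

Counting distinct study types rather than the number of reports matters here: two transcriptomic studies naming the same gene do not satisfy the "2 different types" criterion.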
[Genome editing of industrial microorganism].

PubMed

Zhu, Linjiang; Li, Qi

2015-03-01

Genome editing is defined as the highly effective and precise modification of a cellular genome on a large scale. In recent years, genome-editing methods have developed rapidly in the field of industrial strain improvement. These rapidly evolving methods have thoroughly changed the old, inefficient mode of genetic modification, namely "one modification, one selection marker, and one target site". Highly effective modification modes have been developed, including simultaneous modification of multiple genes; highly effective insertion, replacement, and deletion of target genes at the genome scale; and cut-and-paste of large DNA fragments. These new tools for microbial genome editing will certainly be applied widely, increase the efficiency of industrial strain improvement, and promote the transformation of the traditional fermentation industry and the rapid development of novel industrial biotechnology, such as the production of biofuels and biomaterials. This review summarizes the technological principles of these genome-editing methods and their applications, which can benefit the engineering and construction of industrial microorganisms.
Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome.

PubMed

González, Leonardo Galindo; Deyholos, Michael K

2012-11-21

Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression. Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to identify TEs coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions as well as regions with similar genes and TE density. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, differences between TE superfamilies were found in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia showed more diversity, recent insertions and conserved domains than the Gypsy, demonstrating their importance in genome evolution. 
The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest Copia elements have a greater influence on recent flax genome evolution while Gypsy elements have become residual and highly mutated.
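The abstract above defines recent insertions by 100% intra-element LTR similarity. A minimal sketch of that criterion follows, assuming the two LTRs of one element have already been aligned to equal length; the function names `ltr_identity` and `is_recent_insertion` are hypothetical and are not taken from the study's pipeline.

```python
def ltr_identity(ltr5, ltr3):
    """Percent identity between the 5' and 3' LTRs of one element.

    Assumption: both sequences are pre-aligned and of equal length.
    """
    if not ltr5 or len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be non-empty and of equal (aligned) length")
    matches = sum(a == b for a, b in zip(ltr5.upper(), ltr3.upper()))
    return 100.0 * matches / len(ltr5)


def is_recent_insertion(ltr5, ltr3):
    """Flag an element as a recent insertion when its two LTRs are identical."""
    return ltr_identity(ltr5, ltr3) >= 100.0
```

The reasoning behind the criterion is that the two LTRs of a retrotransposon are identical at the moment of insertion and diverge afterwards, so identical LTRs suggest little time has passed.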
Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome

PubMed Central

2012-01-01

Background Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression. Results Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to identify TE coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions as well as regions with similar genes and TE density. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, differences between TE superfamilies were found in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia showed more diversity, recent insertions and conserved domains than Gypsy, demonstrating their importance in genome evolution. 
Conclusions The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest Copia elements have a greater influence on recent flax genome evolution while Gypsy elements have become residual and highly mutated. PMID:23171245

A framework genetic map for Miscanthus sinensis from RNAseq-based markers shows recent tetraploidy

PubMed Central

2012-01-01

Background Miscanthus (subtribe Saccharinae, tribe Andropogoneae, family Poaceae) is a genus of temperate perennial C4 grasses whose high biomass production makes it, along with its close relatives sugarcane and sorghum, attractive as a biofuel feedstock. The base chromosome number of Miscanthus (x = 19) is different from that of other Saccharinae and approximately twice that of the related Sorghum bicolor (x = 10), suggesting large-scale duplications may have occurred in recent ancestors of Miscanthus. Owing to the complexity of the Miscanthus genome and the complications of self-incompatibility, a complete genetic map with a high density of markers has not yet been developed. Results We used deep transcriptome sequencing (RNAseq) from two M. sinensis accessions to define 1536 single nucleotide variants (SNVs) for a GoldenGate™ genotyping array, and found that simple sequence repeat (SSR) markers defined in sugarcane are often informative in M. sinensis. A total of 658 SNP and 210 SSR markers were validated via segregation in a full sibling F1 mapping population. Using 221 progeny from this mapping population, we constructed a genetic map for M. sinensis that resolves into 19 linkage groups, the haploid chromosome number expected from cytological evidence. Comparative genomic analysis documents a genome-wide duplication in Miscanthus relative to Sorghum bicolor, with subsequent insertional fusion of a pair of chromosomes. The utility of the map is confirmed by the identification of two paralogous C4-pyruvate, phosphate dikinase (C4-PPDK) loci in Miscanthus, at positions syntenic to the single orthologous gene in Sorghum. Conclusions The genus Miscanthus experienced an ancestral tetraploidy and chromosome fusion prior to its diversification, but after its divergence from the closely related sugarcane clade. 
The recent timing of this tetraploidy complicates discovery and mapping of genetic markers for Miscanthus species, since alleles and fixed differences between paralogs are comparable. These difficulties can be overcome by careful analysis of segregation patterns in a mapping population and genotyping of doubled haploids. The genetic map for Miscanthus will be useful in biological discovery and breeding efforts to improve this emerging biofuel crop, and also provide a valuable resource for understanding genomic responses to tetraploidy and chromosome fusion. PMID:22524439
Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci.

PubMed

Gaulton, Kyle J; Ferreira, Teresa; Lee, Yeji; Raimondo, Anne; Mägi, Reedik; Reschen, Michael E; Mahajan, Anubha; Locke, Adam; Rayner, N William; Robertson, Neil; Scott, Robert A; Prokopenko, Inga; Scott, Laura J; Green, Todd; Sparso, Thomas; Thuillier, Dorothee; Yengo, Loic; Grallert, Harald; Wahl, Simone; Frånberg, Mattias; Strawbridge, Rona J; Kestler, Hans; Chheda, Himanshu; Eisele, Lewin; Gustafsson, Stefan; Steinthorsdottir, Valgerdur; Thorleifsson, Gudmar; Qi, Lu; Karssen, Lennart C; van Leeuwen, Elisabeth M; Willems, Sara M; Li, Man; Chen, Han; Fuchsberger, Christian; Kwan, Phoenix; Ma, Clement; Linderman, Michael; Lu, Yingchang; Thomsen, Soren K; Rundle, Jana K; Beer, Nicola L; van de Bunt, Martijn; Chalisey, Anil; Kang, Hyun Min; Voight, Benjamin F; Abecasis, Gonçalo R; Almgren, Peter; Baldassarre, Damiano; Balkau, Beverley; Benediktsson, Rafn; Blüher, Matthias; Boeing, Heiner; Bonnycastle, Lori L; Bottinger, Erwin P; Burtt, Noël P; Carey, Jason; Charpentier, Guillaume; Chines, Peter S; Cornelis, Marilyn C; Couper, David J; Crenshaw, Andrew T; van Dam, Rob M; Doney, Alex S F; Dorkhan, Mozhgan; Edkins, Sarah; Eriksson, Johan G; Esko, Tonu; Eury, Elodie; Fadista, João; Flannick, Jason; Fontanillas, Pierre; Fox, Caroline; Franks, Paul W; Gertow, Karl; Gieger, Christian; Gigante, Bruna; Gottesman, Omri; Grant, George B; Grarup, Niels; Groves, Christopher J; Hassinen, Maija; Have, Christian T; Herder, Christian; Holmen, Oddgeir L; Hreidarsson, Astradur B; Humphries, Steve E; Hunter, David J; Jackson, Anne U; Jonsson, Anna; Jørgensen, Marit E; Jørgensen, Torben; Kao, Wen-Hong L; Kerrison, Nicola D; Kinnunen, Leena; Klopp, Norman; Kong, Augustine; Kovacs, Peter; Kraft, Peter; Kravic, Jasmina; Langford, Cordelia; Leander, Karin; Liang, Liming; Lichtner, Peter; Lindgren, Cecilia M; Lindholm, Eero; Linneberg, Allan; Liu, Ching-Ti; Lobbens, Stéphane; Luan, Jian'an; Lyssenko, Valeriya; Männistö, Satu; McLeod, Olga; Meyer, Julia; Mihailov, Evelin; Mirza, Ghazala; 
Mühleisen, Thomas W; Müller-Nurasyid, Martina; Navarro, Carmen; Nöthen, Markus M; Oskolkov, Nikolay N; Owen, Katharine R; Palli, Domenico; Pechlivanis, Sonali; Peltonen, Leena; Perry, John R B; Platou, Carl G P; Roden, Michael; Ruderfer, Douglas; Rybin, Denis; van der Schouw, Yvonne T; Sennblad, Bengt; Sigurðsson, Gunnar; Stančáková, Alena; Steinbach, Gerald; Storm, Petter; Strauch, Konstantin; Stringham, Heather M; Sun, Qi; Thorand, Barbara; Tikkanen, Emmi; Tonjes, Anke; Trakalo, Joseph; Tremoli, Elena; Tuomi, Tiinamaija; Wennauer, Roman; Wiltshire, Steven; Wood, Andrew R; Zeggini, Eleftheria; Dunham, Ian; Birney, Ewan; Pasquali, Lorenzo; Ferrer, Jorge; Loos, Ruth J F; Dupuis, Josée; Florez, Jose C; Boerwinkle, Eric; Pankow, James S; van Duijn, Cornelia; Sijbrands, Eric; Meigs, James B; Hu, Frank B; Thorsteinsdottir, Unnur; Stefansson, Kari; Lakka, Timo A; Rauramaa, Rainer; Stumvoll, Michael; Pedersen, Nancy L; Lind, Lars; Keinanen-Kiukaanniemi, Sirkka M; Korpi-Hyövälti, Eeva; Saaristo, Timo E; Saltevo, Juha; Kuusisto, Johanna; Laakso, Markku; Metspalu, Andres; Erbel, Raimund; Jöcke, Karl-Heinz; Moebus, Susanne; Ripatti, Samuli; Salomaa, Veikko; Ingelsson, Erik; Boehm, Bernhard O; Bergman, Richard N; Collins, Francis S; Mohlke, Karen L; Koistinen, Heikki; Tuomilehto, Jaakko; Hveem, Kristian; Njølstad, Inger; Deloukas, Panagiotis; Donnelly, Peter J; Frayling, Timothy M; Hattersley, Andrew T; de Faire, Ulf; Hamsten, Anders; Illig, Thomas; Peters, Annette; Cauchi, Stephane; Sladek, Rob; Froguel, Philippe; Hansen, Torben; Pedersen, Oluf; Morris, Andrew D; Palmer, Collin N A; Kathiresan, Sekar; Melander, Olle; Nilsson, Peter M; Groop, Leif C; Barroso, Inês; Langenberg, Claudia; Wareham, Nicholas J; O'Callaghan, Christopher A; Gloyn, Anna L; Altshuler, David; Boehnke, Michael; Teslovich, Tanya M; McCarthy, Mark I; Morris, Andrew P

2015-12-01

We performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in or near KCNQ1. 'Credible sets' of the variants most likely to drive each distinct signal mapped predominantly to noncoding sequence, implying that association with T2D is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine mapping implicated rs10830963 as driving T2D association. We confirmed that the T2D risk allele for this SNP increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease.
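The abstract above refers to 'credible sets' of the variants most likely to drive each signal. A common way to build such a set is to rank variants by posterior probability and keep adding them until a chosen cumulative probability is reached. The sketch below assumes that construction; the threshold, the variant labels and the function name `credible_set` are illustrative assumptions, not details from the study.

```python
def credible_set(posteriors, threshold=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= threshold.

    posteriors: dict mapping a variant label to its posterior probability of
    driving the signal (assumed to sum to about 1 across the locus).
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= threshold:
            break
    return chosen, total


# Hypothetical posteriors for four variants at one locus.
example = {"v1": 0.6, "v2": 0.3, "v3": 0.08, "v4": 0.02}
variants, mass = credible_set(example, threshold=0.95)
```

A smaller credible set indicates that fine mapping has narrowed the signal to fewer candidate variants.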
The Emergence of Pan-Cancer CIMP and Its Elusive Interpretation

PubMed Central

Miller, Brendan F.; Sánchez-Vega, Francisco; Elnitski, Laura

2016-01-01

Epigenetic dysregulation is recognized as a hallmark of cancer. In the last 16 years, a CpG island methylator phenotype (CIMP) has been documented in tumors originating from different tissues. However, a looming question in the field is whether or not CIMP is a pan-cancer phenomenon or a tissue-specific event. Here, we give a synopsis of the history of CIMP and describe the pattern of DNA methylation that defines the CIMP phenotype in different cancer types. We highlight new conceptual approaches of classifying tumors based on CIMP in a cancer type-agnostic way that reveal the presence of distinct CIMP tumors in a multitude of The Cancer Genome Atlas (TCGA) datasets, suggesting that this phenotype may transcend tissue-type specificity. Lastly, we show evidence supporting the clinical relevance of CIMP-positive tumors and suggest that a common CIMP etiology may define new mechanistic targets in cancer treatment. PMID:27879658
The Emergence of Pan-Cancer CIMP and Its Elusive Interpretation.

PubMed

Miller, Brendan F; Sánchez-Vega, Francisco; Elnitski, Laura

2016-11-22

Epigenetic dysregulation is recognized as a hallmark of cancer. In the last 16 years, a CpG island methylator phenotype (CIMP) has been documented in tumors originating from different tissues. However, a looming question in the field is whether or not CIMP is a pan-cancer phenomenon or a tissue-specific event. Here, we give a synopsis of the history of CIMP and describe the pattern of DNA methylation that defines the CIMP phenotype in different cancer types. We highlight new conceptual approaches of classifying tumors based on CIMP in a cancer type-agnostic way that reveal the presence of distinct CIMP tumors in a multitude of The Cancer Genome Atlas (TCGA) datasets, suggesting that this phenotype may transcend tissue-type specificity. Lastly, we show evidence supporting the clinical relevance of CIMP-positive tumors and suggest that a common CIMP etiology may define new mechanistic targets in cancer treatment.
Population Structure and Antimicrobial Resistance Profiles of Streptococcus suis Serotype 2 Sequence Type 25 Strains.

PubMed

Athey, Taryn B T; Teatero, Sarah; Takamatsu, Daisuke; Wasserscheid, Jessica; Dewar, Ken; Gottschalk, Marcelo; Fittipaldi, Nahuel

2016-01-01

Strains of serotype 2 Streptococcus suis are responsible for swine and human infections. Different serotype 2 genetic backgrounds have been defined using multilocus sequence typing (MLST). However, little is known about the genetic diversity within each MLST sequence type (ST). Here, we used whole-genome sequencing to test the hypothesis that S. suis serotype 2 strains of the ST25 lineage are genetically heterogeneous. We evaluated 51 serotype 2 ST25 S. suis strains isolated from diseased pigs and humans in Canada, the United States of America, and Thailand. Whole-genome sequencing revealed numerous large-scale rearrangements in the ST25 genome, compared to the genomes of ST1 and ST28 S. suis strains, which result, among other changes, in disruption of a pilus island locus. We report that recombination and lateral gene transfer contribute to ST25 genetic diversity. Phylogenetic analysis identified two main and distinct Thai and North American clades grouping most strains investigated. These clades also possessed distinct patterns of antimicrobial resistance genes, which correlated with acquisition of different integrative and conjugative elements (ICEs). Some of these ICEs were found to be integrated at a recombination hot spot, previously identified as the site of integration of the 89K pathogenicity island in serotype 2 ST7 S. suis strains. Our results highlight the limitations of MLST for phylogenetic analysis of S. suis, and the importance of lateral gene transfer and recombination as drivers of diversity in this swine pathogen and zoonotic agent.
Population Structure and Antimicrobial Resistance Profiles of Streptococcus suis Serotype 2 Sequence Type 25 Strains

PubMed Central

Athey, Taryn B. T.; Teatero, Sarah; Takamatsu, Daisuke; Wasserscheid, Jessica; Dewar, Ken; Gottschalk, Marcelo; Fittipaldi, Nahuel

2016-01-01

Strains of serotype 2 Streptococcus suis are responsible for swine and human infections. Different serotype 2 genetic backgrounds have been defined using multilocus sequence typing (MLST). However, little is known about the genetic diversity within each MLST sequence type (ST). Here, we used whole-genome sequencing to test the hypothesis that S. suis serotype 2 strains of the ST25 lineage are genetically heterogeneous. We evaluated 51 serotype 2 ST25 S. suis strains isolated from diseased pigs and humans in Canada, the United States of America, and Thailand. Whole-genome sequencing revealed numerous large-scale rearrangements in the ST25 genome, compared to the genomes of ST1 and ST28 S. suis strains, which result, among other changes, in disruption of a pilus island locus. We report that recombination and lateral gene transfer contribute to ST25 genetic diversity. Phylogenetic analysis identified two main and distinct Thai and North American clades grouping most strains investigated. These clades also possessed distinct patterns of antimicrobial resistance genes, which correlated with acquisition of different integrative and conjugative elements (ICEs). Some of these ICEs were found to be integrated at a recombination hot spot, previously identified as the site of integration of the 89K pathogenicity island in serotype 2 ST7 S. suis strains. Our results highlight the limitations of MLST for phylogenetic analysis of S. suis, and the importance of lateral gene transfer and recombination as drivers of diversity in this swine pathogen and zoonotic agent. PMID:26954687
Advances in Molecular Pathology and Treatment of Periampullary Cancers.

PubMed

Chandrasegaram, Manju D; Chen, John W; Price, Timothy J; Zalcberg, John; Sjoquist, Katrin; Merrett, Neil D

2016-01-01

Periampullary cancers (PACs) include the following 4 traditional anatomic subtypes: pancreatic, ampullary, biliary, or duodenal cancers. This review was performed to highlight recent advances in the genomic and molecular understanding of each PAC subtype and the advances in chemotherapeutic and molecular trials in these cancer subtypes. Recent advances have highlighted differences in the genomic and molecular features within each PAC subtype. Ampullary cancers can now be further defined accurately into their intestinal and pancreatobiliary subtypes using histomolecular profiling. K-ras mutation, which occurs in most pancreatic cancers, is found to occur less frequently in ampullary (42%-52%), biliary (22%-23%), and duodenal cancers (32%-35%), suggesting crucial differences in targetable mutations in these cancer subtypes. Ampullary cancers of intestinal subtype and duodenal cancers seem to share similarities with colorectal cancer, given that they respond to similar chemotherapeutic regimens. This has potential implications for clinical trials and treatment selection, where PACs are often considered together. Future trials should be designed in view of our increased understanding of the different anatomic and histomolecularly profiled subtypes of PACs, which respects their individual molecular characteristics, phenotype, and response to treatment.
Spike-In Normalization of ChIP Data Using DNA-DIG-Antibody Complex.

PubMed

Eberle, Andrea B

2018-01-01

Chromatin immunoprecipitation (ChIP) is a widely used method to determine the occupancy of specific proteins within the genome, helping to unravel the function and activity of specific genomic regions. In ChIP experiments, normalization of the obtained data by a suitable internal reference is crucial. However, particularly when comparing differently treated samples, such a reference is difficult to identify. Here, a simple method to improve the accuracy and reliability of ChIP experiments by the help of an external reference is described. An artificial molecule, composed of a well-defined digoxigenin (DIG) labeled DNA fragment in complex with an anti-DIG antibody, is synthesized and added to each chromatin sample before immunoprecipitation. During the ChIP procedure, the DNA-DIG-antibody complex undergoes the same treatments as the chromatin and is therefore purified and quantified together with the chromatin of interest. This external reference compensates for variability during the ChIP routine and improves the similarity between replicates, thereby emphasizing the biological differences between samples.
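The abstract above describes an external reference that is purified and quantified alongside the chromatin. It does not give a formula, so the following is only a minimal sketch of one plausible normalization: each sample's target recovery is divided by the recovery of the spike-in measured in the same sample. The function name `spike_in_normalize` and the example values are hypothetical.

```python
def spike_in_normalize(target_recovery, spike_recovery):
    """Divide each sample's target signal by its spike-in recovery.

    target_recovery: per-sample ChIP signal at the locus of interest
                     (for example, percent of input).
    spike_recovery:  per-sample recovery of the external reference,
                     in the same units.
    Assumption: a simple ratio is an adequate correction.
    """
    if len(target_recovery) != len(spike_recovery):
        raise ValueError("need one spike-in value per sample")
    if any(s <= 0 for s in spike_recovery):
        raise ValueError("spike-in recovery must be positive")
    return [t / s for t, s in zip(target_recovery, spike_recovery)]


# Two hypothetical replicates whose raw signals differ only because one
# immunoprecipitation was twice as efficient as the other.
normalized = spike_in_normalize([2.0, 4.0], [10.0, 20.0])
```

In this example the replicates agree after normalization, which is the kind of improved similarity between replicates the abstract reports.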
Improved Ribosome-Footprint and mRNA Measurements Provide Insights into Dynamics and Regulation of Yeast Translation

DTIC Science & Technology

2016-02-11

the Whitehead Genome Technology Core for sequencing. This work was supported by the UCSF Program for Breakthrough Biomedical Research (funded in...landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349. Nedialkova, D.D., and Leidel, S.A. (2015). Optimization of Codon Translation... the CC BY license (http://creativecommons.org/licenses/by/4.0/). SUMMARY Ribosome-footprint profiling provides genome-wide snapshots of translation
Within-genome evolution of REPINs: a new family of miniature mobile DNA in bacteria.

PubMed

Bertels, Frederic; Rainey, Paul B

2011-06-01

Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, shows that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA.
Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.

PubMed

Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney

2015-01-01

We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.
Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium

PubMed Central

Linderman, Michael D.; Nielsen, Daiva E.; Green, Robert C.

2016-01-01

Thousands of ostensibly healthy individuals have had their exome or genome sequenced, but a much smaller number of these individuals have received any personal genomic results from that sequencing. We term those projects in which ostensibly healthy participants can receive sequencing-derived genetic findings and may also have access to their genomic data as participatory predispositional personal genome sequencing (PPGS). Here we are focused on genome sequencing applied in a pre-symptomatic context and so define PPGS to exclude diagnostic genome sequencing intended to identify the molecular cause of suspected or diagnosed genetic disease. In this report we describe the design of completed and underway PPGS projects, briefly summarize the results reported to date and introduce the PeopleSeq Consortium, a newly formed collaboration of PPGS projects designed to collect much-needed longitudinal outcome data. PMID:27023617
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF

PubMed Central

Cong, Yingnan; Chan, Yao-ban; Phillips, Charles A.; Langston, Michael A.; Ragan, Mark A.

2017-01-01

Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k. PMID:28154557
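The abstract above applies term frequency-inverse document frequency (TF-IDF) to words of length k drawn from genomes. The sketch below shows only the generic TF-IDF weighting with genomes treated as documents and k-mers as terms; it is not the authors' procedure for inferring lateral regions, and the function names and toy sequences are assumptions for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping words of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(genomes, k):
    """Generic TF-IDF score for every k-mer in every genome.

    genomes: dict mapping a genome name to its sequence string.
    TF  = count of the k-mer in the genome / total k-mers in that genome.
    IDF = log(number of genomes / number of genomes containing the k-mer).
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n_genomes = len(genomes)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # each genome counts once per k-mer
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            word: (cnt / total) * math.log(n_genomes / doc_freq[word])
            for word, cnt in c.items()
        }
    return scores


# Toy input: "AT" occurs in only one genome, so it is the only word
# with a non-zero score.
scores = tfidf({"g1": "AAAA", "g2": "AAAT"}, k=2)
```

Words shared by every genome score zero, while words confined to few genomes score highly, which is why the choice of k matters for how specific a word is.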
Whole-Genome Resequencing of Experimental Populations Reveals Polygenic Basis of Egg-Size Variation in Drosophila melanogaster

PubMed Central

Jha, Aashish R.; Miles, Cecelia M.; Lippert, Nodia R.; Brown, Christopher D.; White, Kevin P.; Kreitman, Martin

2015-01-01

Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. PMID:26044351
Genome-scale engineering for systems and synthetic biology

PubMed Central

Esvelt, Kevin M; Wang, Harris H

2013-01-01

Genome-modification technologies enable the rational engineering and perturbation of biological systems. Historically, these methods have been limited to gene insertions or mutations at random or at a few pre-defined locations across the genome. The handful of methods capable of targeted gene editing suffered from low efficiencies, significant labor costs, or both. Recent advances have dramatically expanded our ability to engineer cells in a directed and combinatorial manner. Here, we review current technologies and methodologies for genome-scale engineering, discuss the prospects for extending efficient genome modification to new hosts, and explore the implications of continued advances toward the development of flexibly programmable chasses, novel biochemistries, and safer organismal and ecological engineering. PMID:23340847
Keeping track of worm trackers.

PubMed

Husson, Steven J; Costa, Wagner Steuer; Schmitt, Cornelia; Gottschalk, Alexander

2013-02-22

C. elegans is used extensively as a model system in the neurosciences due to its well-defined nervous system. However, the seeming simplicity of this nervous system in anatomical structure and neuronal connectivity, at least compared to higher animals, underlies a rich diversity of behaviors. The usefulness of the worm in genome-wide mutagenesis or RNAi screens, where thousands of strains are assessed for phenotype, emphasizes the need for computational methods for automated parameterization of generated behaviors. In addition, behaviors can be modulated upon external cues like temperature, O₂ and CO₂ concentrations, mechanosensory and chemosensory inputs. Different machine vision tools have been developed to aid researchers in their efforts to inventory and characterize defined behavioral "outputs". Here we aim at providing an overview of different worm-tracking packages or video analysis tools designed to quantify different aspects of locomotion such as the occurrence of directional changes (turns, omega bends), curvature of the sinusoidal shape (amplitude, body bend angles) and velocity (speed, backward or forward movement).
How to infer relative fitness from a sample of genomic sequences.

PubMed

Dayarian, Adel; Shraiman, Boris I

2014-07-01

Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.
Identification of novel genomic loci associated with soybean shoot tissue macro- and micro-nutrient concentrations

USDA-ARS's Scientific Manuscript database

The mineral composition of crops is important for animal and human health. The natural diversity that exists within crop species can be utilized to investigate mechanisms that define plant mineral composition and to identify genomic loci important for these processes. The objective of this study was...
Genomic Sciences for Developmentalists: A Merge of Science and Practice

ERIC Educational Resources Information Center

Grigorenko, Elena L.

2015-01-01

The etiological forces of development have been a central question for the developmental sciences (however defined) since their crystallization as a distinct branch of scientific inquiry. Although the history of these sciences contains examples of extreme positions capitalizing on either the predominance of the genome (i.e., the accumulation of…
DNA replication origins-where do we begin?

PubMed

Prioleau, Marie-Noëlle; MacAlpine, David M

2016-08-01

For more than three decades, investigators have sought to identify the precise locations where DNA replication initiates in mammalian genomes. The development of molecular and biochemical approaches to identify start sites of DNA replication (origins) based on the presence of defining and characteristic replication intermediates at specific loci led to the identification of only a handful of mammalian replication origins. The limited number of identified origins prevented a comprehensive and exhaustive search for conserved genomic features that were capable of specifying origins of DNA replication. More recently, the adaptation of origin-mapping assays to genome-wide approaches has led to the identification of tens of thousands of replication origins throughout mammalian genomes, providing an unprecedented opportunity to identify both genetic and epigenetic features that define and regulate their distribution and utilization. Here we summarize recent advances in our understanding of how primary sequence, chromatin environment, and nuclear architecture contribute to the dynamic selection and activation of replication origins across diverse cell types and developmental stages. © 2016 Prioleau and MacAlpine; Published by Cold Spring Harbor Laboratory Press.

Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus

PubMed Central

Biller, Steven J.; Berube, Paul M.; Berta-Thompson, Jessie W.; Kelly, Libusha; Roggensack, Sara E.; Awad, Lana; Roache-Johnson, Kathryn H.; Ding, Huiming; Giovannoni, Stephen J.; Rocap, Gabrielle; Moore, Lisa R.; Chisholm, Sallie W.

2014-01-01

The marine cyanobacterium Prochlorococcus is the numerically dominant photosynthetic organism in the oligotrophic oceans, and a model system in marine microbial ecology. Here we report 27 new whole genome sequences (2 complete and closed; 25 of draft quality) of cultured isolates, representing five major phylogenetic clades of Prochlorococcus. The sequenced strains were isolated from diverse regions of the oceans, facilitating studies of the drivers of microbial diversity—both in the lab and in the field. To improve the utility of these genomes for comparative genomics, we also define pre-computed clusters of orthologous groups of proteins (COGs), indicating how genes are distributed among these and other publicly available Prochlorococcus genomes. These data represent a significant expansion of Prochlorococcus reference genomes that are useful for numerous applications in microbial ecology, evolution and oceanography. PMID:25977791
A mechanistic link between gene regulation and genome architecture in mammalian development.

PubMed

Bonora, Giancarlo; Plath, Kathrin; Denholtz, Matthew

2014-08-01

The organization of chromatin within the nucleus and the regulation of transcription are tightly linked. Recently, mechanisms underlying this relationship have been uncovered. By defining the organizational hierarchy of the genome, determining changes in chromatin organization associated with changes in cell identity, and describing chromatin organization within the context of linear genomic features (such as chromatin modifications and transcription factor binding) and architectural proteins (including Cohesin, CTCF, and Mediator), a new paradigm in genome biology was established wherein genomes are organized around gene regulatory factors that govern cell identity. As such, chromatin organization plays a central role in establishing and maintaining cell state during development, with gene regulation and genome organization being mutually dependent effectors of cell identity. Copyright © 2014 Elsevier Ltd. All rights reserved.
The Epidemiology of Longevity and Exceptional Survival

PubMed Central

Newman, Anne B.; Murabito, Joanne M.

2013-01-01

The field of the “epidemiology of longevity” has been expanding rapidly in recent years. Several long-term cohort studies have followed older adults long enough to identify the most long-lived and to define many factors that lead to a long life span. Very long-lived people such as centenarians have been examined using case-control study designs. Both cohort and case-control studies have been the subject of genome-wide association studies that have identified genetic variants associated with longevity. With growing recognition of the importance of rare variations, family studies of longevity will be useful. Most recently, exome and whole-genome sequencing, gene expression, and epigenetic studies have been undertaken to better define functional variation and regulation of the genome. In this review, we consider how these studies are leading to a deeper understanding of the underlying biologic pathways to longevity. PMID:23372024
PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

PubMed

Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

2015-12-01

A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
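The pan- and core-genome concepts used in this record can be sketched with plain set operations, assuming each genome has already been reduced to a set of gene-family identifiers. This is only an illustration of the definitions, not PanCoreGen's method; the function names and the toy strain data in the usage note are hypothetical.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene sets for a mapping of strain -> gene identifiers.

    pan  = genes present in at least one strain (union)
    core = genes present in every strain (intersection)
    """
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        raise ValueError("at least one genome is required")
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


def core_fraction(genomes):
    """Share of the pan-genome that is core (0.0 if the pan-genome is empty)."""
    pan, core = pan_and_core(genomes)
    return len(core) / len(pan) if pan else 0.0
```

For example, three hypothetical strains carrying `{"geneA", "geneB", "geneC"}`, `{"geneA", "geneD"}` and `{"geneA", "geneC"}` give a pan-genome of four genes, a core of one gene, and a core fraction of 0.25.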
Probing Genomic Aspects of the Multi-Host Pathogen Clostridium perfringens Reveals Significant Pangenome Diversity, and a Diverse Array of Virulence Factors

PubMed Central

Kiu, Raymond; Caim, Shabhonam; Alexander, Sarah; Pachori, Purnima; Hall, Lindsay J.

2017-01-01

Clostridium perfringens is an important cause of animal and human infections, however information about the genetic makeup of this pathogenic bacterium is currently limited. In this study, we sought to understand and characterise the genomic variation, pangenomic diversity, and key virulence traits of 56 C. perfringens strains which included 51 public, and 5 newly sequenced and annotated genomes using Whole Genome Sequencing. Our investigation revealed that C. perfringens has an “open” pangenome comprising 11667 genes and 12.6% of core genes, identified as the most divergent single-species Gram-positive bacterial pangenome currently reported. Our computational analyses also defined C. perfringens phylogeny (16S rRNA gene) in relation to some 25 Clostridium species, with C. baratii and C. sardiniense determined to be the closest relatives. Profiling virulence-associated factors confirmed presence of well-characterised C. perfringens-associated exotoxins genes including α-toxin (plc), enterotoxin (cpe), and Perfringolysin O (pfo or pfoA), although interestingly there did not appear to be a close correlation with encoded toxin type and disease phenotype. Furthermore, genomic analysis indicated significant horizontal gene transfer events as defined by presence of prophage genomes, and notably absence of CRISPR defence systems in >70% (40/56) of the strains. In relation to antimicrobial resistance mechanisms, tetracycline resistance genes (tet) and anti-defensins genes (mprF) were consistently detected in silico (tet: 75%; mprF: 100%). However, pre-antibiotic era strain genomes did not encode for tet, thus implying antimicrobial selective pressures in C. perfringens evolutionary history over the past 80 years. This study provides new genomic understanding of this genetically divergent multi-host bacterium, and further expands our knowledge on this medically and veterinary important pathogen. PMID:29312194
Transmissible Gastroenteritis Coronavirus Genome Packaging Signal Is Located at the 5′ End of the Genome and Promotes Viral RNA Incorporation into Virions in a Replication-Independent Process

PubMed Central

Morales, Lucia; Mateos-Gomez, Pedro A.; Capiscol, Carmen; del Palacio, Lorena; Sola, Isabel

2013-01-01

Preferential RNA packaging in coronaviruses involves the recognition of viral genomic RNA, a crucial process for viral particle morphogenesis mediated by RNA-specific sequences, known as packaging signals. An essential packaging signal component of transmissible gastroenteritis coronavirus (TGEV) has been further delimited to the first 598 nucleotides (nt) from the 5′ end of its RNA genome, by using recombinant viruses transcribing subgenomic mRNA that included potential packaging signals. The integrity of the entire sequence domain was necessary because deletion of any of the five structural motifs defined within this region abrogated specific packaging of this viral RNA. One of these RNA motifs was the stem-loop SL5, a highly conserved motif in coronaviruses located at nucleotide positions 106 to 136. Partial deletion or point mutations within this motif also abrogated packaging. Using TGEV-derived defective minigenomes replicated in trans by a helper virus, we have shown that TGEV RNA packaging is a replication-independent process. Furthermore, the last 494 nt of the genomic 3′ end were not essential for packaging, although this region increased packaging efficiency. TGEV RNA sequences identified as necessary for viral genome packaging were not sufficient to direct packaging of a heterologous sequence derived from the green fluorescent protein gene. These results indicated that TGEV genome packaging is a complex process involving many factors in addition to the identified RNA packaging signal. The identification of well-defined RNA motifs within the TGEV RNA genome that are essential for packaging will be useful for designing packaging-deficient biosafe coronavirus-derived vectors and providing new targets for antiviral therapies. PMID:23966403
Probing Genomic Aspects of the Multi-Host Pathogen Clostridium perfringens Reveals Significant Pangenome Diversity, and a Diverse Array of Virulence Factors.

PubMed

Kiu, Raymond; Caim, Shabhonam; Alexander, Sarah; Pachori, Purnima; Hall, Lindsay J

2017-01-01

Clostridium perfringens is an important cause of animal and human infections, however information about the genetic makeup of this pathogenic bacterium is currently limited. In this study, we sought to understand and characterise the genomic variation, pangenomic diversity, and key virulence traits of 56 C. perfringens strains which included 51 public, and 5 newly sequenced and annotated genomes using Whole Genome Sequencing. Our investigation revealed that C. perfringens has an "open" pangenome comprising 11667 genes and 12.6% of core genes, identified as the most divergent single-species Gram-positive bacterial pangenome currently reported. Our computational analyses also defined C. perfringens phylogeny (16S rRNA gene) in relation to some 25 Clostridium species, with C. baratii and C. sardiniense determined to be the closest relatives. Profiling virulence-associated factors confirmed presence of well-characterised C. perfringens-associated exotoxins genes including α-toxin (plc), enterotoxin (cpe), and Perfringolysin O (pfo or pfoA), although interestingly there did not appear to be a close correlation with encoded toxin type and disease phenotype. Furthermore, genomic analysis indicated significant horizontal gene transfer events as defined by presence of prophage genomes, and notably absence of CRISPR defence systems in >70% (40/56) of the strains. In relation to antimicrobial resistance mechanisms, tetracycline resistance genes (tet) and anti-defensins genes (mprF) were consistently detected in silico (tet: 75%; mprF: 100%). However, pre-antibiotic era strain genomes did not encode for tet, thus implying antimicrobial selective pressures in C. perfringens evolutionary history over the past 80 years. This study provides new genomic understanding of this genetically divergent multi-host bacterium, and further expands our knowledge on this medically and veterinary important pathogen.
In the loop: how chromatin topology links genome structure to function in mechanisms underlying learning and memory.

PubMed

Watson, L Ashley; Tsai, Li-Huei

2017-04-01

Different aspects of learning, memory, and cognition are regulated by epigenetic mechanisms such as covalent DNA modifications and histone post-translational modifications. More recently, the modulation of chromatin architecture and nuclear organization is emerging as a key factor in dynamic transcriptional regulation of the post-mitotic neuron. For instance, neuronal activity induces relocalization of gene loci to 'transcription factories', and specific enhancer-promoter looping contacts allow for precise transcriptional regulation. Moreover, neuronal activity-dependent DNA double-strand break formation in the promoter of immediate early genes appears to overcome topological constraints on transcription. Together, these findings point to a critical role for genome topology in integrating dynamic environmental signals to define precise spatiotemporal gene expression programs supporting cognitive processes. Copyright © 2016 Elsevier Ltd. All rights reserved.
Genome-Wide Analysis of bZIP-Encoding Genes in Maize

PubMed Central

Wei, Kaifa; Chen, Juan; Wang, Yanmei; Chen, Yanhui; Chen, Shaoxiang; Lin, Yina; Pan, Si; Zhong, Xiaojun; Xie, Daoxin

2012-01-01

In plants, basic leucine zipper (bZIP) proteins regulate numerous biological processes such as seed maturation, flower and vascular development, stress signalling and pathogen defence. We have carried out a genome-wide identification and analysis of 125 bZIP genes that exist in the maize genome, encoding 170 distinct bZIP proteins. This family can be divided into 11 groups according to the phylogenetic relationship among the maize bZIP proteins and those in Arabidopsis and rice. Six kinds of intron patterns (a–f) within the basic and hinge regions are defined. The additional conserved motifs have been identified and present the group specificity. Detailed three-dimensional structure analysis has been done to display the sequence conservation and potential distribution of the bZIP domain. Further, we predict the DNA-binding pattern and the dimerization property on the basis of the characteristic features in the basic and hinge regions and the leucine zipper, respectively, which supports our classification greatly and helps to classify 26 distinct subfamilies. The chromosome distribution and the genetic analysis reveal that 58 ZmbZIP genes are located in the segmental duplicate regions in the maize genome, suggesting that the segment chromosomal duplications contribute greatly to the expansion of the maize bZIP family. Across the 60 different developmental stages of 11 organs, three apparent clusters formed represent three kinds of different expression patterns among the ZmbZIP gene family in maize development. A similar but slightly different expression pattern of bZIPs in two inbred lines displays that 22 detected ZmbZIP genes might be involved in drought stress. Thirteen pairs and 143 pairs of ZmbZIP genes show strongly negative and positive correlations in the four distinct fungal infections, respectively, based on the expression profile and Pearson's correlation coefficient analysis. PMID:23103471
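The final step described in this abstract, finding gene pairs whose expression profiles are strongly positively or negatively correlated, can be sketched with Pearson's correlation coefficient. The abstract does not state the cutoff used, so the 0.9 threshold below is an assumption, and the function names and toy profiles in the usage note are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(profiles, threshold=0.9):
    """Split gene pairs into strongly positive and strongly negative lists.

    profiles  -- mapping of gene name -> expression values (one per condition)
    threshold -- assumed cutoff on |r|; not taken from the paper
    """
    positive, negative = [], []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        r = pearson(p1, p2)
        if r >= threshold:
            positive.append((g1, g2))
        elif r <= -threshold:
            negative.append((g1, g2))
    return positive, negative
```

With four conditions per gene (for instance, one value per fungal infection), profiles such as `[1, 2, 3, 4]` and `[2, 4, 6, 8]` land in the positive list, while `[1, 2, 3, 4]` and `[4, 3, 2, 1]` land in the negative list.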
Mice, humans and haplotypes--the hunt for disease genes in SLE.

PubMed

Rigby, R J; Fernando, M M A; Vyse, T J

2006-09-01

Defining the polymorphisms that contribute to the development of complex genetic disease traits is a challenging, although increasingly tractable problem. Historically, the technical difficulties in conducting association studies across the entire human genome are such that murine models have been used to generate candidate genes for analysis in human complex diseases, such as SLE. In this article we discuss the advantages and disadvantages of this approach and specifically address some assumptions made in the transition from studying one species to another, using lupus as an example. These issues include differences in genetic structure and genetic organisation which are a reflection on the population history. Clearly there are major differences in the histories of the human population and inbred laboratory strains of mice. Both human and murine genomes do exhibit structure at the genetic level. That is to say, they comprise haplotypes which are genomic regions that carry runs of polymorphisms that are not independently inherited. Haplotypes therefore reduce the number of combinations of the polymorphisms in the DNA in that region and facilitate the identification of disease susceptibility genes in both mice and humans. There are now novel means of generating candidate genes in SLE using mutagenesis (with ENU) in mice and identifying mice that generate antinuclear autoimmunity. In addition, murine models still provide a valuable means of exploring the functional consequences of genetic variation. However, advances in technology are such that human geneticists can now screen large fractions of the human genome for disease associations using microchip technologies that provide information on upwards of 100,000 different polymorphisms. These approaches are aimed at identifying haplotypes that carry disease susceptibility mutations and rely less on the generation of candidate genes.
Multiple Multi-Copper Oxidase Gene Families in Basidiomycetes – What for?

PubMed Central

Kües, Ursula; Rühl, Martin

2011-01-01

Genome analyses revealed in various basidiomycetes the existence of multiple genes for blue multi-copper oxidases (MCOs). Whole genomes are now available from saprotrophs, white rot and brown rot species, plant and animal pathogens and ectomycorrhizal species. Total numbers (from 1 to 17) and types of mco genes differ between analyzed species with no easily recognizable connection of gene distribution to fungal life styles. Types of mco genes might be present in one and absent in another fungus. Distinct types of genes have been multiplied at speciation in different organisms. Phylogenetic analysis defined different subfamilies of laccases sensu stricto (specific to Agaricomycetes), classical Fe2+-oxidizing Fet3-like ferroxidases, potential ferroxidases/laccases exhibiting either one or both of these enzymatic functions, enzymes clustering with pigment MCOs and putative ascorbate oxidases. Biochemically best described are laccases sensu stricto due to their proposed roles in degradation of wood, straw and plant litter and due to the large interest in these enzymes in biotechnology. However, biological functions of laccases and other MCOs are generally little addressed. Functions in substrate degradation, symbiotic and pathogenic interactions, development, pigmentation and copper homeostasis have been put forward. Evidence for biological functions is in most instances rather circumstantial, resting on correlations of expression. Multiple factors impede research on biological functions such as difficulties of defining suitable biological systems for molecular research, the broad and overlapping substrate spectrum multi-copper oxidases usually possess, the limited knowledge of their natural substrates, difficulties imposed by low expression or expression of multiple enzymes, and difficulties in expressing enzymes heterologously. PMID:21966246
Quantitative genome re-sequencing defines multiple mutations conferring chloroquine resistance in rodent malaria

PubMed Central

2012-01-01

Background Drug resistance in the malaria parasite Plasmodium falciparum severely compromises the treatment and control of malaria. A knowledge of the critical mutations conferring resistance to particular drugs is important in understanding modes of drug action and mechanisms of resistance. They are required to design better therapies and limit drug resistance. A mutation in the gene (pfcrt) encoding a membrane transporter has been identified as a principal determinant of chloroquine resistance in P. falciparum, but we lack a full account of higher level chloroquine resistance. Furthermore, the determinants of resistance in the other major human malaria parasite, P. vivax, are not known. To address these questions, we investigated the genetic basis of chloroquine resistance in an isogenic lineage of rodent malaria parasite P. chabaudi in which high level resistance to chloroquine has been progressively selected under laboratory conditions. Results Loci containing the critical genes were mapped by Linkage Group Selection, using a genetic cross between the high-level chloroquine-resistant mutant and a genetically distinct sensitive strain. A novel high-resolution quantitative whole-genome re-sequencing approach was used to reveal three regions of selection on chr11, chr03 and chr02 that appear progressively at increasing drug doses on three chromosomes. Whole-genome sequencing of the chloroquine-resistant parent identified just four point mutations in different genes on these chromosomes. Three mutations are located at the foci of the selection valleys and are therefore predicted to confer different levels of chloroquine resistance. The critical mutation conferring the first level of chloroquine resistance is found in aat1, a putative amino acid transporter. Conclusions Quantitative trait loci conferring selectable phenotypes, such as drug resistance, can be mapped directly using progressive genome-wide linkage group selection. Quantitative genome-wide short-read genome resequencing can be used to reveal these signatures of drug selection at high resolution. The identities of three genes (and mutations within them) conferring different levels of chloroquine resistance generate insights regarding the genetic architecture and mechanisms of resistance to chloroquine and other drugs. Importantly, their orthologues may now be evaluated for critical or accessory roles in chloroquine resistance in human malarias P. vivax and P. falciparum. PMID:22435897
Mitochondrial genomic analysis of late onset Alzheimer's disease reveals protective haplogroups H6A1A/H6A1B: the Cache County Study on Memory in Aging.

PubMed

Ridge, Perry G; Maxwell, Taylor J; Corcoran, Christopher D; Norton, Maria C; Tschanz, Joann T; O'Brien, Elizabeth; Kerber, Richard A; Cawthon, Richard M; Munger, Ronald G; Kauwe, John S K

2012-01-01

Alzheimer's disease (AD) is the most common cause of dementia and AD risk clusters within families. Part of the familial aggregation of AD is accounted for by excess maternal vs. paternal inheritance, a pattern consistent with mitochondrial inheritance. The role of specific mitochondrial DNA (mtDNA) variants and haplogroups in AD risk is uncertain. We determined the complete mitochondrial genome sequence of 1007 participants in the Cache County Study on Memory in Aging, a population-based prospective cohort study of dementia in northern Utah. AD diagnoses were made with a multi-stage protocol that included clinical examination and review by a panel of clinical experts. We used TreeScanning, a statistically robust approach based on haplotype networks, to analyze the mtDNA sequence data. Participants with major mitochondrial haplotypes H6A1A and H6A1B showed a reduced risk of AD (p=0.017, corrected for multiple comparisons). The protective haplotypes were defined by three variants: m.3915G>A, m.4727A>G, and m.9380G>A. These three variants characterize two different major haplogroups. Together m.4727A>G and m.9380G>A define H6A1, and it has been suggested m.3915G>A defines H6A. Additional variants differentiate H6A1A and H6A1B; however, none of these variants had a significant relationship with AD case-control status. Our findings provide evidence of a reduced risk of AD for individuals with mtDNA haplotypes H6A1A and H6A1B. These findings are the results of the largest study to date with complete mtDNA genome sequence data, yet the functional significance of the associated haplotypes remains unknown and replication in others studies is necessary.
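The marker logic stated in this abstract (m.4727A>G together with m.9380G>A defining H6A1, and m.3915G>A suggested to define H6A) can be written as a small lookup. This is a deliberately naive sketch: real haplogroup assignment relies on a full mtDNA phylogeny, and the function name `label_h6a_lineage` and the flat precedence rule are assumptions made for illustration.

```python
# Marker variants as named in the abstract.
H6A_MARKER = "m.3915G>A"                             # suggested to define H6A
H6A1_MARKERS = frozenset({"m.4727A>G", "m.9380G>A"})  # together define H6A1


def label_h6a_lineage(variants):
    """Return "H6A1", "H6A", or None for a collection of mtDNA variant names.

    Naive rule (an assumption): the more specific label wins when both
    H6A1 markers are observed, regardless of the H6A marker.
    """
    observed = set(variants)
    if H6A1_MARKERS <= observed:
        return "H6A1"
    if H6A_MARKER in observed:
        return "H6A"
    return None
```

A participant carrying all three variants would be labelled H6A1 under this rule; distinguishing H6A1A from H6A1B would need the additional variants that the abstract mentions but does not list.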
Mitochondrial Genomic Analysis of Late Onset Alzheimer’s Disease Reveals Protective Haplogroups H6A1A/H6A1B: The Cache County Study on Memory in Aging

PubMed Central

Ridge, Perry G.; Maxwell, Taylor J.; Corcoran, Christopher D.; Norton, Maria C.; Tschanz, JoAnn T.; O’Brien, Elizabeth; Kerber, Richard A.; Cawthon, Richard M.; Munger, Ronald G.; Kauwe, John S. K.

2012-01-01

Background Alzheimer’s disease (AD) is the most common cause of dementia and AD risk clusters within families. Part of the familial aggregation of AD is accounted for by excess maternal vs. paternal inheritance, a pattern consistent with mitochondrial inheritance. The role of specific mitochondrial DNA (mtDNA) variants and haplogroups in AD risk is uncertain. Methodology/Principal Findings We determined the complete mitochondrial genome sequence of 1007 participants in the Cache County Study on Memory in Aging, a population-based prospective cohort study of dementia in northern Utah. AD diagnoses were made with a multi-stage protocol that included clinical examination and review by a panel of clinical experts. We used TreeScanning, a statistically robust approach based on haplotype networks, to analyze the mtDNA sequence data. Participants with major mitochondrial haplotypes H6A1A and H6A1B showed a reduced risk of AD (p = 0.017, corrected for multiple comparisons). The protective haplotypes were defined by three variants: m.3915G>A, m.4727A>G, and m.9380G>A. These three variants characterize two different major haplogroups. Together m.4727A>G and m.9380G>A define H6A1, and it has been suggested m.3915G>A defines H6A. Additional variants differentiate H6A1A and H6A1B; however, none of these variants had a significant relationship with AD case-control status. Conclusions/Significance Our findings provide evidence of a reduced risk of AD for individuals with mtDNA haplotypes H6A1A and H6A1B. These findings are the results of the largest study to date with complete mtDNA genome sequence data, yet the functional significance of the associated haplotypes remains unknown and replication in others studies is necessary. PMID:23028804
Visual Exploration of Genetic Association with Voxel-based Imaging Phenotypes in an MCI/AD Study

PubMed Central

Kim, Sungeun; Shen, Li; Saykin, Andrew J.; West, John D.

2010-01-01

Neuroimaging genomics is a new transdisciplinary research field, which aims to examine genetic effects on brain via integrated analyses of high throughput neuroimaging and genomic data. We report our recent work on (1) developing an imaging genomic browsing system that allows for whole genome and entire brain analyses based on visual exploration and (2) applying the system to the imaging genomic analysis of an existing MCI/AD cohort. Voxel-based morphometry is used to define imaging phenotypes. ANCOVA is employed to evaluate the effect of the interaction of genotypes and diagnosis in relation to imaging phenotypes while controlling for relevant covariates. Encouraging experimental results suggest that the proposed system has substantial potential for enabling discovery of imaging genomic associations through visual evaluation and for localizing candidate imaging regions and genomic regions for refined statistical modeling. PMID:19963597
Atomic structure of the human cytomegalovirus capsid with its securing tegument layer of pp150

PubMed Central

Yu, Xuekui; Jih, Jonathan; Jiang, Jiansen; Zhou, Z. Hong

2017-01-01

Herpesviruses possess a genome-pressurized capsid. The 235-kilobase genome of human cytomegalovirus (HCMV) is by far the largest of any herpesvirus, yet it has been unclear how its capsid, which is similar in size to those of other herpesviruses, is stabilized. Here we report a HCMV atomic structure consisting of the herpesvirus-conserved capsid proteins MCP, Tri1, Tri2, and SCP and the HCMV-specific tegument protein pp150—totaling ~4000 molecules and 62 different conformers. MCPs manifest as a complex of insertions around a bacteriophage HK97 gp5–like domain, which gives rise to three classes of capsid floor–defining interactions; triplexes, composed of two “embracing” Tri2 conformers and a “third-wheeling” Tri1, fasten the capsid floor. HCMV-specific strategies include using hexon channels to accommodate the genome and pp150 helix bundles to secure the capsid via cysteine tetrad–to-SCP interactions. Our structure should inform rational design of countermeasures against HCMV, other herpesviruses, and even HIV/AIDS. PMID:28663444
Comparative Cytogenetics between Two Important Songbird Models: The Zebra Finch and the Canary

PubMed Central

dos Santos, Michelly da Silva; Kretschmer, Rafael; Frankl-Vilches, Carolina; Bakker, Antje; Gahr, Manfred; O'Brien, Patricia C. M.; Ferguson-Smith, Malcolm A.

2017-01-01

Songbird species (order Passeriformes, suborder Oscines) are important models in various experimental fields spanning behavioural genomics to neurobiology. Although the genomes of some songbird species were sequenced recently, the chromosomal organization of these species is mostly unknown. Here we focused on the two most studied songbird species in neuroscience, the zebra finch (Taeniopygia guttata) and the canary (Serinus canaria). In order to clarify these issues and also to integrate chromosome data with their assembled genomes, we used classical and molecular cytogenetics in both zebra finch and canary to define their chromosomal homology, localization of heterochromatic blocks and distribution of rDNA clusters. We confirmed the same diploid number (2n = 80) in both species, as previously reported. FISH experiments confirmed the occurrence of multiple paracentric and pericentric inversions previously found in other species of Passeriformes, providing a cytogenetic signature for this order, and corroborating data from in silico analyses. Additionally, compared to other Passeriformes, we detected differences in the zebra finch karyotype concerning the morphology of some chromosomes, in the distribution of 5S rDNA clusters, and an inversion in chromosome 1. PMID:28129381
Novel mouse model recapitulates genome and transcriptome alterations in human colorectal carcinomas.

PubMed

McNeil, Nicole E; Padilla-Nash, Hesed M; Buishand, Floryne O; Hue, Yue; Ried, Thomas

2017-03-01

Human colorectal carcinomas are defined by a nonrandom distribution of genomic imbalances that are characteristic of this disease. Often, these imbalances affect entire chromosomes. Understanding the role of these aneuploidies for carcinogenesis is of utmost importance. Currently, established transgenic mice do not recapitulate the pathognomonic genome aberration profile of human colorectal carcinomas. We have developed a novel model based on the spontaneous transformation of murine colon epithelial cells. During this process, cells progress through stages of pre-immortalization, immortalization and, finally, transformation, and result in tumors when injected into immunocompromised mice. We analyzed our model for genome and transcriptome alterations using ArrayCGH, spectral karyotyping (SKY), and array based gene expression profiling. ArrayCGH revealed a recurrent pattern of genomic imbalances. These results were confirmed by SKY. Comparing these imbalances with orthologous maps of human chromosomes revealed a remarkable overlap. We observed focal deletions of the tumor suppressor genes Trp53 and Cdkn2a/p16. High-level focal genomic amplification included the locus harboring the oncogene Mdm2, which was confirmed by FISH in the form of double minute chromosomes. Array-based global gene expression revealed distinct differences between the sequential steps of spontaneous transformation. Gene expression changes showed significant similarities with human colorectal carcinomas. Pathways most prominently affected included genes involved in chromosomal instability and in epithelial to mesenchymal transition. Our novel mouse model therefore recapitulates the most prominent genome and transcriptome alterations in human colorectal cancer, and might serve as a valuable tool for understanding the dynamic process of tumorigenesis, and for preclinical drug testing. © 2016 Wiley Periodicals, Inc.
Global mapping of DNA conformational flexibility on Saccharomyces cerevisiae.

PubMed

Menconi, Giulia; Bedini, Andrea; Barale, Roberto; Sbrana, Isabella

2015-04-01

In this study we provide the first comprehensive map of DNA conformational flexibility in the complete Saccharomyces cerevisiae genome. Flexibility plays a key role in DNA supercoiling and DNA/protein binding, regulating DNA transcription, replication or repair. Specific interest in flexibility analysis concerns its relationship with human genome instability. Enrichment in flexible sequences has been detected in unstable regions of the human genome termed fragile sites, where genes map and carry frequent deletions and rearrangements in cancer. Flexible sequences have been suggested to be the determinants of fragile gene proneness to breakage; however, their actual role and properties remain elusive. Our in silico analysis, carried out genome-wide via the StabFlex algorithm, shows the conserved presence of highly flexible regions in the budding yeast genome as well as in genomes of other Saccharomyces sensu stricto species. Flexible peaks in S. cerevisiae identify 175 ORFs mapping on their 3'UTR, a region affecting mRNA translation, localization and stability. (TA)n repeats of different extension shape the central structure of peaks and co-localize with polyadenylation efficiency element (EE) signals. ORFs with flexible peaks share common features. Transcripts are characterized by decreased half-life: this is considered peculiar to genes involved in regulatory systems with high turnover; consistently, their function affects biological processes such as cell cycle regulation or stress response. Our findings support the functional importance of flexibility peaks, suggesting that the flexible sequence may be derived by an expansion of the canonical TAYRTA polyadenylation efficiency element. The flexible (TA)n repeat amplification could be the outcome of an evolutionary neofunctionalization leading to a differential 3'-end processing and expression regulation in genes with peculiar function. Our study provides new support for the functional role of flexibility in genomes and a strategy for its characterization inside human fragile sites.
Comparative genome analysis of a large Dutch Legionella pneumophila strain collection identifies five markers highly correlated with clinical strains

PubMed Central

2010-01-01

Background Discrimination between clinical and environmental strains within many bacterial species is currently underexplored. Genomic analyses have clearly shown the enormous variability in genome composition between different strains of a bacterial species. In this study we have used Legionella pneumophila, the causative agent of Legionnaires' disease, to search for genomic markers related to pathogenicity. During a large surveillance study in The Netherlands, well-characterized patient-derived strains and environmental strains were collected. We have used a mixed-genome microarray to perform comparative-genome analysis of 257 strains from this collection. Results Microarray analysis indicated that 480 DNA markers (out of 3360 markers in total) showed clear variation in presence between individual strains and these were therefore selected for further analysis. Unsupervised statistical analysis of these markers showed the enormous genomic variation within the species but did not show any correlation with a pathogenic phenotype. We therefore used supervised statistical analysis to identify discriminating markers. Genetic programming was used both to identify predictive markers and to define their interrelationships. A model consisting of five markers was developed that together correctly predicted 100% of the clinical strains and 69% of the environmental strains. Conclusions A novel approach for identifying predictive markers enabling discrimination between clinical and environmental isolates of L. pneumophila is presented. Out of over 3000 possible markers, five were selected that together enabled correct prediction of all the clinical strains included in this study. This novel approach for identifying predictive markers can be applied to all bacterial species, allowing for better discrimination between strains well equipped to cause human disease and relatively harmless strains. PMID:20630115

Global Mapping of DNA Conformational Flexibility on Saccharomyces cerevisiae

PubMed Central

Menconi, Giulia; Bedini, Andrea; Barale, Roberto; Sbrana, Isabella

2015-01-01

In this study we provide the first comprehensive map of DNA conformational flexibility in the complete Saccharomyces cerevisiae genome. Flexibility plays a key role in DNA supercoiling and DNA/protein binding, regulating DNA transcription, replication or repair. Specific interest in flexibility analysis concerns its relationship with human genome instability. Enrichment in flexible sequences has been detected in unstable regions of the human genome termed fragile sites, where genes map and carry frequent deletions and rearrangements in cancer. Flexible sequences have been suggested to be the determinants of fragile gene proneness to breakage; however, their actual role and properties remain elusive. Our in silico analysis, carried out genome-wide via the StabFlex algorithm, shows the conserved presence of highly flexible regions in the budding yeast genome as well as in genomes of other Saccharomyces sensu stricto species. Flexible peaks in S. cerevisiae identify 175 ORFs mapping on their 3’UTR, a region affecting mRNA translation, localization and stability. (TA)n repeats of different extension shape the central structure of peaks and co-localize with polyadenylation efficiency element (EE) signals. ORFs with flexible peaks share common features. Transcripts are characterized by decreased half-life: this is considered peculiar to genes involved in regulatory systems with high turnover; consistently, their function affects biological processes such as cell cycle regulation or stress response. Our findings support the functional importance of flexibility peaks, suggesting that the flexible sequence may be derived by an expansion of the canonical TAYRTA polyadenylation efficiency element. The flexible (TA)n repeat amplification could be the outcome of an evolutionary neofunctionalization leading to a differential 3’-end processing and expression regulation in genes with peculiar function. Our study provides new support for the functional role of flexibility in genomes and a strategy for its characterization inside human fragile sites. PMID:25860149
PCOGR: phylogenetic COG ranking as an online tool to judge the specificity of COGs with respect to freely definable groups of organisms.

PubMed

Meereis, Florian; Kaufmann, Michael

2004-10-15

The rapidly increasing number of completely sequenced genomes led to the establishment of the COG-database which, based on sequence homologies, assigns similar proteins from different organisms to clusters of orthologous groups (COGs). There are several bioinformatic studies that made use of this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, public software to perform individually definable group-specific searches is not available. The tool described here fills exactly this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG-database. The user can freely define two groups of organisms by assigning each of the (current) 66 organisms to groupA, to the reference groupB, or to be ignored by the algorithm. Then, for all COGs a specificity index is calculated with respect to the specificity to groupA, i.e. high scoring COGs contain proteins from most groupA organisms while proteins from most organisms assigned to groupB are absent. In addition to ranking all COGs according to the user defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity indexes. This software allows detecting COGs specific to a predefined group of organisms. All COGs are ranked in the order of their specificity and a graphical visualization allows recognizing (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA- and groupB-organisms. The software also allows detecting putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternate enzymes originated by convergent evolution.
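The ranking idea in the PCOGR record above can be illustrated with a minimal sketch. The abstract does not give the actual formula, so the index used here (fraction of groupA organisms present in a COG minus fraction of groupB organisms present) is an assumption chosen only to match the verbal description; the names `specificity_index`, `rank_cogs`, and the toy COG and organism identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def specificity_index(cog_members: Set[str],
                      group_a: Set[str],
                      group_b: Set[str]) -> float:
    """Assumed index: coverage of groupA minus coverage of groupB.

    This is NOT the published PCOGR formula (the abstract does not state it);
    it only reproduces the described behaviour: high when most groupA
    organisms are present and most groupB organisms are absent.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    frac_a = len(cog_members & group_a) / len(group_a)
    frac_b = len(cog_members & group_b) / len(group_b)
    return frac_a - frac_b


def rank_cogs(cogs: Dict[str, Set[str]],
              group_a: Set[str],
              group_b: Set[str]) -> List[Tuple[str, float]]:
    """Return (COG id, index) pairs sorted from most to least groupA-specific."""
    scored = [(cog_id, specificity_index(members, group_a, group_b))
              for cog_id, members in cogs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: organisms outside both groups would simply be ignored.
group_a = {"orgA1", "orgA2"}
group_b = {"orgB1", "orgB2"}
cogs = {
    "COG_X": {"orgA1", "orgA2"},            # only groupA organisms
    "COG_Y": {"orgA1", "orgB1", "orgB2"},   # mostly groupB organisms
    "COG_Z": {"orgA1", "orgA2", "orgB1"},   # mixed
}
ranking = rank_cogs(cogs, group_a, group_b)
print(ranking)  # [('COG_X', 1.0), ('COG_Z', 0.5), ('COG_Y', -0.5)]
```

Under this assumed index a COG scores 1.0 when it is found in every groupA organism and in no groupB organism, and -1.0 in the opposite case.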
Whole genome sequencing of the monomorphic pathogen Mycobacterium bovis reveals local differentiation of cattle clinical isolates.

PubMed

Lasserre, Moira; Fresia, Pablo; Greif, Gonzalo; Iraola, Gregorio; Castro-Ramos, Miguel; Juambeltz, Arturo; Nuñez, Álvaro; Naya, Hugo; Robello, Carlos; Berná, Luisa

2018-01-02

Bovine tuberculosis (bTB) poses serious risks to animal welfare and economy, as well as to public health as a zoonosis. Its etiological agent, Mycobacterium bovis, belongs to the Mycobacterium tuberculosis complex (MTBC), a group of genetically monomorphic organisms characterized by a remarkably high overall nucleotide identity (99.9%). Indeed, this characteristic is of major concern for correct typing and determination of strain-specific traits based on sequence diversity. Due to its historical economic dependence on cattle production, Uruguay is deeply affected by the prevailing incidence of Mycobacterium bovis. With the world's highest number of cattle per human, and its intensive cattle production, Uruguay represents a particularly well-suited setting to evaluate genomic variability among isolates, and the diversity traits associated with this pathogen. We compared 186 genomes from MTBC strains isolated worldwide, and found a highly structured population in M. bovis. The analysis of 23 new M. bovis genomes, belonging to strains isolated in Uruguay, evidenced three groups present in the country. Despite presenting an expected highly conserved genomic structure and sequence, these strains segregate in a clustered manner within the worldwide phylogeny. Analysis of the non-pe/ppe differential areas against a reference genome defined four main sources of variability, namely: regions of difference (RD), variable genes, duplications and novel genes. RDs and variant analysis segregated the strains into clusters that are concordant with their spoligotype identities. Due to its high homoplasy rate, spoligotyping failed to reflect the true genomic diversity among worldwide representative strains; however, it remains a good indicator for closely related populations. This study introduces a comprehensive population structure analysis of worldwide M. bovis isolates. The incorporation and analysis of 23 novel Uruguayan M. bovis genomes sheds light onto the genomic diversity of this pathogen, evidencing the existence of greater genetic variability among strains than previously contemplated.
NGS-based approach to determine the presence of HPV and their sites of integration in human cancer genome.

PubMed

Chandrani, P; Kulkarni, V; Iyer, P; Upadhyay, P; Chaubal, R; Das, P; Mulherkar, R; Singh, R; Dutt, A

2015-06-09

Human papilloma virus (HPV) accounts for the most common cause of all virus-associated human cancers. Here, we describe the first graphical user interface (GUI)-based automated tool 'HPVDetector', for non-computational biologists, exclusively for detection and annotation of the HPV genome based on next-generation sequencing data sets. We developed a custom-made reference genome that comprises human chromosomes along with the annotated genomes of 143 HPV types as pseudochromosomes. The tool runs in one of two modes as defined by the user: a 'quick mode' to identify presence of HPV types and an 'integration mode' to determine genomic location for the site of integration. The input data can be a paired-end whole-exome, whole-genome or whole-transcriptome data set. The HPVDetector is available in public domain for download: http://www.actrec.gov.in/pi-webpages/AmitDutt/HPVdetector/HPVDetector.html. On the basis of our evaluation of 116 whole-exome, 23 whole-transcriptome and 2 whole-genome data, we were able to identify presence of HPV in 20 exomes and 4 transcriptomes of cervical and head and neck cancer tumour samples. Using the inbuilt annotation module of HPVDetector, we found predominant integration of viral gene E7, a known oncogene, at known 17q21, 3q27, 7q35, Xq28 and novel sites of integration in the human genome. Furthermore, co-infection with high-risk HPVs such as 16 and 31 was found to be mutually exclusive compared with low-risk HPV71. HPVDetector is a simple yet precise and robust tool for detecting HPV from tumour samples using a variety of next-generation sequencing platforms including whole genome, whole exome and transcriptome. Two different modes (quick detection and integration mode) along with a GUI widen the usability of HPVDetector for biologists and clinicians with minimal computational knowledge.
Unexpected genomic relationships between Bacillus anthracis strains from Bangladesh and Central Europe.

PubMed

Rume, Farzana Islam; Ahsan, Chowdhury Rafiqul; Biswas, Paritosh Kumar; Yasmin, Mahmuda; Braun, Peter; Walter, Mathias C; Antwerpen, Markus; Grass, Gregor; Hanczaruk, Matthias

2016-11-01

The zoonosis anthrax caused by the bacterium Bacillus anthracis has a broad geographical distribution. Active enzootic areas are typically located away from central and northern Europe where cases of the disease occur only sporadically and in limited numbers. In contrast, a few out of the 64 districts of Bangladesh are hyper-endemic for anthrax and there the disease causes major losses in livestock. In this study we genotyped eight strains of B. anthracis collected from the districts of Sirajganj and Tangail in 2013. All these strains belonged to canSNP group A.Br.001/002 Sterne differing only in a few of 31 tandem-repeat (MLVA)-markers. Whole genome sequences were obtained from five of these strains and compared with genomic information of B. anthracis strains originating from various geographical locations. Characteristic signatures were detected defining two "Bangladesh" clusters potentially useful for rapid molecular epidemiology. From these data high-resolution PCR assays were developed and subsequently tested on additional isolates from Bangladesh and Central Europe. Remarkably, this comparative genomic analysis focusing on SNP-discovery revealed a close genetic relationship between these strains from Bangladesh and historic strains collected between 1991 and 2008 in The Netherlands and Germany, respectively. Possible explanations for these phylogenetic relationships are discussed. Copyright © 2016 Elsevier B.V. All rights reserved.
An in silico pan-genomic probe for the molecular traits behind Lactobacillus ruminis gut autochthony.

PubMed

Kant, Ravi; Palva, Airi; von Ossowski, Ingemar

2017-01-01

As an ecological niche, the mammalian intestine provides the ideal habitat for a variety of bacterial microorganisms. Purportedly, some commensal genera and species offer a beneficial mix of metabolic, protective, and structural processes that help sustain the natural digestive health of the host. Among these gut inhabitants is the Gram-positive lactic acid bacterium Lactobacillus ruminis, a strict anaerobe with both pili and flagella on its cell surface, but also known for being autochthonous (indigenous) to the intestinal environment. Given that the molecular basis of gut autochthony for this species is largely unexplored and unknown, we undertook a study at the genome level to pinpoint some of the adaptive traits behind its colonization behavior. In our pan-genomic probe of L. ruminis, the genomes of nine different strains isolated from human, bovine, porcine, and equine host guts were compiled and compared for in silico analysis. For this, we conducted a geno-phenotypic assessment of protein-coding genes, with an emphasis on those products involved with cell-surface morphology and anaerobic fermentation and respiration. We also categorized and examined the core and accessory genes that define the L. ruminis species and its strains. Here, we made an attempt to identify those genes having ecologically relevant phenotypes that might support or bring about intestinal indigenousness.
Genome-wide mapping reveals single-origin chromosome replication in Leishmania, a eukaryotic microbe.

PubMed

Marques, Catarina A; Dickens, Nicholas J; Paape, Daniel; Campbell, Samantha J; McCulloch, Richard

2015-10-19

DNA replication initiates on defined genome sites, termed origins. Origin usage appears to follow common rules in the eukaryotic organisms examined to date: all chromosomes are replicated from multiple origins, which display variations in firing efficiency and are selected from a larger pool of potential origins. To ask if these features of DNA replication are true of all eukaryotes, we describe genome-wide origin mapping in the parasite Leishmania. Origin mapping in Leishmania suggests a striking divergence in origin usage relative to characterized eukaryotes, since each chromosome appears to be replicated from a single origin. By comparing two species of Leishmania, we find evidence that such origin singularity is maintained in the face of chromosome fusion or fission events during evolution. Mapping Leishmania origins suggests that all origins fire with equal efficiency, and that the genomic sites occupied by origins differ from related non-origin sites. Finally, we provide evidence that origin location in Leishmania displays striking conservation with Trypanosoma brucei, despite the latter parasite replicating its chromosomes from multiple, variable strength origins. The demonstration of chromosome replication from a single origin in Leishmania, a microbial eukaryote, has implications for the evolution of origin multiplicity and associated controls, and may explain the pervasive aneuploidy that characterizes Leishmania chromosome architecture.
Biotechnological Potential of Cold Adapted Pseudoalteromonas spp. Isolated from ‘Deep Sea’ Sponges

PubMed Central

Borchert, Erik; Knobloch, Stephen; Dwyer, Emilie; Flynn, Sinéad; Jackson, Stephen A.; Jóhannsson, Ragnar; Marteinsson, Viggó T.; O’Gara, Fergal; Dobson, Alan D. W.

2017-01-01

The marine genus Pseudoalteromonas is known for its versatile biotechnological potential with respect to the production of antimicrobials and enzymes of industrial interest. We have sequenced the genomes of three Pseudoalteromonas sp. strains isolated from different deep sea sponges on the Illumina MiSeq platform. The isolates have been screened for various industrially important enzymes and comparative genomics has been applied to investigate potential relationships between the isolates and their host organisms, while comparing them to free-living Pseudoalteromonas spp. from shallow and deep sea environments. The genomes of the sponge associated Pseudoalteromonas strains contained much lower levels of potential eukaryotic-like proteins which are known to be enriched in symbiotic sponge associated microorganisms, than might be expected for true sponge symbionts. While all the Pseudoalteromonas shared a large distinct subset of genes, nonetheless the number of unique and accessory genes is quite large and defines the pan-genome as open. Enzymatic screens indicate that a vast array of enzyme activities is expressed by the isolates, including β-galactosidase, β-glucosidase, and protease activities. A β-glucosidase gene from one of the Pseudoalteromonas isolates, strain EB27 was heterologously expressed in Escherichia coli and, following biochemical characterization, the recombinant enzyme was found to be cold-adapted, thermolabile, halotolerant, and alkaline active. PMID:28629190
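The core, accessory, and unique gene categories mentioned in the record above can be illustrated with a minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers. The function name `partition_pan_genome`, the strain labels, and the gene labels are hypothetical, and the clustering step that produces gene families is not shown. Whether a pan-genome is "open" is usually judged from how its size keeps growing as genomes are added, which this sketch does not attempt.

```python
from typing import Dict, Set


def partition_pan_genome(strain_genes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split the pan-genome into core, accessory, and unique gene families.

    core      : present in every strain
    unique    : present in exactly one strain (and not core)
    accessory : present in more than one strain, but not in all
    """
    if not strain_genes:
        raise ValueError("at least one strain is required")
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    # Count in how many strains each gene family occurs.
    counts = {gene: sum(gene in genes for genes in gene_sets) for gene in pan}
    unique = {gene for gene, n in counts.items() if n == 1} - core
    accessory = pan - core - unique
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Toy presence/absence data for three hypothetical strains.
strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneC", "geneE"},
}
parts = partition_pan_genome(strains)
for name in ("core", "accessory", "unique"):
    print(name, sorted(parts[name]))
```

With the toy data, geneA is core, geneB and geneC are accessory, and geneD and geneE are unique to single strains.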
Treatment of Metastatic Breast Cancer by Photodynamic Therapy Induced Anti-Tumor Immunity in a Murine Model

DTIC Science & Technology

2005-12-01

dinucleotide and were more common in the genomes of bacteria compared to humans. Immunostimulatory sequences in bacterial ( bDNA ) that are structurally defined...stimulates B cells, natural killer (NK) cells, dendritic cells (DC), and macrophages, regardless of whether the DNA is in the form of genomic bDNA or
Epigenetic regulation of gene expression and cellular functions induced by butyrate, an example of interactions between gene and nutrients

USDA-ARS's Scientific Manuscript database

Epigenetics has been defined as ‘the study of heritable changes in genome function that occur without a change in DNA sequence’. Research on nutrigenomics, the genome-nutrient interface and epigenomics is in its infancy with respect to livestock species. Feed costs are the single greatest expense t...
Optimization of multi-environment trials for genomic selection based on crop models.

PubMed

Rincent, R; Kuhn, E; Monod, H; Oury, F-X; Rousset, M; Allard, V; Le Gouis, J

2017-08-01

We propose a statistical criterion to optimize multi-environment trials to predict genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models. Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS), which refers to the use of genome-wide information for predicting breeding values of selection candidates, need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling thanks to crop growth models (CGM) incorporating genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing the MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined for this purpose, and was evaluated on simulated and real data, with the example of wheat phenology. The MET defined with OptiMET allowed estimating the genetic parameters with lower error, leading to higher QTL detection power and higher prediction accuracies. MET defined with OptiMET was on average more efficient than random MET composed of twice as many environments, in terms of quality of the parameter estimates. OptiMET is thus a valuable tool to determine optimal experimental conditions to best exploit MET and the phenotyping tools that are currently developed.
proGenomes: a resource for consistent functional and taxonomic annotations of prokaryotic genomes.

PubMed

Mende, Daniel R; Letunic, Ivica; Huerta-Cepas, Jaime; Li, Simone S; Forslund, Kristoffer; Sunagawa, Shinichi; Bork, Peer

2017-01-04

The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
The genomic landscape shaped by selection on transposable elements across 18 mouse strains.

PubMed

Nellåker, Christoffer; Keane, Thomas M; Yalcin, Binnaz; Wong, Kim; Agam, Avigail; Belgard, T Grant; Flint, Jonathan; Adams, David J; Frankel, Wayne N; Ponting, Chris P

2012-06-15

Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined. Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced through the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected. Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.
Packaging signals in two single-stranded RNA viruses imply a conserved assembly mechanism and geometry of the packaged genome.

PubMed

Dykeman, Eric C; Stockley, Peter G; Twarock, Reidun

2013-09-09

The current paradigm for assembly of single-stranded RNA viruses is based on a mechanism involving non-sequence-specific packaging of genomic RNA driven by electrostatic interactions. Recent experiments, however, provide compelling evidence for sequence specificity in this process both in vitro and in vivo. The existence of multiple RNA packaging signals (PSs) within viral genomes has been proposed, which facilitates assembly by binding coat proteins in such a way that they promote the protein-protein contacts needed to build the capsid. The binding energy from these interactions enables the confinement or compaction of the genomic RNAs. Identifying the nature of such PSs is crucial for a full understanding of assembly, which is an as yet untapped potential drug target for this important class of pathogens. Here, for two related bacterial viruses, we determine the sequences and locations of their PSs using Hamiltonian paths, a concept from graph theory, in combination with bioinformatics and structural studies. Their PSs have a common secondary structure motif but distinct consensus sequences and positions within the respective genomes. Despite these differences, the distributions of PSs in both viruses imply defined conformations for the packaged RNA genomes in contact with the protein shell in the capsid, consistent with a recent asymmetric structure determination of the MS2 virion. The PS distributions identified moreover imply a preferred, evolutionarily conserved assembly pathway with respect to the RNA sequence with potentially profound implications for other single-stranded RNA viruses known to have RNA PSs, including many animal and human pathogens. Copyright © 2013 Elsevier Ltd. All rights reserved.
Analysis of genomic alterations in neuroblastoma by multiplex ligation-dependent probe amplification and array comparative genomic hybridization: a comparison of results.

PubMed

Combaret, Valérie; Iacono, Isabelle; Bréjon, Stéphanie; Schleiermacher, Gudrun; Pierron, Gäelle; Couturier, Jérôme; Bergeron, Christophe; Blay, Jean-Yves

2012-12-01

In cases of neuroblastoma, recurring genetic alterations--losses of the 1p, 3p, 4p, and 11q and/or gains of 1q, 2p, and 17q chromosome arms--are currently used to define the therapeutic strategy in therapeutic protocols for low- and intermediate-risk patients. Different genome-wide analysis techniques, such as array comparative genomic hybridization (aCGH) or multiplex ligation-dependent probe amplification (MLPA), have been suggested for detecting chromosome segmental abnormalities. In this study, we compared the results of the two technologies in the analyses of the DNA of tumor samples from 91 neuroblastoma patients. Similar results were obtained with the two techniques for 75 samples (82%). In five cases (5.5%), the MLPA results were not interpretable. Discrepancies between the aCGH and MLPA results were observed in 11 cases (12%). Among the discrepancies, an 18q21.2-qter gain and 16p11.2 and 11q14.1-q14.3 losses were detected only by aCGH. The MLPA results showed that the 7p, 7q, and 14q chromosome arms were affected in six cases, while in two cases, 2p and 17q gains were observed; these results were confirmed by neither aCGH nor fluorescence in situ hybridization (FISH) analysis. Because of the higher sensitivity and specificity of genome-wide information, reasonable cost, and shorter time of aCGH analysis, we recommend the aCGH procedure for the analysis of genomic alterations in neuroblastoma. Copyright © 2012 Elsevier Inc. All rights reserved.
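The percentages reported in the record above can be checked with a short worked calculation. The counts (75, 5, and 11 out of 91 samples) are taken from the abstract; the variable names are illustrative only.

```python
# Counts reported in the abstract for the 91 tumor samples.
total_samples = 91
counts = {
    "similar_results": 75,        # aCGH and MLPA agree
    "mlpa_uninterpretable": 5,    # MLPA result could not be interpreted
    "discrepant": 11,             # aCGH and MLPA disagree
}

# The three categories should account for every sample.
assert sum(counts.values()) == total_samples

# Express each category as a percentage of all samples.
percentages = {name: 100 * n / total_samples for name, n in counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
# similar_results: 82.4%
# mlpa_uninterpretable: 5.5%
# discrepant: 12.1%
```

The three categories sum to 91, and the rounded values agree with the 82%, 5.5%, and 12% quoted in the abstract.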
Diversity and evolution of phycobilisomes in marine Synechococcus spp.: a comparative genomics study.

PubMed

Six, Christophe; Thomas, Jean-Claude; Garczarek, Laurence; Ostrowski, Martin; Dufresne, Alexis; Blot, Nicolas; Scanlan, David J; Partensky, Frédéric

2007-01-01

Marine Synechococcus owe their specific vivid color (ranging from blue-green to orange) to their large extrinsic antenna complexes called phycobilisomes, comprising a central allophycocyanin core and rods of variable phycobiliprotein composition. Three major pigment types can be defined depending on the major phycobiliprotein found in the rods (phycocyanin, phycoerythrin I or phycoerythrin II). Among strains containing both phycoerythrins I and II, four subtypes can be distinguished based on the ratio of the two chromophores bound to these phycobiliproteins. Genomes of eleven marine Synechococcus strains recently became available with one to four strains per pigment type or subtype, allowing an unprecedented comparative genomics study of genes involved in phycobilisome metabolism. By carefully comparing the Synechococcus genomes, we have retrieved candidate genes potentially required for the synthesis of phycobiliproteins in each pigment type. This includes linker polypeptides, phycobilin lyases and a number of novel genes of uncharacterized function. Interestingly, strains belonging to a given pigment type have similar phycobilisome gene complements and organization, independent of the core genome phylogeny (as assessed using concatenated ribosomal proteins). While phylogenetic trees based on concatenated allophycocyanin protein sequences are congruent with the latter, those based on phycocyanin and phycoerythrin notably differ and match the Synechococcus pigment types. We conclude that the phycobilisome core has likely evolved together with the core genome, while rods must have evolved independently, possibly by lateral transfer of phycobilisome rod genes or gene clusters between Synechococcus strains, either via viruses or by natural transformation, allowing rapid adaptation to a variety of light niches.
Plasmid Dynamics in KPC-Positive Klebsiella pneumoniae during Long-Term Patient Colonization

PubMed Central

Park, Morgan; Deming, Clayton; Thomas, Pamela J.; Young, Alice C.; Coleman, Holly; Sison, Christina; Weingarten, Rebecca A.; Lau, Anna F.; Dekker, John P.; Palmore, Tara N.; Frank, Karen M.

2016-01-01

ABSTRACT Carbapenem-resistant Klebsiella pneumoniae strains are formidable hospital pathogens that pose a serious threat to patients around the globe due to a rising incidence in health care facilities, high mortality rates associated with infection, and potential to spread antibiotic resistance to other bacterial species, such as Escherichia coli. Over 6 months in 2011, 17 patients at the National Institutes of Health (NIH) Clinical Center became colonized with a highly virulent, transmissible carbapenem-resistant strain of K. pneumoniae. Our real-time genomic sequencing tracked patient-to-patient routes of transmission and informed epidemiologists’ actions to monitor and control this outbreak. Two of these patients remained colonized with carbapenemase-producing organisms for at least 2 to 4 years, providing the opportunity to undertake a focused genomic study of long-term colonization with antibiotic-resistant bacteria. Whole-genome sequencing studies shed light on the underlying complex microbial colonization, including mixed or evolving bacterial populations and gain or loss of plasmids. Isolates from NIH patient 15 showed complex plasmid rearrangements, leaving the chromosome and the blaKPC-carrying plasmid intact but rearranging the two other plasmids of this outbreak strain. NIH patient 16 has shown continuous colonization with blaKPC-positive organisms across multiple time points spanning 2011 to 2015. Genomic studies defined a complex pattern of succession and plasmid transmission across two different K. pneumoniae sequence types and an E. coli isolate. These findings demonstrate the utility of genomic methods for understanding strain succession, genome plasticity, and long-term carriage of antibiotic-resistant organisms. PMID:27353756
Comparative genomics of Enterococcus faecalis from healthy Norwegian infants

PubMed Central

Solheim, Margrete; Aakra, Ågot; Snipen, Lars G; Brede, Dag A; Nes, Ingolf F

2009-01-01

Background Enterococcus faecalis, traditionally considered a harmless commensal of the intestinal tract, is now ranked among the leading causes of nosocomial infections. In an attempt to gain insight into the genetic make-up of commensal E. faecalis, we have studied genomic variation in a collection of community-derived E. faecalis isolated from the feces of Norwegian infants. Results The E. faecalis isolates were first sequence typed by multilocus sequence typing (MLST) and characterized with respect to antibiotic resistance and properties associated with virulence. A subset of the isolates was compared to the vancomycin-resistant strain E. faecalis V583 (V583) by whole genome microarray comparison (comparative genomic hybridization (CGH)). Several of the putative enterococcal virulence factors were found to be highly prevalent among the commensal baby isolates. The genomic variation as observed by CGH was less between isolates displaying the same MLST sequence type than between isolates belonging to different evolutionary lineages. Conclusion The variation in gene content observed among the investigated commensal E. faecalis is comparable to the genetic variation previously reported among strains of various origins thought to be representative of the major E. faecalis lineages. Previous MLST analyses of E. faecalis have identified so-called high-risk enterococcal clonal complexes (HiRECC), defined as genetically distinct subpopulations, epidemiologically associated with enterococcal infections. The observed correlation between CGH and MLST presented here may offer a method for the identification of lineage-specific genes, and may therefore add clues on how to distinguish pathogenic from commensal E. faecalis. In this work, information on the core genome of E. faecalis is also substantially extended. PMID:19393078
Constrained release of lamina-associated enhancers and genes from the nuclear envelope during T-cell activation facilitates their association in chromosome compartments.

PubMed

Robson, Michael I; de Las Heras, Jose I; Czapiewski, Rafal; Sivakumar, Aishwarya; Kerr, Alastair R W; Schirmer, Eric C

2017-07-01

The 3D organization of the genome changes concomitantly with expression changes during hematopoiesis and immune activation. Studies have focused either on lamina-associated domains (LADs) or on topologically associated domains (TADs), defined by preferential local chromatin interactions, and chromosome compartments, defined as higher-order interactions between TADs sharing functionally similar states. However, few studies have investigated how these affect one another. To address this, we mapped LADs using Lamin B1-DamID during Jurkat T-cell activation, finding significant genome reorganization at the nuclear periphery dominated by release of loci frequently important for T-cell function. To assess how these changes at the nuclear periphery influence wider genome organization, our DamID data sets were contrasted with TADs and compartments. Features of specific repositioning events were then tested by fluorescence in situ hybridization during T-cell activation. First, considerable overlap between TADs and LADs was observed with the TAD repositioning as a unit. Second, A1 and A2 subcompartments are segregated in 3D space through differences in proximity to LADs along chromosomes. Third, genes and a putative enhancer in LADs that were released from the periphery during T-cell activation became preferentially associated with A2 subcompartments and were constrained to the relative proximity of the lamina. Thus, lamina associations influence internal nuclear organization, and changes in LADs during T-cell activation may provide an important additional mode of gene regulation. © 2017 Robson et al.; Published by Cold Spring Harbor Laboratory Press.
SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects.

PubMed

Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice

2011-05-05

High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
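The diversity indices named in the abstract above (Pi, Watterson's Theta, Tajima's D) can be illustrated with a minimal Python sketch. This is not SNiPlay's code: the function name `diversity_stats` and the toy alignment are assumptions for illustration, and the formulas are the standard textbook definitions applied to equal-length, gap-free aligned sequences.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return S, pi, Watterson's theta and Tajima's D for an alignment.

    Hypothetical helper: assumes equal-length, gap-free aligned sequences.
    pi and theta_w are reported per alignment, not per site.
    """
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")

    # S: number of segregating (polymorphic) alignment columns.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = S / a1

    # Tajima's D compares pi with theta_w; it is left undefined (None)
    # when there is no polymorphism or fewer than four sequences.
    tajima_d = None
    if S > 0 and n >= 4:
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
        e1 = c1 / a1
        e2 = c2 / (a1 ** 2 + a2)
        tajima_d = (pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))
    return {"S": S, "pi": pi, "theta_w": theta_w, "tajima_d": tajima_d}


stats = diversity_stats(["ACGT", "ACGA", "ACTA", "ACGT"])
print(stats)
```

Dividing `pi` and `theta_w` by the alignment length gives the per-site values that are usually reported.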

Meeting Report from the Genomic Standards Consortium (GSC) Workshop 8

PubMed Central

Kyrpides, Nikos; Field, Dawn; Sterk, Peter; Kottmann, Renzo; Glöckner, Frank Oliver; Hirschman, Lynette; Garrity, George M.; Cochrane, Guy; Wooley, John

2010-01-01

This report summarizes the proceedings of the 8th meeting of the Genomic Standards Consortium held at the Department of Energy Joint Genome Institute in Walnut Creek, CA, USA on September 9-11, 2009. This three-day workshop marked the maturing of the Genomic Standards Consortium from an informal gathering of researchers interested in developing standards in the field of genomics and metagenomics to an established community with a defined governance mechanism, its own open access journal, and a family of established standards for describing genomes, metagenomes and marker studies (i.e. ribosomal RNA gene surveys). There will be increased efforts within the GSC to reach out to the wider scientific community via a range of new projects. Further information about the GSC and its activities can be found at http://gensc.org/. PMID:21304696
Multilevel Research and the Challenges of Implementing Genomic Medicine

PubMed Central

Coates, Ralph J.; Fennell, Mary L.; Glasgow, Russell E.; Scheuner, Maren T.; Schully, Sheri D.; Williams, Marc S.; Clauser, Steven B.

2012-01-01

Advances in genomics and related fields promise a new era of personalized medicine in the cancer care continuum. Nevertheless, there are fundamental challenges in integrating genomic medicine into cancer practice. We explore how multilevel research can contribute to implementation of genomic medicine. We first review the rapidly developing scientific discoveries in this field and the paucity of current applications that are ready for implementation in clinical and public health programs. We then define a multidisciplinary translational research agenda for successful integration of genomic medicine into policy and practice and consider challenges for successful implementation. We illustrate the agenda using the example of Lynch syndrome testing in newly diagnosed cases of colorectal cancer and cascade testing in relatives. We synthesize existing information in a framework for future multilevel research for integrating genomic medicine into the cancer care continuum. PMID:22623603
Multilevel research and the challenges of implementing genomic medicine.

PubMed

Khoury, Muin J; Coates, Ralph J; Fennell, Mary L; Glasgow, Russell E; Scheuner, Maren T; Schully, Sheri D; Williams, Marc S; Clauser, Steven B

2012-05-01

Advances in genomics and related fields promise a new era of personalized medicine in the cancer care continuum. Nevertheless, there are fundamental challenges in integrating genomic medicine into cancer practice. We explore how multilevel research can contribute to implementation of genomic medicine. We first review the rapidly developing scientific discoveries in this field and the paucity of current applications that are ready for implementation in clinical and public health programs. We then define a multidisciplinary translational research agenda for successful integration of genomic medicine into policy and practice and consider challenges for successful implementation. We illustrate the agenda using the example of Lynch syndrome testing in newly diagnosed cases of colorectal cancer and cascade testing in relatives. We synthesize existing information in a framework for future multilevel research for integrating genomic medicine into the cancer care continuum.
Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates.

PubMed

Nakatani, Yoichiro; Takeda, Hiroyuki; Kohara, Yuji; Morishita, Shinichi

2007-09-01

Although several vertebrate genomes have been sequenced, little is known about the genome evolution of early vertebrates and how large-scale genomic changes such as the two rounds of whole-genome duplications (2R WGD) affected evolutionary complexity and novelty in vertebrates. Reconstructing the ancestral vertebrate genome is highly nontrivial because of the difficulty in identifying traces originating from the 2R WGD. To resolve this problem, we developed a novel method capable of pinning down remains of the 2R WGD in the human and medaka fish genomes using invertebrate tunicate and sea urchin genes to define ohnologs, i.e., paralogs produced by the 2R WGD. We validated the reconstruction using the chicken genome, which was not considered in the reconstruction step, and observed that many ancestral proto-chromosomes were retained in the chicken genome and had one-to-one correspondence to chicken microchromosomes, thereby confirming the reconstructed ancestral genomes. Our reconstruction revealed a contrast between the slow karyotype evolution after the second WGD and the rapid, lineage-specific genome reorganizations that occurred in the ancestral lineages of major taxonomic groups such as teleost fishes, amphibians, reptiles, and marsupials.
Radiation induced genome instability: multiscale modelling and data analysis

NASA Astrophysics Data System (ADS)

Andreev, Sergey; Eidelman, Yuri

2012-07-01

Genome instability (GI) is thought to be an important step in cancer induction and progression. Radiation induced GI is usually defined as genome alterations in the progeny of irradiated cells. The aim of this report is to demonstrate an opportunity for integrative analysis of radiation induced GI on the basis of multiscale modelling. Integrative, systems level modelling is necessary to assess different pathways resulting in GI in which a variety of genetic and epigenetic processes are involved. The multilevel modelling includes Monte Carlo based simulation of several key processes involved in GI: generation of DNA double strand breaks (DSBs) in initially irradiated cells as well as in their descendants, and damage transmission through mitosis. Taking the cell-cycle-dependent generation of DNA/chromosome breakage into account offers an advantage in estimating the contribution of different DNA damage response pathways to GI, such as nonhomologous vs homologous recombination repair mechanisms, the role of DSBs at telomeres or interstitial chromosomal sites, etc. The preliminary estimates show that both telomeric and non-telomeric DSB interactions are involved in delayed effects of radiation, although differentially for different cell types. The computational experiments provide data on a wide spectrum of GI endpoints (dicentrics, micronuclei, nonclonal translocations, chromatid exchanges, chromosome fragments) similar to those obtained experimentally for various cell lines under various experimental conditions. The modelling based analysis of experimental data demonstrates that radiation induced GI may be viewed as processes of delayed DSB induction/interaction/transmission, which are key to the quantification of GI. On the other hand, this conclusion is not sufficient to understand GI as a whole because factors of DNA non-damaging origin can also induce GI.
Additionally, new data on induced pluripotent stem cells reveal that GI is acquired in normal mature cells during genome reprogramming by the oncogene c-myc and three additional transcription factors. These and other data reveal the need for generalisation of the current model of GI. One can expect that different early events of both DNA damaging and non-damaging origins merge in a single late pathway. To search for a deeper view we propose to redefine GI as genome destabilisation manifested in erosion of genome states and altered transitions between states. This changing view on GI may help to integrate the inducing factors of various origins in a single basic model of GI.
Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria

PubMed Central

Bertels, Frederic; Rainey, Paul B.

2011-01-01

Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, shows that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA. PMID:21698139
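The idea of looking for over-represented short oligonucleotides, mentioned in the abstract above, can be sketched in a few lines of Python. This is an illustration only and not the authors' method: the function names are hypothetical, and the expected count assumes a uniform, independent base composition, which real genomes do not have.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def enrichment(sequence, k):
    """Observed / expected count for each k-mer seen in the sequence.

    Assumption for illustration: all 4**k k-mers are equally likely,
    so the expected count is (number of windows) / 4**k.
    """
    counts = kmer_counts(sequence, k)
    windows = len(sequence) - k + 1
    expected = windows / 4 ** k
    return {kmer: count / expected for kmer, count in counts.items()}


toy_sequence = "ACGTACGTACGT"  # toy input, not genomic data
counts = kmer_counts(toy_sequence, 4)
scores = enrichment(toy_sequence, 4)
print(counts.most_common(1), scores["ACGT"])
```

A more careful null model would shuffle the genome or use a Markov model of base composition before calling any k-mer over-represented.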
Caryoscope: An Open Source Java application for viewing microarray data in a genomic context

PubMed Central

Awad, Ihab AB; Rees, Christian A; Hernandez-Boussard, Tina; Ball, Catherine A; Sherlock, Gavin

2004-01-01

Background Microarray-based comparative genome hybridization experiments generate data that can be mapped onto the genome. These data are interpreted more easily when represented graphically in a genomic context. Results We have developed Caryoscope, which is an open source Java application for visualizing microarray data from array comparative genome hybridization experiments in a genomic context. Caryoscope can read General Feature Format files (GFF files), as well as comma- and tab-delimited files, that define the genomic positions of the microarray reporters for which data are obtained. The microarray data can be browsed using an interactive, zoomable interface, which helps users identify regions of chromosomal deletion or amplification. The graphical representation of the data can be exported in a number of graphic formats, including publication-quality formats such as PostScript. Conclusion Caryoscope is a useful tool that can aid in the visualization, exploration and interpretation of microarray data in a genomic context. PMID:15488149
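As a rough illustration of the General Feature Format input mentioned in the abstract above, the Python sketch below reads tab-delimited GFF lines into dictionaries. It is not Caryoscope code (Caryoscope is a Java application); the function names and the sample line are hypothetical, and the column order assumed is the usual GFF layout (seqid, source, type, start, end, score, strand, phase, optional attributes).

```python
def parse_gff_line(line):
    """Parse one tab-delimited GFF feature line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated GFF columns")
    seqid, source, ftype, start, end, score, strand, phase = fields[:8]
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),  # GFF coordinates are 1-based, inclusive
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": phase,
        "attributes": fields[8] if len(fields) > 8 else "",
    }


def read_gff(lines):
    """Yield parsed features, skipping blank lines and '#' comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_gff_line(line)


sample = [
    "##gff-version 3\n",
    "chr1\tarray\treporter\t1000\t1500\t0.85\t+\t.\tID=rep1\n",
]
features = list(read_gff(sample))
print(features)
```

Each parsed feature gives the chromosome and coordinate range onto which a reporter's measured value could be drawn.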
Genome-Wide Mutagenesis in Borrelia burgdorferi.

PubMed

Lin, Tao; Gao, Lihui

2018-01-01

Signature-tagged mutagenesis (STM) is a functional genomics approach to identify bacterial virulence determinants and virulence factors by simultaneously screening multiple mutants in a single host animal, and has been utilized extensively for the study of bacterial pathogenesis, host-pathogen interactions, and spirochete and tick biology. Signature-tagged transposon mutagenesis has been developed to investigate virulence determinants and pathogenesis of Borrelia burgdorferi. Mutants in genes important in virulence are identified by negative selection, in which the mutants fail to colonize or disseminate in the animal host and tick vector. The STM procedure combined with Luminex® FLEXMAP™ technology and next-generation sequencing (e.g., Tn-seq) provides powerful high-throughput tools for the determination of Borrelia burgdorferi virulence determinants. The assessment of multiple tissue sites and two DNA resources at two different time points using Luminex® FLEXMAP™ technology provides a robust data set. B. burgdorferi transposon mutant screening indicates that a high proportion of genes are novel virulence determinants required for mouse and tick infection. In this protocol, an effective signature-tagged Himar1-based transposon suicide vector was developed and used to generate a sequence-defined library of nearly 4800 mutants in the infectious B. burgdorferi B31 clone. In STM, signature-tagged suicide vectors are constructed by inserting unique DNA sequences (tags) into the transposable elements. The signature-tagged transposon mutants are generated when transposon suicide vectors are transformed into an infectious B. burgdorferi clone, and the transposable element, carrying the signature tag, is transposed into a 5'-TA-3' sequence in the B. burgdorferi genome. The resulting transposon library consists of many sub-libraries, each containing several hundred mutants with the same tags. A group of mice or ticks is infected with a mixed population of mutants carrying different tags; after recovery from different tissues of the infected mice and ticks, mutants from the output pool and the input pool are detected using high-throughput, semi-quantitative Luminex® FLEXMAP™ or next-generation sequencing (Tn-seq) technologies. Thus far, we have created a high-density, sequence-defined transposon library of over 6600 STM mutants for the efficient genome-wide investigation of genes and gene products required for wild-type pathogenesis, host-pathogen interactions, in vitro growth, in vivo survival, physiology, morphology, chemotaxis, motility, structure, metabolism, gene regulation, plasmid maintenance and replication, etc. The insertion sites of 4480 transposon mutants have been determined. About 800 predicted protein-encoding genes in the genome were disrupted in the STM transposon library. The infectivity and some functions of 800 mutants in 500 genes have been determined. Analysis of these transposon mutants has yielded valuable information regarding the genes and gene products important in the pathogenesis and biology of B. burgdorferi and its tick vectors.
Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat

The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which agreed with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in its genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. Nineteen of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profile analysis, while two isolates were more closely related to P. chlororaphis and P. putida. Genes specific to the Populus-associated subgroups were identified: genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; genes specific to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into the high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from the P. fluorescens group, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.
Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

DOE PAGES

Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

2016-01-01

The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which agreed with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in its genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. Nineteen of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profile analysis, while two isolates were more closely related to P. chlororaphis and P. putida. Genes specific to the Populus-associated subgroups were identified: genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; genes specific to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into the high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from the P. fluorescens group, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.
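The pan and core genome analysis referred to in the abstract above rests on simple set operations: the pan-genome is the union of gene families over all genomes, and the core genome is their intersection. The Python sketch below shows that idea only; the strain labels and gene-family names are made up, and real analyses first cluster proteins into families, which is not shown here.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a dict of genome -> families.

    Hypothetical helper: assumes gene families have already been assigned.
    """
    gene_sets = [set(families) for families in genomes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)        # present in at least one genome
    core = set.intersection(*gene_sets)  # present in every genome
    return pan, core


# Made-up strains and gene families, for illustration only.
toy_genomes = {
    "strain_A": ["fam1", "fam2", "fam3"],
    "strain_B": ["fam1", "fam2", "fam4"],
    "strain_C": ["fam1", "fam2", "fam5"],
}
pan, core = pan_and_core(toy_genomes)
accessory = pan - core  # families missing from at least one genome
print(sorted(pan), sorted(core), sorted(accessory))
```

Families found in only one subgroup of genomes, such as the subgroup-specific genes described above, can be obtained the same way by intersecting within the subgroup and subtracting the union of all other genomes.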
Integrating genomic selection into dairy cattle breeding programmes: a review.

PubMed

Bouquet, A; Juga, J

2013-05-01

Extensive genetic progress has been achieved in dairy cattle populations on many traits of economic importance because of efficient breeding programmes. Success of these programmes has relied on progeny testing of the best young males to accurately assess their genetic merit and hence their potential for breeding. Over the last few years, the integration of dense genomic information into statistical tools used to make selection decisions, commonly referred to as genomic selection, has enabled gains in predicting accuracy of breeding values for young animals without own performance. The possibility to select animals at an early stage allows defining new breeding strategies aimed at boosting genetic progress while reducing costs. The first objective of this article was to review methods used to model and optimize breeding schemes integrating genomic selection and to discuss their relative advantages and limitations. The second objective was to summarize the main results and perspectives on the use of genomic selection in practical breeding schemes, on the basis of the example of dairy cattle populations. Two main designs of breeding programmes integrating genomic selection were studied in dairy cattle. Genomic selection can be used either for pre-selecting males to be progeny tested or for selecting males to be used as active sires in the population. The first option produces moderate genetic gains without changing the structure of breeding programmes. The second option leads to large genetic gains, up to double those of conventional schemes because of a major reduction in the mean generation interval, but it requires greater changes in breeding programme structure. The literature suggests that genomic selection becomes more attractive when it is coupled with embryo transfer technologies to further increase selection intensity on the dam-to-sire pathway. The use of genomic information also offers new opportunities to improve preservation of genetic variation. 
However, recent simulation studies have shown that putting constraints on genomic inbreeding rates for defining optimal contributions of breeding animals could significantly reduce achievable genetic gain. Finally, the article summarizes the potential of genomic selection to include new traits in the breeding goal to meet societal demands regarding animal health and environmental efficiency in animal production.
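The link the abstract above draws between a shorter generation interval and larger genetic gain can be made concrete with the standard breeder's equation, annual gain = (selection intensity × accuracy × additive genetic standard deviation) / generation interval. The Python sketch below is an illustration under stated assumptions: the function name and every numeric value are hypothetical and are not taken from the review.

```python
def annual_genetic_gain(intensity, accuracy, sigma_a, generation_interval):
    """Breeder's equation: expected genetic gain per year.

    intensity            selection intensity (standardised units)
    accuracy             correlation between estimated and true breeding value
    sigma_a              additive genetic standard deviation of the trait
    generation_interval  mean generation interval in years
    """
    if generation_interval <= 0:
        raise ValueError("generation interval must be positive")
    return intensity * accuracy * sigma_a / generation_interval


# Hypothetical values: progeny testing is accurate but slow, while
# genomic selection of young sires is less accurate but much faster.
progeny_test = annual_genetic_gain(2.0, 0.9, 1.0, 6.0)
genomic = annual_genetic_gain(2.0, 0.7, 1.0, 2.5)
print(progeny_test, genomic, genomic / progeny_test)
```

With these made-up inputs the shorter generation interval outweighs the loss in accuracy, which is the mechanism the abstract gives for the larger gains of the second design.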
Genome-wide scans between two honeybee populations reveal putative signatures of human-mediated selection.

PubMed

Parejo, M; Wragg, D; Henriques, D; Vignal, A; Neuditschko, M

2017-12-01

Human-mediated selection has left signatures in the genomes of many domesticated animals, including the European dark honeybee, Apis mellifera mellifera, which has been selected by apiculturists for centuries. Using whole-genome sequence information, we investigated selection signatures in spatially separated honeybee subpopulations (Switzerland, n = 39 and France, n = 17). Three different test statistics were calculated in windows of 2 kb (fixation index, cross-population extended haplotype homozygosity and cross-population composite likelihood ratio) and combined into a recently developed composite selection score. Applying a stringent false discovery rate of 0.01, we identified six significant selective sweeps distributed across five chromosomes covering eight genes. These genes are associated with multiple molecular and biological functions, including regulation of transcription, receptor binding and signal transduction. Of particular interest is a selection signature on chromosome 1, which corresponds to the WNT4 gene, the family of which is conserved across the animal kingdom with a variety of functions. In Drosophila melanogaster, WNT4 alleles have been associated with differential wing, cross vein and abdominal phenotypes. Defining phenotypic characteristics of different Apis mellifera ssp., which are typically used as selection criteria, include colour and wing venation pattern. This signal is therefore likely to be a good candidate for human-mediated selection arising from different applied breeding practices in the two managed populations. © 2017 The Authors. Animal Genetics published by John Wiley & Sons Ltd on behalf of Stichting International Foundation for Animal Genetics.
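The abstract above reports sweeps called at a false discovery rate of 0.01 but does not say which procedure was used. As a hedged illustration, the Python sketch below applies the widely used Benjamini-Hochberg step-up procedure to a list of per-window p-values; the function name and the toy p-values are assumptions, not data from the study.

```python
def benjamini_hochberg(pvalues, fdr=0.01):
    """Return the indices of tests declared significant at the given FDR.

    Benjamini-Hochberg step-up rule: find the largest rank r such that
    the r-th smallest p-value is <= (r / m) * fdr, then reject the r
    hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank / m * fdr:
            largest_rank = rank
    return sorted(order[:largest_rank])


# Toy p-values, one per genomic window (illustrative only).
window_pvalues = [0.0001, 0.5, 0.003, 0.03, 0.9]
significant = benjamini_hochberg(window_pvalues, fdr=0.01)
print(significant)
```

Because the rule is step-up, a window whose own p-value misses its threshold can still be declared significant when a larger p-value further down the ranking meets its threshold.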
Performance of gout definitions for genetic epidemiological studies: analysis of UK Biobank.

PubMed

Cadzow, Murray; Merriman, Tony R; Dalbeth, Nicola

2017-08-09

Many different combinations of available data have been used to identify gout cases in large genetic studies. The aim of this study was to determine the performance of case definitions of gout using the limited items available in multipurpose cohorts for population-based genetic studies. This research was conducted using the UK Biobank Resource. Data, including genome-wide genotypes, were available for 105,421 European participants aged 40-69 years without kidney disease. Gout definitions and combinations of these definitions were identified from previous epidemiological studies. These definitions were tested for association with 30 urate-associated single-nucleotide polymorphisms (SNPs) by logistic regression, adjusted for age, sex, waist circumference, and ratio of waist circumference to height. Heritability estimates under an additive model were generated using GCTA version 1.26.0 and PLINK version 1.90b3.32 by partitioning the genome. There were 2066 (1.96%) cases defined by self-report of gout, 1652 (1.57%) defined by urate-lowering therapy (ULT) use, 382 (0.36%) defined by hospital diagnosis, 1861 (1.76%) defined by hospital diagnosis or gout-specific medications and 2295 (2.18%) defined by self-report of gout or ULT use. Association with gout at experiment-wide significance (P < 0.0017) was observed for 13 SNPs with gout using the self-report of gout or ULT use definition, 12 SNPs using the self-report of gout definition, 11 SNPs using the hospital diagnosis or gout-specific medication definition, 10 SNPs using ULT use definition and 3 SNPs using hospital diagnosis definition. Heritability estimates ranged from 0.282 to 0.308 for all definitions except hospital diagnosis (0.236). Of the limited items available in multipurpose cohorts, the case definition of self-report of gout or ULT use has high sensitivity and precision for detecting association in genetic epidemiological studies of gout.
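The experiment-wide threshold quoted above (P < 0.0017) is consistent with a Bonferroni correction over the 30 tested SNPs. The sketch below reproduces that arithmetic and a few of the reported case percentages. The family-wise alpha of 0.05 is an assumption, since the abstract does not state it, and the variable names are illustrative.

```python
N_PARTICIPANTS = 105_421   # European participants, from the abstract
N_SNPS = 30                # urate-associated SNPs tested, from the abstract
ALPHA = 0.05               # assumed family-wise error rate (not stated in the abstract)

# Bonferroni correction: divide alpha by the number of tests.
threshold = ALPHA / N_SNPS

# A few of the case counts reported in the abstract.
cases = {
    "self-report": 2066,
    "ULT use": 1652,
    "hospital diagnosis": 382,
    "self-report or ULT use": 2295,
}
prevalence_pct = {name: 100 * n / N_PARTICIPANTS for name, n in cases.items()}

print(f"threshold = {threshold:.4f}")
for name, pct in prevalence_pct.items():
    print(f"{name}: {pct:.2f}%")
```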
Intracellular localization of adeno-associated viral proteins expressed in insect cells.

PubMed

Gallo-Ramírez, Lilí E; Ramírez, Octavio T; Palomares, Laura A

2011-01-01

Production of vectors derived from adeno-associated virus (AAVv) in insect cells represents a feasible option for large-scale applications. However, transducing-particle yields obtained in this system are low compared with total capsid yields, suggesting the presence of genome encapsidation bottlenecks. Three components are required for AAVv production: viral capsid proteins (VP), the recombinant AAV genome, and Rep proteins for AAV genome replication and encapsidation. Little is known about the interaction between the three components in insect cells, which have intracellular conditions different from those in mammalian cells. In this work, the localization of AAV proteins in insect cells was assessed for the first time with the purpose of finding potential limiting factors. Unassembled VP were located either in the cytoplasm or in the nucleus. Their transport into the nucleus was dependent on protein concentration. Empty capsids were located in defined subnuclear compartments. Rep proteins expressed individually were efficiently translocated into the nucleus. Their intranuclear distribution was not uniform and differed from VP distribution. While Rep52 distribution and expression levels were not affected by AAV genomes or VP, Rep78 distribution and stability changed during coexpression. Expression of all AAV components modified capsid intranuclear distribution, and assembled VP were found in vesicles located in the nuclear periphery. Such vesicles were related to baculovirus infection, highlighting its role in AAVv production in insect cells. The results obtained in this work suggest that the intracellular distribution of AAV proteins allows their interaction and does not limit vector production in insect cells. Copyright © 2011 American Institute of Chemical Engineers (AIChE).
Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset

PubMed Central

Sengupta, Dhriti; Choudhury, Ananyo; Basu, Analabha; Ramsay, Michèle

2016-01-01

Genomic variation in Indian populations is of great interest due to the diversity of ancestral components, social stratification, endogamy and complex admixture patterns. With an expanding population of 1.2 billion, India is also a treasure trove to catalogue innocuous as well as clinically relevant rare mutations. Recent studies have revealed four dominant ancestries in populations from mainland India: Ancestral North-Indian (ANI), Ancestral South-Indian (ASI), Ancestral Tibeto–Burman (ATB) and Ancestral Austro-Asiatic (AAA). The 1000 Genomes Project (KGP) Phase-3 data include about 500 genomes from five linguistically defined Indian-Subcontinent (IS) populations (Punjabi, Gujrati, Bengali, Telugu and Tamil) some of whom are recent migrants to USA or UK. Comparative analyses show that despite the distinct geographic origins of the KGP-IS populations, the ANI component is predominantly represented in this dataset. Previous studies demonstrated population substructure in the HapMap Gujrati population, and we found evidence for additional substructure in the Punjabi and Telugu populations. These substructured populations have characteristic/significant differences in heterozygosity and inbreeding coefficients. Moreover, we demonstrate that the substructure is better explained by factors like differences in proportion of ancestral components, and endogamy driven social structure rather than invoking a novel ancestral component to explain it. Therefore, using language and/or geography as a proxy for an ethnic unit is inadequate for many of the IS populations. This highlights the necessity for more nuanced sampling strategies or corrective statistical approaches, particularly for biomedical and population genetics research in India. PMID:27797945
Identifying anti-cancer drug response related genes using an integrative analysis of transcriptomic and genomic variations with cell line-based drug perturbations.

PubMed

Sun, Yi; Zhang, Wei; Chen, Yunqin; Ma, Qin; Wei, Jia; Liu, Qi

2016-02-23

Clinical responses to anti-cancer therapies often only benefit a defined subset of patients. Predicting the best treatment strategy hinges on our ability to effectively translate genomic data into actionable information on drug responses. To achieve this goal, we compiled a comprehensive collection of baseline cancer genome data and drug response information derived from a large panel of cancer cell lines. This data set was applied to identify the signature genes relevant to drug sensitivity and their resistance by integrating CNVs and the gene expression of cell lines with in vitro drug responses. We presented an efficient in-silico pipeline for integrating heterogeneous cell line data sources with the simultaneous modeling of drug response values across all the drugs and cell lines. Potential signature genes correlated with drug response (sensitive or resistant) in different cancer types were identified. Using signature genes, our collaborative filtering-based drug response prediction model outperformed the 44 algorithms submitted to the DREAM competition on breast cancer cells. The functions of the identified drug response related signature genes were carefully analyzed at the pathway level and the synthetic lethality level. Furthermore, we validated these signature genes by applying them to the classification of the different subtypes of the TCGA tumor samples, and further uncovered their in vivo implications using clinical patient data. Our work may have promise in translating genomic data into customized marker genes relevant to the response of specific drugs for a specific cancer type of individual patients.
Whole-Genome Resequencing of Experimental Populations Reveals Polygenic Basis of Egg-Size Variation in Drosophila melanogaster.

PubMed

Jha, Aashish R; Miles, Cecelia M; Lippert, Nodia R; Brown, Christopher D; White, Kevin P; Kreitman, Martin

2015-10-01

Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Super-enhancers: Asset management in immune cell genomes.

PubMed

Witte, Steven; O'Shea, John J; Vahedi, Golnaz

2015-09-01

Super-enhancers (SEs) are regions of the genome consisting of clusters of regulatory elements bound with very high amounts of transcription factors, and this architecture appears to be the hallmark of genes and noncoding RNAs linked with cell identity. Recent studies have identified SEs in CD4(+) T cells and have further linked these regions to single nucleotide polymorphisms (SNPs) associated with immune-mediated disorders, pointing to an important role for these structures in the T cell differentiation and function. Here we review the features that define SEs, and discuss their function within the broader understanding of the mechanisms that define immune cell identity and function. We propose that SEs present crucial regulatory hubs, coordinating intrinsic and extrinsic differentiation signals, and argue that delineating these regions will provide important insight into the factors and mechanisms that define immune cell identity. Copyright © 2015 Elsevier Ltd. All rights reserved.
Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan L.)

USDA-ARS's Scientific Manuscript database

This study reports generation of large-scale genomic resources for pigeonpea, a so-called ‘orphan crop species’ of the semi-arid tropic regions. Roche FLX/454 sequencing of a normalized cDNA pool prepared from 31 tissues produced 494,353 short transcript reads (STRs). Cluster analysi...
Genomic mechanisms of stress tolerance for the industrial yeast Saccharomyces cerevisiae against the major chemical classes of inhibitors derived from lignocellulosic biomass conversion

USDA-ARS's Scientific Manuscript database

Scientists at ARS developed tolerant industrial yeast that is able to reduce major chemical classes of inhibitors into less toxic or nontoxic compounds while producing ethanol. Using genomic studies, we defined mechanisms of in situ detoxification involved in novel gene functions, vital cofactor r...

Integrated genome-based studies of Shewanella Ecophysiology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tiedje, James M.; Konstantinidis, Kostas; Worden, Mark

2014-01-08

The aim of the work reported is to study Shewanella population genomics, and to understand the evolution, ecophysiology, and speciation of Shewanella. The tasks supporting this aim are: to study genetic and ecophysiological bases defining the core and diversification of Shewanella species; to determine gene content patterns along redox gradients; and to investigate the evolutionary processes, patterns and mechanisms of Shewanella.
Next Generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis

USDA-ARS's Scientific Manuscript database

The mitochondrial genome’s non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been define...
Genomic standards consortium projects.

PubMed

Field, Dawn; Sterk, Peter; Kottmann, Renzo; De Smet, J Wim; Amaral-Zettler, Linda; Cochrane, Guy; Cole, James R; Davies, Neil; Dawyndt, Peter; Garrity, George M; Gilbert, Jack A; Glöckner, Frank Oliver; Hirschman, Lynette; Klenk, Hans-Peter; Knight, Rob; Kyrpides, Nikos; Meyer, Folker; Karsch-Mizrachi, Ilene; Morrison, Norman; Robbins, Robert; San Gil, Inigo; Sansone, Susanna; Schriml, Lynn; Tatusova, Tatiana; Ussery, Dave; Yilmaz, Pelin; White, Owen; Wooley, John; Caporaso, Gregory

2014-06-15

The Genomic Standards Consortium (GSC) is an open-membership community that was founded in 2005 to work towards the development, implementation and harmonization of standards in the field of genomics. Starting with the defined task of establishing a minimal set of descriptions, the GSC has evolved into an active standards-setting body that currently has 18 ongoing projects, with additional projects regularly proposed from within and outside the GSC. Here we describe our recently enacted policy for proposing new activities that are intended to be taken on by the GSC, along with the template for proposing such new activities.
Genome Editing in Human Pluripotent Stem Cells.

PubMed

Carlson-Stevermer, Jared; Saha, Krishanu

2017-01-01

Genome editing in human pluripotent stem cells (hPSCs) enables the generation of reporter lines and knockout cell lines. Zinc finger nucleases, transcription activator-like effector nucleases (TALENs), and CRISPR/Cas9 technology have recently increased the efficiency of proper gene editing by creating double strand breaks (DSB) at defined sequences in the human genome. These systems typically use plasmids to transiently transcribe nucleases within the cell. Here, we describe the process for preparing hPSCs for transient expression of nucleases via electroporation and subsequent analysis to create genetically modified stem cell lines.
The challenge of informed consent and return of results in translational genomics: empirical analysis and recommendations.

PubMed

Henderson, Gail E; Wolf, Susan M; Kuczynski, Kristine J; Joffe, Steven; Sharp, Richard R; Parsons, D Williams; Knoppers, Bartha M; Yu, Joon-Ho; Appelbaum, Paul S

2014-01-01

As exome and genome sequencing move into clinical application, questions surround how to elicit consent and handle potential return of individual genomic results. This study analyzes nine consent forms used in NIH-funded sequencing studies. Content analysis reveals considerable heterogeneity, including in defining results that may be returned, identifying potential benefits and risks of return, protecting privacy, addressing placement of results in the medical record, and data-sharing. In response to lack of consensus, we offer recommendations. © 2014 American Society of Law, Medicine & Ethics, Inc.
A Polyphasic and Taxogenomic Evaluation Uncovers Arcobacter cryaerophilus as a Species Complex That Embraces Four Genomovars

PubMed Central

Pérez-Cataluña, Alba; Collado, Luis; Salgado, Oscar; Lefiñanco, Violeta; Figueras, María J.

2018-01-01

The species Arcobacter cryaerophilus is found in many food products of animal origin and is the dominating species in wastewater. In addition, it is associated with cases of farm animal and human infectious diseases. The species embraces two subgroups i.e., 1A (LMG 24291T = LMG 9904T) and 1B (LMG 10829) that can be differentiated by their 16S rRNA-RFLP pattern. However, some authors, on the basis of the shared intermediate levels of DNA-DNA hybridization, have suggested abandoning the subgroup classification. This contradiction indicates that the taxonomy of this species is not yet resolved. The objective of the present study was to perform a taxonomic evaluation of the diversity of A. cryaerophilus. Genomic information was used along with a Multilocus Phylogenetic Analysis (MLPA) and phenotypic characterization on a group of 52 temporally and geographically dispersed strains, coming from different types of samples and hosts from nine countries. The MLPA analysis showed that those strains formed four clusters (I–IV). Values of Average Nucleotide Identity (ANI) and in silico DNA-DNA Hybridization (isDDH) obtained between 13 genomes representing strains of the four clusters were below the proposed cut-offs of 96 and 70%, respectively, confirming that each of the clusters represented a different genomic species. However, none of the evaluated phenotypic tests enabled their unequivocal differentiation into species. Therefore, the genomic delimited clusters should be considered genomovars of the species A. cryaerophilus. These genomovars could have different clinical importance, since only the cluster I included strains isolated from human specimens. The discovery of at least one stable distinctive phenotypic character would be needed to define each cluster or genomovar as a different species. Until then, we propose naming them “A. cryaerophilus gv. pseudocryaerophilus” (Cluster I = LMG 10229T), “A. cryaerophilus gv. crypticus” (Cluster II = LMG 9065T), “A. cryaerophilus gv. cryaerophilus” (Cluster III = LMG 24291T) and “A. cryaerophilus gv. occultus” (Cluster IV = LMG 29976T).
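The cut-off rule described in this abstract can be written as a small check. This is a sketch only: the 96% ANI and 70% isDDH cut-offs come from the abstract, while the function name, the choice to require both criteria, and the example values are hypothetical and are not data from the study.

```python
ANI_CUTOFF = 96.0     # percent, cut-off quoted in the abstract
ISDDH_CUTOFF = 70.0   # percent, cut-off quoted in the abstract


def same_genomic_species(ani_pct, isddh_pct):
    """Return True when a genome pair meets both cut-offs.

    Requiring both criteria is an assumption made for this illustration.
    """
    return ani_pct >= ANI_CUTOFF and isddh_pct >= ISDDH_CUTOFF


# Hypothetical pairwise values, not taken from the study.
within_cluster = same_genomic_species(97.5, 80.0)
between_clusters = same_genomic_species(94.0, 55.0)
print(within_cluster, between_clusters)
```

Pairs that fall below both cut-offs, as reported between the four clusters, would be treated as different genomic species under this rule.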
Extending the Bacillus cereus group genomics to putative food-borne pathogens of different toxicity.

PubMed

Lapidus, Alla; Goltsman, Eugene; Auger, Sandrine; Galleron, Nathalie; Ségurens, Béatrice; Dossat, Carole; Land, Miriam L; Broussolle, Veronique; Brillard, Julien; Guinebretiere, Marie-Helene; Sanchis, Vincent; Nguen-The, Christophe; Lereclus, Didier; Richardson, Paul; Wincker, Patrick; Weissenbach, Jean; Ehrlich, S Dusko; Sorokin, Alexei

2008-01-30

The Bacillus cereus group represents sporulating soil bacteria containing pathogenic strains which may cause diarrheic or emetic food poisoning outbreaks. Multiple locus sequence typing revealed about 30 clonal complexes of these bacteria in natural samples. Application of genomic methods to this group was, however, biased by the major interest in representatives closely related to Bacillus anthracis. Although the most important food-borne pathogens have not yet been defined, existing data indicate that they are scattered all over the phylogenetic tree. The preliminary analysis of the sequences of three genomes discussed in this paper narrows down the gaps in our knowledge of the B. cereus group. The strain NVH391-98 is a rare but particularly severe food-borne pathogen. Sequencing revealed that the strain should be a representative of a novel bacterial species, for which the name Bacillus cytotoxis or Bacillus cytotoxicus is proposed. This strain has a reduced genome size compared to other B. cereus group strains. Genome analysis revealed absence of sigma B factor and the presence of genes encoding diarrheic Nhe toxin, not detected earlier. The strain B. cereus F837/76 represents a clonal complex close to that of B. anthracis. Including F837/76, three such B. cereus strains had been sequenced. Alignment of genomes suggests that B. anthracis is their common ancestor. Since such strains often emerge from clinical cases, they merit special attention. The third strain, KBAB4, is a typical facultative psychrophile generally found in soil. Phylogenetic studies show that in nature it is the most active group in terms of gene exchange. Genomic sequence revealed a large amount of extra-chromosomal genetic material (about 530 kb) that may account for this phenomenon. Genes coding Nhe-like toxin were found on a large plasmid in this strain. This may indicate a potential mechanism of toxicity spread from the psychrophile strain community. 
The results of this genomic work, together with the ecological compartments of the different strains, suggest considering the need for prophylactic vaccines against bacteria closely related to NVH391-98 and F837/76. Presumably, development of such vaccines can be based on the properties of non-pathogenic strains such as KBAB4 or ATCC14579 reported here or earlier. By comparing the protein coding genes of strains being sequenced in this project to others we estimate the shared proteome, or core genome, in the B. cereus group to be 3000 ± 200 genes and the total proteome, or pan-genome, to be 20,000-25,000 genes.
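The core genome and pan-genome named above correspond to the intersection and the union of per-strain gene sets. The sketch below shows that relationship on a toy example. The gene-family identifiers are hypothetical, the strain labels are reused from the abstract only as dictionary keys, and a real analysis would first need an ortholog clustering step that is not shown here.

```python
def core_and_pan(gene_sets):
    """gene_sets: dict mapping strain name -> set of gene-family IDs.

    Returns (core, pan): families present in every strain, and families
    present in at least one strain.
    """
    sets = list(gene_sets.values())
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan


# Toy gene-family assignments, invented for illustration.
toy = {
    "NVH391-98": {"fam1", "fam2", "fam3"},
    "F837/76": {"fam1", "fam2", "fam4"},
    "KBAB4": {"fam1", "fam2", "fam5", "fam6"},
}
core, pan = core_and_pan(toy)
print(len(core), len(pan))
```

As more strains are added, the core can only shrink or stay the same while the pan-genome can only grow or stay the same, which is why the two estimates in the abstract differ so widely.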
Genomic and Functional Approaches to Understanding Cancer Aneuploidy.

PubMed

Taylor, Alison M; Shih, Juliann; Ha, Gavin; Gao, Galen F; Zhang, Xiaoyang; Berger, Ashton C; Schumacher, Steven E; Wang, Chen; Hu, Hai; Liu, Jianfang; Lazar, Alexander J; Cherniack, Andrew D; Beroukhim, Rameen; Meyerson, Matthew

2018-04-09

Aneuploidy, whole chromosome or chromosome arm imbalance, is a near-universal characteristic of human cancers. In 10,522 cancer genomes from The Cancer Genome Atlas, aneuploidy was correlated with TP53 mutation, somatic mutation rate, and expression of proliferation genes. Aneuploidy was anti-correlated with expression of immune signaling genes, due to decreased leukocyte infiltrates in high-aneuploidy samples. Chromosome arm-level alterations show cancer-specific patterns, including loss of chromosome arm 3p in squamous cancers. We applied genome engineering to delete 3p in lung cells, causing decreased proliferation rescued in part by chromosome 3 duplication. This study defines genomic and phenotypic correlates of cancer aneuploidy and provides an experimental approach to study chromosome arm aneuploidy. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Genome expansion and gene loss in powdery mildew fungi reveal tradeoffs in extreme parasitism.

PubMed

Spanu, Pietro D; Abbott, James C; Amselem, Joelle; Burgis, Timothy A; Soanes, Darren M; Stüber, Kurt; Ver Loren van Themaat, Emiel; Brown, James K M; Butcher, Sarah A; Gurr, Sarah J; Lebrun, Marc-Henri; Ridout, Christopher J; Schulze-Lefert, Paul; Talbot, Nicholas J; Ahmadinejad, Nahal; Ametz, Christian; Barton, Geraint R; Benjdia, Mariam; Bidzinski, Przemyslaw; Bindschedler, Laurence V; Both, Maike; Brewer, Marin T; Cadle-Davidson, Lance; Cadle-Davidson, Molly M; Collemare, Jerome; Cramer, Rainer; Frenkel, Omer; Godfrey, Dale; Harriman, James; Hoede, Claire; King, Brian C; Klages, Sven; Kleemann, Jochen; Knoll, Daniela; Koti, Prasanna S; Kreplak, Jonathan; López-Ruiz, Francisco J; Lu, Xunli; Maekawa, Takaki; Mahanil, Siraprapa; Micali, Cristina; Milgroom, Michael G; Montana, Giovanni; Noir, Sandra; O'Connell, Richard J; Oberhaensli, Simone; Parlange, Francis; Pedersen, Carsten; Quesneville, Hadi; Reinhardt, Richard; Rott, Matthias; Sacristán, Soledad; Schmidt, Sarah M; Schön, Moritz; Skamnioti, Pari; Sommer, Hans; Stephens, Amber; Takahara, Hiroyuki; Thordal-Christensen, Hans; Vigouroux, Marielle; Wessling, Ralf; Wicker, Thomas; Panstruga, Ralph

2010-12-10

Powdery mildews are phytopathogens whose growth and reproduction are entirely dependent on living plant cells. The molecular basis of this life-style, obligate biotrophy, remains unknown. We present the genome analysis of barley powdery mildew, Blumeria graminis f.sp. hordei (Blumeria), as well as a comparison with the analysis of two powdery mildews pathogenic on dicotyledonous plants. These genomes display massive retrotransposon proliferation, genome-size expansion, and gene losses. The missing genes encode enzymes of primary and secondary metabolism, carbohydrate-active enzymes, and transporters, probably reflecting their redundancy in an exclusively biotrophic life-style. Among the 248 candidate effectors of pathogenesis identified in the Blumeria genome, very few (less than 10) define a core set conserved in all three mildews, suggesting that most effectors represent species-specific adaptations.
Whole genome sequencing: an efficient approach to ensuring food safety

NASA Astrophysics Data System (ADS)

Lakicevic, B.; Nastasijevic, I.; Dimitrijevic, M.

2017-09-01

Whole genome sequencing (WGS) is an effective, powerful tool that can be applied to a wide range of public health and food safety applications. A major difference between WGS and the traditional typing techniques is that WGS allows all genes to be included in the analysis, instead of a well-defined subset of genes or variable intergenic regions. Also, the use of WGS can facilitate the understanding of contamination/colonization routes of foodborne pathogens within the food production environment, and can also afford efficient tracking of pathogens’ entry routes and distribution from farm-to-consumer. Tracking foodborne pathogens in the food processing-distribution-retail-consumer continuum is of the utmost importance for facilitation of outbreak investigations and rapid action in controlling/preventing foodborne outbreaks. Therefore, WGS will likely consolidate most of the numerous workflows used in public health laboratories to characterize foodborne pathogens into one efficient workflow.
Diet, Genetics, and Disease: A Focus on the Middle East and North Africa Region

PubMed Central

Fahed, Akl C.; El-Hage-Sleiman, Abdul-Karim M.; Farhat, Theresa I.; Nemer, Georges M.

2012-01-01

The Middle East and North Africa (MENA) region is undergoing a drastic change from a traditional diet to an industrialized diet. This has led to an unparalleled increase in the prevalence of chronic diseases. This review discusses the role of nutritional genomics, or the dietary signature, in these dietary and disease changes in the MENA. The diet-genetics-disease relation is discussed in detail. Selected disease categories in the MENA are discussed starting with a review of their epidemiology in the different MENA countries, followed by an examination of the known genetic factors that have been reported in the disease discussed, whether inside or outside the MENA. Several diet-genetics-disease relationships in the MENA may be contributing to the increased prevalence of civilization disorders of metabolism and micronutrient deficiencies. Future research in the field of nutritional genomics in the MENA is needed to better define these relationships. PMID:22536488
A genomic lifespan program that reorganises the young adult brain is targeted in schizophrenia.

PubMed

Skene, Nathan G; Roy, Marcia; Grant, Seth Gn

2017-09-12

The genetic mechanisms regulating the brain and behaviour across the lifespan are poorly understood. We found that lifespan transcriptome trajectories describe a calendar of gene regulatory events in the brain of humans and mice. Transcriptome trajectories defined a sequence of gene expression changes in neuronal, glial and endothelial cell-types, which enabled prediction of age from tissue samples. A major lifespan landmark was the peak change in trajectories occurring in humans at 26 years and in mice at 5 months of age. This species-conserved peak was delayed in females and marked a reorganization of expression of synaptic and schizophrenia-susceptibility genes. The lifespan calendar predicted the characteristic age of onset in young adults and sex differences in schizophrenia. We propose that a genomic program generates a lifespan calendar of gene regulation that times age-dependent molecular organization of the brain, and that mutations that interrupt the program in young adults cause schizophrenia.
First complete chromosomal organization of a protozoan plant parasite (Phytomonas spp.).

PubMed

Marín, Clotilde; Alberge, Blandine; Dollet, Michel; Pagès, Michel; Bastien, Patrick

2008-01-01

Phytomonas spp. are members of the family Trypanosomatidae that parasitize plants and may cause lethal diseases in crops such as Coffee Phloem necrosis, Hartrot in coconut, and Marchitez sorpresiva in oil palm. In this study, the molecular karyotype of 6 isolates from latex plants has been entirely elucidated by pulsed-field gel electrophoresis and DNA hybridization. Twenty-one chromosomal linkage groups constituting heterologous chromosomes and ranging in size between 0.3 and 3 Mb could be physically defined by the use of 75 DNA markers (sequence-tagged sites and genes). From these data, the genome size can be estimated at 25.5 (+/-2) Mb. The physical linkage groups were consistently conserved in all strains examined. Moreover, the finding of several pairs of different-sized homologous chromosomes strongly suggests diploidy for this organism. The definition of the complete molecular karyotype of Phytomonas represents an essential primary step toward sequencing the genome of this parasite of economic importance.
A species-specific nucleosomal signature defines a periodic distribution of amino acids in proteins.

PubMed

Quintales, Luis; Soriano, Ignacio; Vázquez, Enrique; Segurado, Mónica; Antequera, Francisco

2015-04-01

Nucleosomes are the basic structural units of chromatin. Most of the yeast genome is organized in a pattern of positioned nucleosomes that is stably maintained under a wide range of physiological conditions. In this work, we have searched for sequence determinants associated with positioned nucleosomes in four species of fission and budding yeasts. We show that mononucleosomal DNA follows a highly structured base composition pattern, which differs among species despite the high degree of histone conservation. These nucleosomal signatures are present in transcribed and non-transcribed regions across the genome. In the case of open reading frames, they correctly predict the relative distribution of codons on mononucleosomal DNA, and they also determine a periodicity in the average distribution of amino acids along the proteins. These results establish a direct and species-specific connection between the position of each codon around the histone octamer and protein composition.
The Sponge Hologenome

PubMed Central

Thomas, Torsten

2016-01-01

A paradigm shift has recently transformed the field of biological science; molecular advances have revealed how fundamentally important microorganisms are to many aspects of a host’s phenotype and evolution. In the process, an era of “holobiont” research has emerged to investigate the intricate network of interactions between a host and its symbiotic microbial consortia. Marine sponges are early-diverging metazoa known for hosting dense, specific, and often highly diverse microbial communities. Here we synthesize current thoughts about the environmental and evolutionary forces that influence the diversity, specificity, and distribution of microbial symbionts within the sponge holobiont, explore the physiological pathways that contribute to holobiont function, and describe the molecular mechanisms that underpin the establishment and maintenance of these symbiotic partnerships. The collective genomes of the sponge holobiont form the sponge hologenome, and we highlight how the forces that define a sponge’s phenotype in fact act on the genomic interplay between the different components of the holobiont. PMID:27103626
The utility of copy number variation (CNV) in studies of hypertension-related left ventricular hypertrophy (LVH): rationale, potential and challenges.

PubMed

Boonpeng, Hoh; Yusoff, Khalid

2013-03-01

The ultimate goal of human genetics is to understand the role of genome variation in elucidating human traits and diseases. Besides single nucleotide polymorphism (SNP), copy number variation (CNV), defined as gains or losses of a DNA segment larger than 1 kb, has recently emerged as an important tool in understanding a heritable source of human genomic differences. It has been shown to contribute to genetic susceptibility of various common and complex diseases. Despite a handful of publications, its role in cardiovascular diseases remains largely unknown. Here, we deliberate on the currently available technologies for CNV detection. The possible utility and the potential roles of CNV in exploring the mechanisms of cardiac remodeling in hypertension will also be addressed. Finally, we discuss the challenges for investigations of CNV in cardiovascular diseases and its possible implications in diagnosis of hypertension-related left ventricular hypertrophy (LVH).
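The size-based definition of a CNV quoted in this abstract (a gain or loss of a DNA segment larger than 1 kb) can be illustrated with a minimal sketch. The `Segment` record, the `classify_segment` function, and the diploid reference copy number of 2 are assumptions made for illustration; they are not taken from the paper, and real CNV callers work from array or sequencing signal rather than ready-made copy counts.

```python
from dataclasses import dataclass

# "Larger than 1 kb", following the definition quoted in the abstract.
CNV_MIN_LENGTH_BP = 1_000


@dataclass(frozen=True)
class Segment:
    """Hypothetical record for one genomic segment in one sample."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    copies: int  # estimated copy number in the sample


def classify_segment(seg: Segment, reference_copies: int = 2) -> str:
    """Return 'gain', 'loss', or 'none' for a single segment.

    A segment counts as a CNV only if it is longer than 1 kb and its
    copy number differs from the (assumed diploid) reference.
    """
    length = seg.end - seg.start
    if length <= CNV_MIN_LENGTH_BP or seg.copies == reference_copies:
        return "none"
    return "gain" if seg.copies > reference_copies else "loss"
```

For example, `classify_segment(Segment("chr1", 0, 5000, 3))` evaluates to `"gain"`, while a segment of 800 bp is reported as `"none"` regardless of its copy number.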
Genome sequence, comparative analysis and haplotype structure of the domestic dog.

PubMed

Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, 
Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S

2005-12-08

Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
Phylum-wide analysis of genes/proteins related to the last steps of assembly and export of extracellular polymeric substances (EPS) in cyanobacteria

NASA Astrophysics Data System (ADS)

Pereira, Sara B.; Mota, Rita; Vieira, Cristina P.; Vieira, Jorge; Tamagnini, Paula

2015-10-01

Many cyanobacteria produce extracellular polymeric substances (EPS) with particular characteristics (e.g. anionic nature and presence of sulfate) that make them suitable for industrial processes such as bioremediation of heavy metals or thickening, suspending or emulsifying agents. Nevertheless, their biosynthetic pathway(s) are still largely unknown, limiting their utilization. In this work, a phylum-wide analysis of genes/proteins putatively involved in the assembly and export of EPS in cyanobacteria was performed. Our results demonstrated that most strains harbor genes encoding proteins related to the three main pathways: Wzy-, ABC transporter-, and Synthase-dependent, but often not the complete set defining one pathway. Multiple gene copies are mainly correlated to larger genomes, and the strains with reduced genomes (e.g. the clade of marine unicellular Synechococcus and Prochlorococcus), seem to have lost most of the EPS-related genes. Overall, the distribution of the different genes/proteins within the cyanobacteria phylum raises the hypothesis that cyanobacterial EPS production may not strictly follow one of the pathways previously characterized. Moreover, for the proteins involved in EPS polymerization, amino acid patterns were defined and validated constituting a novel and robust tool to identify proteins with similar functions and giving a first insight to which polymer biosynthesis they are related to.
Mutations Affecting Expression of the rosy Locus in Drosophila melanogaster

PubMed Central

Lee, Chong Sung; Curtis, Daniel; McCarron, Margaret; Love, Carol; Gray, Mark; Bender, Welcome; Chovnick, Arthur

1987-01-01

The rosy locus in Drosophila melanogaster codes for the enzyme xanthine dehydrogenase (XDH). Previous studies defined a "control element" near the 5' end of the gene, where variant sites affected the amount of rosy mRNA and protein produced. We have determined the DNA sequence of this region from both genomic and cDNA clones, and from the ry+10 underproducer strain. This variant strain had many sequence differences, so that the site of the regulatory change could not be fixed. A mutagenesis was also undertaken to isolate new regulatory mutations. We induced 376 new mutations with 1-ethyl-1-nitrosourea (ENU) and screened them to isolate those that reduced the amount of XDH protein produced, but did not change the properties of the enzyme. Genetic mapping was used to find mutations located near the 5' end of the gene. DNA from each of seven mutants was cloned and sequenced through the 5' region. Mutant base changes were identified in all seven; they appear to affect splicing and translation of the rosy mRNA. In a related study (T. P. Keith et al. 1987), the genomic and cDNA sequences are extended through the 3' end of the gene; the combined sequences define the processing pattern of the rosy transcript and predict the amino acid sequence of XDH. PMID:3036645
Convergent bacterial microbiotas in the fungal agricultural systems of insects

DOE PAGES

Aylward, Frank O.; Suen, Garret; Biedermann, Peter H. W.; ...

2014-11-18

The ability to cultivate food is an innovation that has produced some of the most successful ecological strategies on the planet. Although most well recognized in humans, where agriculture represents a defining feature of civilization, species of ants, beetles, and termites have also independently evolved symbioses with fungi that they cultivate for food. Despite occurring across divergent insect and fungal lineages, the fungivorous niches of these insects are remarkably similar, indicating convergent evolution toward this successful ecological strategy. Here, we characterize the microbiota of ants, beetles, and termites engaged in nutritional symbioses with fungi to define the bacterial groups associated with these prominent herbivores and forest pests. Using culture-independent techniques and the in silico reconstruction of 37 composite genomes of dominant community members, we demonstrate that different insect-fungal symbioses that collectively shape ecosystems worldwide have highly similar bacterial microbiotas comprised primarily of the genera Enterobacter, Rahnella, and Pseudomonas. Although these symbioses span three orders of insects and two phyla of fungi, we show that they are associated with bacteria sharing high whole-genome nucleotide identity. Due to the fine-scale correspondence of the bacterial microbiotas of insects engaged in fungal symbioses, our findings indicate that this represents an example of convergence of entire host-microbe complexes.

Characterization of class II alpha genes and DLA-D region allelic associations in the dog.

PubMed

Sarmiento, U M; Storb, R F

1988-10-01

Human major histocompatibility complex (HLA) cDNA probes were used to analyze the restriction fragment length polymorphism (RFLP) of the alpha genes of the DLA-D region in dogs. Genomic DNA from peripheral blood leucocytes of 23 unrelated DLA-D homozygous dogs representing nine DLA-D types (defined by mixed leucocyte reaction) was digested with restriction enzymes (BamHI, EcoRI, Hind III, Pvu II, Taq I, Rsa I, Msp I, Pst I and Bgl II), separated by agarose gel electrophoresis and transferred onto Biotrace membrane. The Southern blots were successively hybridized with radiolabelled HLA cDNA probes corresponding to DQ, DP, DZ and DR alpha genes. Clear evidence was obtained for the canine homologues of DQ and DR alpha genes with simple bi- or tri-allelic polymorphism respectively. Evidence for a single, nonpolymorphic DP alpha gene was also obtained. However, the presence of a DZ alpha gene could not be clearly demonstrated in canine genomic DNA. This report extends our previous RFLP analysis documenting polymorphism of DLA class II beta genes in the same panel of homozygous typing cell dogs, and provides the basis for DLA-D genotyping at a population level. This study also characterizes the RFLP-defined preferential allelic associations across the DLA-D region in nine different homozygous typing cell specificities.
Convergent bacterial microbiotas in the fungal agricultural systems of insects

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aylward, Frank O.; Suen, Garret; Biedermann, Peter H. W.

The ability to cultivate food is an innovation that has produced some of the most successful ecological strategies on the planet. Although most well recognized in humans, where agriculture represents a defining feature of civilization, species of ants, beetles, and termites have also independently evolved symbioses with fungi that they cultivate for food. Despite occurring across divergent insect and fungal lineages, the fungivorous niches of these insects are remarkably similar, indicating convergent evolution toward this successful ecological strategy. Here, we characterize the microbiota of ants, beetles, and termites engaged in nutritional symbioses with fungi to define the bacterial groups associated with these prominent herbivores and forest pests. Using culture-independent techniques and the in silico reconstruction of 37 composite genomes of dominant community members, we demonstrate that different insect-fungal symbioses that collectively shape ecosystems worldwide have highly similar bacterial microbiotas comprised primarily of the genera Enterobacter, Rahnella, and Pseudomonas. Although these symbioses span three orders of insects and two phyla of fungi, we show that they are associated with bacteria sharing high whole-genome nucleotide identity. Due to the fine-scale correspondence of the bacterial microbiotas of insects engaged in fungal symbioses, our findings indicate that this represents an example of convergence of entire host-microbe complexes.
FRAGS: estimation of coding sequence substitution rates from fragmentary data

PubMed Central

Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

2004-01-01

Background: Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However, the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results: We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion: We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data are available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics, our system enables the user to manage and query alignment and substitution data. PMID:15005802
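To make the idea of coding-sequence substitution estimates from an in-frame alignment concrete, here is a rough sketch that tallies synonymous and nonsynonymous codon differences between two aligned sequences. It is not the FRAGS method: the abstract does not say which estimator FRAGS uses, and this raw count ignores multiple hits and the number of synonymous and nonsynonymous sites. The function name `count_codon_differences` is hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (synonymous, nonsynonymous) codon differences.

    Both sequences must be the same length, in frame, and gap-free.
    Codons containing characters other than A, C, G, T are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal-length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # ambiguous base; leave it out of both tallies
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous
```

With the toy input `count_codon_differences("ATGTTTGGA", "ATGTTCGAA")`, the second codon changes TTT to TTC (both phenylalanine) and the third changes GGA (glycine) to GAA (glutamate), so the call returns `(1, 1)`.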
Next-generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis.

PubMed

Webb, Kristen M; Rosenthal, Benjamin M

2011-01-01

The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence, was found to exceed 99.3%. Genome content and gene order were not found to be significantly different from those of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
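The comparison described in this abstract, per-gene divergence set against the average divergence across the coding region, can be sketched as follows. This is a simple uncorrected percent-difference calculation over pairwise alignments and is only an assumption about the kind of measure involved; the abstract does not specify the estimator. The function names and the toy sequences are hypothetical, with gene labels chosen only to resemble mitochondrial gene names.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # ignore alignment gaps
        compared += 1
        differences += base_a != base_b
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return 100.0 * differences / compared


def divergence_by_gene(alignments: dict[str, tuple[str, str]]) -> tuple[dict[str, float], float]:
    """Return per-gene divergence and the divergence over all genes combined."""
    per_gene = {gene: percent_divergence(a, b) for gene, (a, b) in alignments.items()}
    all_a = "".join(a for a, _ in alignments.values())
    all_b = "".join(b for _, b in alignments.values())
    return per_gene, percent_divergence(all_a, all_b)


# Toy alignments (made-up sequences, not real Trichinella data).
toy = {
    "cox1": ("ACGTACGTAC", "ACGTACGTAT"),
    "nad1": ("AAAAAAAAAA", "AAAATTAAAA"),
}
per_gene, overall = divergence_by_gene(toy)
```

Genes whose value in `per_gene` lies well above or below `overall` are the ones that would stand out in a comparison of this kind; a formal test of significance is outside the scope of this sketch.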
tDNA insulators and the emerging role of TFIIIC in genome organization

PubMed Central

Van Bortle, Kevin; Corces, Victor G.

2012-01-01

Recent findings provide evidence that tDNAs function as chromatin insulators from yeast to humans. TFIIIC, a transcription factor that interacts with the B-box in tDNAs as well as thousands of ETC sites in the genome, is responsible for insulator function. Though tDNAs are capable of enhancer-blocking and barrier activities for which insulators are defined, new insights into the relationship between insulators and chromatin structure suggest that TFIIIC serves a complex role in genome organization. We review the role of tRNA genes and TFIIIC as chromatin insulators, and highlight recent findings that have broadened our understanding of insulators in genome biology. PMID:22889843
When proteome meets genome: the alpha helix and the beta strand of proteins are eschewed by mRNA splice junctions and may define the minimal indivisible modules of protein architecture

PubMed Central

Barik, Sailen

2008-01-01

The significance of the intron-exon structure of genes is a mystery. As eukaryotic proteins are made up of modular functional domains, each exon was suspected to encode some form of module; however, the definition of a module remained vague. Comparison of pre-mRNA splice junctions with the three-dimensional architecture of its protein product from different eukaryotes revealed that the junctions were far less likely to occur inside the α-helices and β-strands of proteins than within the more flexible linker regions (‘turns’ and ‘loops’) connecting them. The splice junctions were equally distributed in the different types of linkers and throughout the linker sequence, although a slight preference for the central region of the linker was observed. The avoidance of the α-helix and the β-strand by splice junctions suggests the existence of a selection pressure against their disruption, perhaps underscoring the investment made by nature in building these intricate secondary structures. A corollary is that the helix and the strand are the smallest integral architectural units of a protein and represent the minimal modules in the evolution of protein structure. These results should find use in comparative genomics, designing of cloning strategies, and in the mutual verification of genome sequences with protein structures. PMID:15381847
When proteome meets genome: the alpha helix and the beta strand of proteins are eschewed by mRNA splice junctions and may define the minimal indivisible modules of protein architecture.

PubMed

Barik, Sailen

2004-09-01

The significance of the intron-exon structure of genes is a mystery. As eukaryotic proteins are made up of modular functional domains, each exon was suspected to encode some form of module; however, the definition of a module remained vague. Comparison of pre-mRNA splice junctions with the three-dimensional architecture of its protein product from different eukaryotes revealed that the junctions were far less likely to occur inside the alpha-helices and beta-strands of proteins than within the more flexible linker regions ('turns' and 'loops') connecting them. The splice junctions were equally distributed in the different types of linkers and throughout the linker sequence, although a slight preference for the central region of the linker was observed. The avoidance of the alpha-helix and the beta-strand by splice junctions suggests the existence of a selection pressure against their disruption, perhaps underscoring the investment made by nature in building these intricate secondary structures. A corollary is that the helix and the strand are the smallest integral architectural units of a protein and represent the minimal modules in the evolution of protein structure. These results should find use in comparative genomics, designing of cloning strategies, and in the mutual verification of genome sequences with protein structures.
Nature's combinatorial biosynthesis and recently engineered production of nucleoside antibiotics in Streptomyces.

PubMed

Chen, Shawn; Kinney, William A; Van Lanen, Steven

2017-04-01

Modified nucleosides produced by Streptomyces and related actinomycetes are widely used in agriculture and medicine as antibacterial, antifungal, anticancer and antiviral agents. These specialized small-molecule metabolites are biosynthesized by complex enzymatic machineries encoded within gene clusters in the genome. The past decade has witnessed a burst of reports defining the key metabolic processes involved in the biosynthesis of several distinct families of nucleoside antibiotics. Furthermore, genome sequencing of various Streptomyces species has dramatically increased over recent years. Potential biosynthetic gene clusters for novel nucleoside antibiotics are now apparent by analysis of these genomes. Here we revisit strategies for production improvement of nucleoside antibiotics that have defined mechanisms of action, and are in clinical or agricultural use. We summarize the progress for genetically manipulating biosynthetic pathways for structural diversification of nucleoside antibiotics. Microorganism-based biosynthetic examples are provided and organized under genetic principles and metabolic engineering guidelines. We show perspectives on the future of combinatorial biosynthesis, and present a working model for discovery of novel nucleoside natural products in Streptomyces.
Signaling pathway deregulation and molecular alterations across pediatric medulloblastomas.

PubMed

Lhermitte, B; Blandin, A F; Coca, A; Guerin, E; Durand, A; Entz-Werlé, N

2018-05-15

Medulloblastomas (MBs) account for 15% of brain tumors in children under the age of 15. To date, the overall 5-year survival rate for all children is only around 60%. Recent advances in cancer genomics have led to a fundamental change in medulloblastoma classification, which continues to evolve along with genomic discoveries, allowing this disease to be regularly reclassified. The previous molecular classification defined 4 groups (WNT-activated MB, SHH-activated MB and the groups 3 and 4 characterized partially by NMYC and MYC driven MBs). This stratification moved forward recently to better define these groups and their correlation to outcome. This new stratification into 7 novel subgroups helped lay the foundations for, and provide complementary data on, the molecular pathways and gene mutations underlying medulloblastoma biology. This review aims to answer the recent key questions on MB genomics and to examine further the relevance of those genes in MB development as well as in their targeted therapies. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
The modest beginnings of one genome project.

PubMed

Kaback, David B

2013-06-01

One of the top things on a geneticist's wish list has to be a set of mutants for every gene in their particular organism. Such a set was produced for the yeast, Saccharomyces cerevisiae near the end of the 20th century by a consortium of yeast geneticists. However, the functional genomic analysis of one chromosome, its smallest, had already begun more than 25 years earlier as a project that was designed to define most or all of that chromosome's essential genes by temperature-sensitive lethal mutations. When far fewer than expected genes were uncovered, the relatively new field of molecular cloning enabled us and indeed, the entire community of yeast researchers to approach this problem more definitively. These studies ultimately led to cloning, genomic sequencing, and the production and phenotypic analysis of the entire set of knockout mutations for this model organism as well as a better concept of what defines an essential function, a wish fulfilled that enables this model eukaryote to continue at the forefront of research in modern biology.
Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution

PubMed Central

Smith, Jeramiah J; Kuraku, Shigehiro; Holt, Carson; Sauka-Spengler, Tatjana; Jiang, Ning; Campbell, Michael S; Yandell, Mark D; Manousaki, Tereza; Meyer, Axel; Bloom, Ona E; Morgan, Jennifer R; Buxbaum, Joseph D; Sachidanandam, Ravi; Sims, Carrie; Garruss, Alexander S; Cook, Malcolm; Krumlauf, Robb; Wiedemann, Leanne M; Sower, Stacia A; Decatur, Wayne A; Hall, Jeffrey A; Amemiya, Chris T; Saha, Nil R; Buckley, Katherine M; Rast, Jonathan P; Das, Sabyasachi; Hirano, Masayuki; McCurley, Nathanael; Guo, Peng; Rohner, Nicolas; Tabin, Clifford J; Piccinelli, Paul; Elgar, Greg; Ruffier, Magali; Aken, Bronwen L; Searle, Stephen MJ; Muffato, Matthieu; Pignatelli, Miguel; Herrero, Javier; Jones, Matthew; Brown, C Titus; Chung-Davidson, Yu-Wen; Nanlohy, Kaben G; Libants, Scot V; Yeh, Chu-Yin; McCauley, David W; Langeland, James A; Pancer, Zeev; Fritzsch, Bernd; de Jong, Pieter J; Zhu, Baoli; Fulton, Lucinda L; Theising, Brenda; Flicek, Paul; Bronner, Marianne E; Warren, Wesley C; Clifton, Sandra W; Wilson, Richard K; Li, Weiming

2013-01-01

Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms. PMID:23435085
Genomic Definition of Hypervirulent and Multidrug-Resistant Klebsiella pneumoniae Clonal Groups

PubMed Central

Bialek-Davenet, Suzanne; Criscuolo, Alexis; Ailloud, Florent; Passet, Virginie; Jones, Louis; Delannoy-Vieillard, Anne-Sophie; Garin, Benoit; Le Hello, Simon; Arlet, Guillaume; Nicolas-Chanoine, Marie-Hélène; Decré, Dominique

2014-01-01

Multidrug-resistant and highly virulent Klebsiella pneumoniae isolates are emerging, but the clonal groups (CGs) corresponding to these high-risk strains have remained imprecisely defined. We aimed to identify K. pneumoniae CGs on the basis of genome-wide sequence variation and to provide a simple bioinformatics tool to extract virulence and resistance gene data from genomic data. We sequenced 48 K. pneumoniae isolates, mostly of serotypes K1 and K2, and compared the genomes with 119 publicly available genomes. A total of 694 highly conserved genes were included in a core-genome multilocus sequence typing scheme, and cluster analysis of the data enabled precise definition of globally distributed hypervirulent and multidrug-resistant CGs. In addition, we created a freely accessible database, BIGSdb-Kp, to enable rapid extraction of medically and epidemiologically relevant information from genomic sequences of K. pneumoniae. Although drug-resistant and virulent K. pneumoniae populations were largely nonoverlapping, isolates with combined virulence and resistance features were detected. PMID:25341126
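The core-genome multilocus sequence typing approach summarized above assigns each isolate an allele number at every core locus and then clusters isolates by how many loci differ. The sketch below illustrates that idea under stated assumptions: the abstract does not give the clustering algorithm or the mismatch threshold used to define clonal groups, so single-linkage grouping and the `max_mismatches` parameter are illustrative choices, and the function names and toy profiles are hypothetical.

```python
from itertools import combinations


def allelic_distance(profile_a, profile_b) -> int:
    """Number of loci with different alleles; loci missing (None) in either profile are ignored."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def single_linkage_groups(profiles: dict, max_mismatches: int) -> list[list[str]]:
    """Group isolates so that any pair within max_mismatches loci ends up in the same group."""
    parent = {name: name for name in profiles}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= max_mismatches:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy allelic profiles over four loci (the study used 694 core genes).
toy_profiles = {
    "iso1": (1, 1, 1, 1),
    "iso2": (1, 1, 1, 2),
    "iso3": (5, 6, 7, 8),
    "iso4": (5, 6, 7, None),
}
groups = single_linkage_groups(toy_profiles, max_mismatches=1)
```

Here `groups` is `[["iso1", "iso2"], ["iso3", "iso4"]]`: the first pair differs at one locus, the second pair differs only at a locus that is missing in one profile, and the two pairs differ from each other at every locus.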
The Cancer Cell Map Initiative: Defining the Hallmark Networks of Cancer

PubMed Central

Krogan, Nevan J.; Lippman, Scott; Agard, David A.; Ashworth, Alan; Ideker, Trey

2017-01-01

Progress in DNA sequencing has revealed the startling complexity of cancer genomes, which typically carry thousands of somatic mutations. However, it remains unclear which are the key driver mutations or dependencies in a given cancer and how these influence pathogenesis and response to therapy. Although tumors of similar types and clinical outcomes can have patterns of mutations that are strikingly different, it is becoming apparent that these mutations recurrently hijack the same hallmark molecular pathways and networks. For this reason, it is likely that successful interpretation of cancer genomes will require comprehensive knowledge of the molecular networks under selective pressure in oncogenesis. Here we announce the creation of a new effort, called The Cancer Cell Map Initiative (CCMI), aimed at systematically detailing these complex interactions among cancer genes and how they differ between diseased and healthy states. We discuss recent progress that enables creation of these Cancer Cell Maps across a range of tumor types and how they can be used to target networks disrupted in individual patients, significantly accelerating the development of precision medicine. PMID:26000852
The cancer cell map initiative: defining the hallmark networks of cancer.

PubMed

Krogan, Nevan J; Lippman, Scott; Agard, David A; Ashworth, Alan; Ideker, Trey

2015-05-21

Progress in DNA sequencing has revealed the startling complexity of cancer genomes, which typically carry thousands of somatic mutations. However, it remains unclear which are the key driver mutations or dependencies in a given cancer and how these influence pathogenesis and response to therapy. Although tumors of similar types and clinical outcomes can have patterns of mutations that are strikingly different, it is becoming apparent that these mutations recurrently hijack the same hallmark molecular pathways and networks. For this reason, it is likely that successful interpretation of cancer genomes will require comprehensive knowledge of the molecular networks under selective pressure in oncogenesis. Here we announce the creation of a new effort, The Cancer Cell Map Initiative (CCMI), aimed at systematically detailing these complex interactions among cancer genes and how they differ between diseased and healthy states. We discuss recent progress that enables creation of these cancer cell maps across a range of tumor types and how they can be used to target networks disrupted in individual patients, significantly accelerating the development of precision medicine. Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jusakul, Apinya; Cutcutache, Ioana; Yong, Chern Han

Cholangiocarcinoma (CCA) is a hepatobiliary malignancy exhibiting high incidence in countries with endemic liver-fluke infection. We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information. Integrative clustering defined four CCA clusters: Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations; conversely, Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements. Whole-genome analysis highlighted FGFR2 3’UTR deletion as a mechanism of FGFR2 upregulation. Integration of non-coding promoter mutations with protein-DNA binding profiles demonstrates pervasive modulation of H3K27me3-associated sites in CCA. Clusters 1 and 4 exhibit distinct DNA hypermethylation patterns targeting either CpG islands or shores; mutation signature and subclonality analysis suggests that these reflect different mutational pathways. Lastly, our results exemplify how genetics, epigenetics and environmental carcinogens can interplay across different geographies to generate distinct molecular subtypes of cancer.
Recurrent Rearrangements of Human Amylase Genes Create Multiple Independent CNV Series.

PubMed

Shwan, Nzar A A; Louzada, Sandra; Yang, Fengtang; Armour, John A L

2017-05-01

The human amylase gene cluster includes the human salivary (AMY1) and pancreatic amylase genes (AMY2A and AMY2B), and is a highly variable and dynamic region of the genome. Copy number variation (CNV) of AMY1 has been implicated in human dietary adaptation, and in population association with obesity, but neither of these findings has been independently replicated. Despite these functional implications, the structural genomic basis of CNV has only been defined in detail very recently. In this work, we use high-resolution analysis of copy number, and analysis of segregation in trios, to define new, independent allelic series of amylase CNVs in sub-Saharan Africans, including a series of higher-order expansions of a unit consisting of one copy each of AMY1, AMY2A, and AMY2B. We use fiber-FISH (fluorescence in situ hybridization) to define unexpected complexity in the accompanying rearrangements. These findings demonstrate recurrent involvement of the amylase gene region in genomic instability, involving at least five independent rearrangements of the pancreatic amylase genes (AMY2A and AMY2B). Structural features shared by fundamentally distinct lineages strongly suggest that the common ancestral state for the human amylase cluster contained more than one, and probably three, copies of AMY1. © 2017 WILEY PERIODICALS, INC.
PROMISE: a tool to identify genomic features with a specific biologically interesting pattern of associations with multiple endpoint variables.

PubMed

Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Crews, Kristine R; Plunkett, William; Gandhi, Varsha; Rubnitz, Jeffrey; Ribeiro, Raul C; Downing, James R; Lamba, Jatinder

2009-08-15

In some applications, prior biological knowledge can be used to define a specific pattern of association of multiple endpoint variables with a genomic variable that is biologically most interesting. However, to our knowledge, there is no statistical procedure designed to detect specific patterns of association with multiple endpoint variables. Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable. A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of correlations onto the vector of most interesting associations. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis. Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org.
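The statistic described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' R implementation: the function names, the choice of Pearson correlation as the association statistic, the one-sided permutation test, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def promise_stat(genomic, endpoints, pattern):
    """Dot product of the observed association statistics with the
    vector of 'most interesting' values (the pattern)."""
    observed = [pearson(genomic, e) for e in endpoints]
    return sum(o * p for o, p in zip(observed, pattern))


def permutation_pvalue(genomic, endpoints, pattern, n_perm=2000, seed=0):
    """One-sided permutation p-value. Shuffling the genomic variable
    breaks its link to the endpoints while leaving the correlations
    among the endpoints themselves untouched."""
    rng = random.Random(seed)
    observed = promise_stat(genomic, endpoints, pattern)
    shuffled = list(genomic)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if promise_stat(shuffled, endpoints, pattern) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: one genomic variable and two endpoints.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
response = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]  # rises with expression
toxicity = [9.0, 8.2, 6.9, 6.1, 5.0, 3.8, 3.1, 2.0]     # falls with expression

# Pattern of interest: positive association with response, negative with toxicity.
stat, p = permutation_pvalue(expression, [response, toxicity], pattern=[1, -1])
```

With the pattern `[1, -1]` the two strong but opposite correlations reinforce each other; with `[1, 1]` they would largely cancel, which is the sense in which the pattern vector encodes the association of interest.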
SU-F-R-01: Preclinical Radioimmunogenomics Study to Design Personalized Radiotherapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abdollahi, H

2016-06-15

Purpose: Radiogenomics is an active area of research that seeks clinical correlations between genomics and radiotherapy outcomes. In this era, many different biological issues should be taken into account. In this study we aimed to introduce “Radioimmunogenomics” as a new approach to studying immunogenetic issues with regard to radiotherapy-induced clinical manifestations. Methods: We studied different immunological pathways and signaling molecules which underlie the radiation response of normal and malignant tissues. On the other hand, we found that many genes and proteins are responsible for radiation effects on biological tissues. We defined a theoretical framework to correlate these genes with radiotherapy outcomes as TCP and NTCP biological dose tools. Results: Our theoretical results showed that high-throughput immunogenomics biomarkers can be correlated with radiotherapy outcomes. Genes related to inflammation, apoptosis, repair molecules and many other immunological markers can be defined as radioimmune markers to predict radiotherapy response. Conclusion: Radioimmunogenomics can be used as a new personalized radiotherapy research area to enhance treatment outcome as well as quality of life.
Cultural differences define diagnosis and genomic medicine practice: implications for undiagnosed diseases program in China.

PubMed

Duan, Xiaohong; Markello, Thomas; Adams, David; Toro, Camilo; Tifft, Cynthia; Gahl, William A; Boerkoel, Cornelius F

2013-09-01

Despite the current acceleration and increasing leadership of Chinese genetics research, genetics and its clinical application have largely been imported to China from the Occident. Neither genetics nor the scientific reductionism underpinning its clinical application is integral to the traditional Chinese worldview. Given that disease concepts and their incumbent diagnoses are historically derived and culturally meaningful, we hypothesize that the cultural expectations of genetic diagnoses and medical genetics practice differ between the Occident and China. Specifically, we suggest that an undiagnosed diseases program in China will differ from the recently established Undiagnosed Diseases Program at the United States National Institutes of Health; a culturally sensitive concept will integrate traditional Chinese understanding of disease with the scientific reductionism of Occidental medicine.
Genomic Locus Modulating IOP in the BXD RI Mouse Strains

PubMed Central

King, Rebecca; Li, Ying; Wang, Jiaxing; Struebing, Felix L.; Geisert, Eldon E.

2018-01-01

Intraocular pressure (IOP) is the primary risk factor for developing glaucoma, yet little is known about the contribution of genomic background to IOP regulation. The present study leverages an array of systems genetics tools to study genomic factors modulating normal IOP in the mouse. The BXD recombinant inbred (RI) strain set was used to identify genomic loci modulating IOP. We measured the IOP in a total of 506 eyes from 38 different strains. Strain averages were subjected to conventional quantitative trait analysis by means of composite interval mapping. Candidate genes were defined, and immunohistochemistry and quantitative PCR (qPCR) were used for validation. Of the 38 BXD strains examined the mean IOP ranged from a low of 13.2mmHg to a high of 17.1mmHg. The means for each strain were used to calculate a genome wide interval map. One significant quantitative trait locus (QTL) was found on Chr.8 (96 to 103 Mb). Within this 7 Mb region only 4 annotated genes were found: Gm15679, Cdh8, Cdh11 and Gm8730. Only two genes (Cdh8 and Cdh11) were candidates for modulating IOP based on the presence of non-synonymous SNPs. Further examination using SIFT (Sorting Intolerant From Tolerant) analysis revealed that the SNPs in Cdh8 (Cadherin 8) were predicted to not change protein function; while the SNPs in Cdh11 (Cadherin 11) would not be tolerated, affecting protein function. Furthermore, immunohistochemistry demonstrated that CDH11 is expressed in the trabecular meshwork of the mouse. We have examined the genomic regulation of IOP in the BXD RI strain set and found one significant QTL on Chr. 8. Within this QTL, there is one good candidate gene, Cdh11. PMID:29496776

Genomic landscapes of endogenous retroviruses unveil intricate genetics of conventional and genetically-engineered laboratory mouse strains.

PubMed

Lee, Kang-Hoon; Lim, Debora; Chiu, Sophia; Greenhalgh, David; Cho, Kiho

2016-04-01

Laboratory strains of mice, both conventional and genetically engineered, have been introduced as critical components of a broad range of studies investigating normal and disease biology. Currently, the genetic identity of laboratory mice is primarily confirmed by surveying polymorphisms in selected sets of "conventional" genes and/or microsatellites in the absence of a single completely sequenced mouse genome. First, we examined variations in the genomic landscapes of transposable repetitive elements, named the TREome, in conventional and genetically engineered mouse strains using murine leukemia virus-type endogenous retroviruses (MLV-ERVs) as a probe. A survey of the genomes from 56 conventional strains revealed strain-specific TREome landscapes, and certain families (e.g., C57BL) of strains were discernible with defined patterns. Interestingly, the TREome landscapes of C3H/HeJ (toll-like receptor-4 [TLR4] mutant) inbred mice were different from its control C3H/HeOuJ (TLR4 wild-type) strain. In addition, a CD14 knock-out strain had a distinct TREome landscape compared to its control/backcross C57BL/6J strain. Second, an examination of superantigen (SAg, a "TREome gene") coding sequences of mouse mammary tumor virus-type ERVs in the genomes of the 46 conventional strains revealed a high diversity, suggesting a potential role of SAgs in strain-specific immune phenotypes. The findings from this study indicate that unexplored and intricate genomic variations exist in laboratory mouse strains, both conventional and genetically engineered. The TREome-based high-resolution genetics surveillance system for laboratory mice would contribute to efficient study design with quality control and accurate data interpretation. This genetics system can be easily adapted to other species ranging from plants to humans. Copyright © 2016 Elsevier Inc. All rights reserved.
Characterization of novel RS1 exonic deletions in juvenile X-linked retinoschisis

PubMed Central

D’Souza, Leera; Cukras, Catherine; Antolik, Christian; Craig, Candice; He, Hong; Li, Shibo; Hejtmancik, James F.; Sieving, Paul A.; Wang, Xinjing

2013-01-01

Purpose X-linked juvenile retinoschisis (XLRS) is a vitreoretinal dystrophy characterized by schisis (splitting) of the inner layers of the neuroretina. Mutations within the retinoschisis (RS1) gene are responsible for this disease. The mutation spectrum consists of amino acid substitutions, splice site variations, small indels, and larger genomic deletions. Clinically, genomic deletions are rarely reported. Here, we characterize two novel full exonic deletions: one encompassing exon 1 and the other spanning exons 4–5 of the RS1 gene. We also report the clinical findings in these patients with XLRS with two different exonic deletions. Methods Unrelated XLRS men and boys and their mothers (if available) were enrolled for molecular genetics evaluation. The patients also underwent ophthalmologic examination and in some cases electroretinogram (ERG) recording. All the exons and the flanking intronic regions of the RS1 gene were analyzed with direct sequencing. Two patients with exonic deletions were further evaluated with array comparative genomic hybridization to define the scope of the genomic aberrations. After the deleted genomic region was identified, primer walking followed by direct sequencing was used to determine the exact breakpoints. Results Two novel exonic deletions of the RS1 gene were identified: one including exon 1 and the other spanning exons 4 and 5. The exon 1 deletion extends from the 5′ region of the RS1 gene (including the promoter) through intron 1 (c.(−35)-1723_c.51+2664del4472). The exon 4–5 deletion spans introns 3 to intron 5 (c.185–1020_c.522+1844del5764). Conclusions Here we report two novel exonic deletions within the RS1 gene locus. We have also described the clinical presentations and hypothesized the genomic mechanisms underlying these schisis phenotypes. PMID:24227916
Characterization of novel RS1 exonic deletions in juvenile X-linked retinoschisis.

PubMed

D'Souza, Leera; Cukras, Catherine; Antolik, Christian; Craig, Candice; Lee, Ji-Yun; He, Hong; Li, Shibo; Smaoui, Nizar; Hejtmancik, James F; Sieving, Paul A; Wang, Xinjing

2013-01-01

X-linked juvenile retinoschisis (XLRS) is a vitreoretinal dystrophy characterized by schisis (splitting) of the inner layers of the neuroretina. Mutations within the retinoschisis (RS1) gene are responsible for this disease. The mutation spectrum consists of amino acid substitutions, splice site variations, small indels, and larger genomic deletions. Clinically, genomic deletions are rarely reported. Here, we characterize two novel full exonic deletions: one encompassing exon 1 and the other spanning exons 4-5 of the RS1 gene. We also report the clinical findings in these patients with XLRS with two different exonic deletions. Unrelated XLRS men and boys and their mothers (if available) were enrolled for molecular genetics evaluation. The patients also underwent ophthalmologic examination and in some cases electroretinogram (ERG) recording. All the exons and the flanking intronic regions of the RS1 gene were analyzed with direct sequencing. Two patients with exonic deletions were further evaluated with array comparative genomic hybridization to define the scope of the genomic aberrations. After the deleted genomic region was identified, primer walking followed by direct sequencing was used to determine the exact breakpoints. Two novel exonic deletions of the RS1 gene were identified: one including exon 1 and the other spanning exons 4 and 5. The exon 1 deletion extends from the 5' region of the RS1 gene (including the promoter) through intron 1 (c.(-35)-1723_c.51+2664del4472). The exon 4-5 deletion spans introns 3 to intron 5 (c.185-1020_c.522+1844del5764). Here we report two novel exonic deletions within the RS1 gene locus. We have also described the clinical presentations and hypothesized the genomic mechanisms underlying these schisis phenotypes.
Draft Genome Sequence of Escherichia coli MS499, Isolated from the Infected Uterus of a Postpartum Cow with Metritis

PubMed Central

Goldstone, Robert J.; Talbot, Richard; Schuberth, Hans-Joachim; Sandra, Olivier; Sheldon, I. Martin

2014-01-01

Specific Escherichia coli strains associated with bovine postpartum uterine infection have recently been described. Many recognized virulence factors are absent in these strains; therefore, to define a prototypic strain, we report here the genome sequence of E. coli isolate MS499 from a cow with the postpartum disease metritis. PMID:24994791
l-Alanine Auxotrophy of Lactobacillus johnsonii as Demonstrated by Physiological, Genomic, and Gene Complementation Approaches

PubMed Central

van der Kaaij, Hengameh; Desiere, Frank; Mollet, Beat; Germond, Jacques-Edouard

2004-01-01

Using a chemically defined medium without l-alanine, Lactobacillus johnsonii was demonstrated to be strictly auxotrophic for that amino acid. A comparative genetic analysis showed that all known genes involved in l-alanine biosynthesis are absent from the genome of L. johnsonii. This auxotrophy was complemented by heterologous expression of the Bacillus subtilis l-alanine dehydrogenase. PMID:15006820
Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

PubMed

Sahadevan, S; Hofmann-Apitius, M; Schellander, K; Tesfaye, D; Fluck, J; Friedrich, C M

2012-10-01

In biological research, establishing the prior art by searching and collecting information already present in the domain has equal importance as the experiments done. To obtain a complete overview about the relevant knowledge, researchers mainly rely on 2 major information sources: i) various biological databases and ii) scientific publications in the field. The major difference between the 2 information sources is that information from databases is available, typically well structured and condensed. The information content in scientific literature is vastly unstructured; that is, dispersed among the many different sections of scientific text. The traditional method of information extraction from scientific literature occurs by generating a list of relevant publications in the field of interest and manually scanning these texts for relevant information, which is very time consuming. It is more than likely that in using this "classical" approach the researcher misses some relevant information mentioned in the literature or has to go through biological databases to extract further information. Text mining and named entity recognition methods have already been used in human genomics and related fields as a solution to this problem. These methods can process and extract information from large volumes of scientific text. Text mining is defined as the automatic extraction of previously unknown and potentially useful information from text. Named entity recognition (NER) is defined as the method of identifying named entities (names of real world objects; for example, gene/protein names, drugs, enzymes) in text. In animal sciences, text mining and related methods have been briefly used in murine genomics and associated fields, leaving behind other fields of animal sciences, such as livestock genomics. 
The aim of this work was to develop an information retrieval platform in the livestock domain focusing on livestock publications and the recognition of relevant data from cattle and pigs. For this purpose, the rather noncomprehensive resources of pig and cattle gene and protein terminologies were enriched with orthologue synonyms, integrated in the NER platform, ProMiner, which is successfully used in human genomics domain. Based on the performance tests done, the present system achieved a fair performance with precision 0.64, recall 0.74, and F(1) measure of 0.69 in a test scenario based on cattle literature.
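The reported F(1) measure is the harmonic mean of precision and recall, so the three figures can be checked against each other. The short Python sketch below does that and also shows how the metrics would be derived from sets of entity mentions; the function names and the toy gene mentions are hypothetical and are not taken from the ProMiner evaluation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, gold):
    """Score predicted entity mentions against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Figures reported in the abstract: precision 0.64, recall 0.74.
print(round(f1_score(0.64, 0.74), 2))  # 0.69

# Hypothetical (document, gene) mentions for illustration only.
predicted = {("doc1", "CDH11"), ("doc1", "MSTN"), ("doc2", "IGF2"), ("doc2", "LEP")}
gold = {("doc1", "MSTN"), ("doc2", "IGF2"), ("doc2", "DGAT1")}
print(precision_recall_f1(predicted, gold))
```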
Population Genomics of Fungal and Oomycete Pathogens.

PubMed

Grünwald, Niklaus J; McDonald, Bruce A; Milgroom, Michael G

2016-08-04

We are entering a new era in plant pathology in which whole-genome sequences of many individuals of a pathogen species are becoming readily available. Population genomics aims to discover genetic mechanisms underlying phenotypes associated with adaptive traits such as pathogenicity, virulence, fungicide resistance, and host specialization, as genome sequences or large numbers of single nucleotide polymorphisms become readily available from multiple individuals of the same species. This emerging field encompasses detailed genetic analyses of natural populations, comparative genomic analyses of closely related species, identification of genes under selection, and linkage analyses involving association studies in natural populations or segregating populations resulting from crosses. The era of pathogen population genomics will provide new opportunities and challenges, requiring new computational and analytical tools. This review focuses on conceptual and methodological issues as well as the approaches to answering questions in population genomics. The major steps start with defining relevant biological and evolutionary questions, followed by sampling, genotyping, and phenotyping, and ending in analytical methods and interpretations. We provide examples of recent applications of population genomics to fungal and oomycete plant pathogens.
Hierarchical Scaffolding With Bambus

PubMed Central

Pop, Mihai; Kosack, Daniel S.; Salzberg, Steven L.

2004-01-01

The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffolding algorithm or the information produced. We thus developed a general-purpose scaffolder, called Bambus, which affords users significant flexibility in controlling the scaffolding parameters. Bambus was used recently to scaffold the low-coverage draft dog genome data. Most significantly, Bambus enables the use of linking data other than that inferred from mate-pair information. For example, the sequence of a completed genome can be used to guide the scaffolding of a related organism. We present several applications of Bambus: support for finishing, comparative genomics, analysis of the haplotype structure of genomes, and scaffolding of a mammalian genome at low coverage. Bambus is available as an open-source package from our Web site. PMID:14707177
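The general idea of ordering and orienting contigs from paired-read links can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the Bambus algorithm: it assumes each link is already summarized as (left contig, right contig, same strand), that links are bundled by simple counting, and that the surviving links form a single linear chain. The function name, the `min_links` threshold, and the contig names are hypothetical.

```python
from collections import Counter


def scaffold(links, min_links=2):
    """Order and orient contigs from mate-pair style links.

    links: iterable of (left, right, same_strand) tuples, one per read pair.
    Returns a list of (contig, orientation) with orientation '+' or '-'.
    """
    # Bundle identical links and keep only well-supported ones.
    bundles = Counter(links)
    successor = {}
    for (left, right, same_strand), support in bundles.items():
        if support < min_links:
            continue
        if left in successor:
            raise ValueError(f"conflicting links leaving {left}")
        successor[left] = (right, same_strand)

    # The chain starts at the contig that is never a link target.
    targets = {right for right, _ in successor.values()}
    starts = [c for c in successor if c not in targets]
    if len(starts) != 1:
        raise ValueError("this sketch expects a single linear chain")

    contig, forward = starts[0], True
    path, seen = [(contig, "+")], {contig}
    while contig in successor:
        contig, same_strand = successor[contig]
        if contig in seen:
            raise ValueError("cycle detected in links")
        seen.add(contig)
        # Flip the running orientation whenever a link joins opposite strands.
        forward = forward if same_strand else not forward
        path.append((contig, "+" if forward else "-"))
    return path


# Hypothetical links: the single c1 -> c3 link is too weak and is discarded.
example_links = (
    [("c1", "c2", True)] * 3
    + [("c2", "c3", False)] * 2
    + [("c1", "c3", True)]
)
print(scaffold(example_links))
```

A production scaffolder also has to weigh insert-size distances, resolve conflicting bundles, and handle branching graphs, all of which this sketch rejects with an error rather than resolving.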
Hierarchical scaffolding with Bambus.

PubMed

Pop, Mihai; Kosack, Daniel S; Salzberg, Steven L

2004-01-01

The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffolding algorithm or the information produced. We thus developed a general-purpose scaffolder, called Bambus, which affords users significant flexibility in controlling the scaffolding parameters. Bambus was used recently to scaffold the low-coverage draft dog genome data. Most significantly, Bambus enables the use of linking data other than that inferred from mate-pair information. For example, the sequence of a completed genome can be used to guide the scaffolding of a related organism. We present several applications of Bambus: support for finishing, comparative genomics, analysis of the haplotype structure of genomes, and scaffolding of a mammalian genome at low coverage. Bambus is available as an open-source package from our Web site.
Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints.

PubMed

Glusman, Gustavo; Mauldin, Denise E; Hood, Leroy E; Robinson, Max

2017-01-01

We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into "genome fingerprints" via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicative sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics.
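The timing figures in this abstract can be checked with simple arithmetic, and the general idea of reducing a variant list to a small comparable summary can be sketched in Python. The fingerprint below is an illustrative assumption, not the published algorithm: it counts consecutive single-nucleotide variant pairs by their allele changes and by the distance between them modulo a fixed length, then compares two fingerprints by cosine similarity. All names and the toy variants are hypothetical.

```python
from collections import Counter


def fingerprint(snvs, length=20):
    """Summarize a variant list as counts keyed by
    (first allele change, second allele change, distance mod length).

    snvs: list of (chrom, pos, ref, alt), sorted by chromosome and position.
    """
    counts = Counter()
    for (c1, p1, r1, a1), (c2, p2, r2, a2) in zip(snvs, snvs[1:]):
        if c1 != c2:
            continue  # do not pair variants across chromosomes
        counts[(r1 + a1, r2 + a2, (p2 - p1) % length)] += 1
    return counts


def similarity(fp_a, fp_b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * fp_b.get(k, 0) for k, v in fp_a.items())
    norm_a = sum(v * v for v in fp_a.values()) ** 0.5
    norm_b = sum(v * v for v in fp_b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical variants; the second list shifts every coordinate by 1000.
genome = [("1", 100, "A", "G"), ("1", 150, "C", "T"),
          ("1", 175, "G", "A"), ("2", 50, "T", "C")]
shifted = [(c, p + 1000, r, a) for c, p, r, a in genome]

# Arithmetic check of the reported timing: all-against-all pairs among 2504 genomes.
n = 2504
pairs = n * (n - 1) // 2
seconds = pairs * 21e-6
print(pairs, round(seconds, 1))  # 3133756 65.8
```

Because this sketch uses only distances between neighbouring variants, a uniform coordinate shift leaves the fingerprint unchanged. The pair count also shows that 21 μs per comparison accounts for roughly 66 of the reported 67 s.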
An in silico pan-genomic probe for the molecular traits behind Lactobacillus ruminis gut autochthony

PubMed Central

Kant, Ravi; Palva, Airi

2017-01-01

As an ecological niche, the mammalian intestine provides the ideal habitat for a variety of bacterial microorganisms. Purportedly, some commensal genera and species offer a beneficial mix of metabolic, protective, and structural processes that help sustain the natural digestive health of the host. Among these sort of gut inhabitants is the Gram-positive lactic acid bacterium Lactobacillus ruminis, a strict anaerobe with both pili and flagella on its cell surface, but also known for being autochthonous (indigenous) to the intestinal environment. Given that the molecular basis of gut autochthony for this species is largely unexplored and unknown, we undertook a study at the genome level to pinpoint some of the adaptive traits behind its colonization behavior. In our pan-genomic probe of L. ruminis, the genomes of nine different strains isolated from human, bovine, porcine, and equine host guts were compiled and compared for in silico analysis. For this, we conducted a geno-phenotypic assessment of protein-coding genes, with an emphasis on those products involved with cell-surface morphology and anaerobic fermentation and respiration. We also categorized and examined the core and accessory genes that define the L. ruminis species and its strains. Here, we made an attempt to identify those genes having ecologically relevant phenotypes that might support or bring about intestinal indigenousness. PMID:28414739
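The split of a pan-genome into core and accessory genes reduces to set operations once each strain has been assigned a set of gene-family identifiers. The Python sketch below shows that partition; the strain and gene names are hypothetical placeholders, and real analyses first need an orthologue-clustering step that is not shown here.

```python
def partition_pangenome(strain_genes):
    """Split gene families into core, accessory, and strain-unique sets.

    strain_genes: dict mapping strain name -> set of gene-family identifiers.
    """
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)            # every family seen in any strain
    core = set.intersection(*gene_sets)      # families present in all strains
    accessory = pan - core                   # families missing from at least one strain
    unique = {}
    for strain, genes in strain_genes.items():
        others = set().union(
            *(g for s, g in strain_genes.items() if s != strain)
        )
        unique[strain] = genes - others      # families found only in this strain
    return core, accessory, unique


# Hypothetical gene content for three strains.
strains = {
    "strainA": {"dnaA", "pilA", "fliC"},
    "strainB": {"dnaA", "pilA"},
    "strainC": {"dnaA", "fliC", "lacZ"},
}
core, accessory, unique = partition_pangenome(strains)
print(sorted(core), sorted(accessory), unique["strainC"])
```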
Genome-wide molecular dissection of serotype M3 group A Streptococcus strains causing two epidemics of invasive infections.

PubMed

Beres, Stephen B; Sylva, Gail L; Sturdevant, Daniel E; Granville, Chanel N; Liu, Mengyao; Ricklefs, Stacy M; Whitney, Adeline R; Parkins, Larye D; Hoe, Nancy P; Adams, Gerald J; Low, Donald E; DeLeo, Frank R; McGeer, Allison; Musser, James M

2004-08-10

Molecular factors that contribute to the emergence of new virulent bacterial subclones and epidemics are poorly understood. We hypothesized that analysis of a population-based strain sample of serotype M3 group A Streptococcus (GAS) recovered from patients with invasive infection by using genome-wide investigative methods would provide new insight into this fundamental infectious disease problem. Serotype M3 GAS strains (n = 255) cultured from patients in Ontario, Canada, over 11 years and representing two distinct infection peaks were studied. Genetic diversity was indexed by pulsed-field gel electrophoresis, DNA-DNA microarray, whole-genome PCR scanning, prophage genotyping, targeted gene sequencing, and single-nucleotide polymorphism genotyping. All variation in gene content was attributable to acquisition or loss of prophages, a molecular process that generated unique combinations of proven or putative virulence genes. Distinct serotype M3 genotypes experienced rapid population expansion and caused infections that differed significantly in character and severity. Molecular genetic analysis, combined with immunologic studies, implicated a 4-aa duplication in the extreme N terminus of M protein as a factor contributing to an epidemic wave of serotype M3 invasive infections. This finding has implications for GAS vaccine research. Genome-wide analysis of population-based strain samples cultured from clinically well defined patients is crucial for understanding the molecular events underlying bacterial epidemics.
Pernicious plans revealed: Plasmodium falciparum genome wide expression analysis.

PubMed

Llinás, Manuel; DeRisi, Joseph L

2004-08-01

The asexual intraerythrocytic developmental cycle (IDC) of Plasmodium falciparum is responsible for the majority of the clinical manifestations of malaria in humans. Although malaria has been studied for over a century, the elucidation of the full genome sequence of P. falciparum has now allowed for in-depth studies of gene expression throughout the entire intraerythrocytic stage. As the mainstays of anti-malarial chemotherapy become increasingly ineffective, we need a deeper understanding of fundamental plasmodial bioregulatory mechanisms to successfully subvert them. Recent gene expression studies have begun to examine different aspects of the IDC and are providing key insights into the basic mechanisms of Plasmodium gene regulation and are helping to define gene functions. However, to date, no transcription factor has been fully characterized from Plasmodium and the definitive identification of cis-acting regulatory elements along with their corresponding trans-acting partners is still lacking. The characterization of the transcriptome of P. falciparum is the first major step towards the understanding of the genome wide regulation of gene expression in this parasite. IDC expression data for almost every gene in the P. falciparum genome can now be publicly queried at and. The results of these studies suggest promising leads for identifying novel targets for anti-malarial therapeutics and vaccines in addition to providing a solid foundation for the ongoing elucidation of plasmodial gene expression.
Improved Prediction of Non-methylated Islands in Vertebrates Highlights Different Characteristic Sequence Patterns

PubMed Central

Vingron, Martin

2016-01-01

Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap with classically defined CpG islands which are computationally predicted using simple DNA sequence features. This is especially true in cold-blooded vertebrates such as Danio rerio (zebrafish). In order to investigate how predictive DNA sequence is of a region’s methylation status, we applied a supervised learning approach using a spectrum kernel support vector machine, to see if a more complex model and supervised learning can be used to improve non-methylated island prediction and to understand the sequence properties of these regions. We demonstrate that DNA sequence is highly predictive of methylation status, and that in contrast to existing CpG island prediction methods our method is able to provide more useful predictions of NMIs genome-wide in all vertebrate organisms that were studied. Our results also show that in cold-blooded vertebrates (Anolis carolinensis, Xenopus tropicalis and Danio rerio) where genome-wide classical CpG island predictions consist primarily of false positives, longer primarily AT-rich DNA sequence features are able to identify these regions much more accurately. PMID:27984582
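A spectrum kernel compares two sequences through the counts of all length-k substrings (k-mers) they contain. The Python sketch below computes that kernel directly; it is a generic illustration and not the authors' pipeline, and the function names, the default k, and the example sequences are assumptions.

```python
from collections import Counter


def spectrum(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the two k-mer count vectors."""
    spec_a, spec_b = spectrum(a, k), spectrum(b, k)
    return sum(n * spec_b.get(kmer, 0) for kmer, n in spec_a.items())


def normalized_kernel(a, b, k=3):
    """Kernel rescaled to [0, 1] so sequence length matters less."""
    denom = (spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k)) ** 0.5
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


def kernel_matrix(seqs, k=3):
    """Pairwise kernel values for a list of sequences."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


# Hypothetical sequences: one CpG-rich, one AT-rich.
cpg_rich = "CGCGGCGCGCCG"
at_rich = "ATTATAATATTA"
print(normalized_kernel(cpg_rich, cpg_rich), normalized_kernel(cpg_rich, at_rich))
```

A matrix built this way could be handed to a support vector machine that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, although the abstract does not say which implementation was used.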
Comparative genomic analysis of 45 type strains of the genus Bifidobacterium: a snapshot of its genetic diversity and evolution.

PubMed

Sun, Zhihong; Zhang, Wenyi; Guo, Chenyi; Yang, Xianwei; Liu, Wenjun; Wu, Yarong; Song, Yuqin; Kwok, Lai Yu; Cui, Yujun; Menghe, Bilige; Yang, Ruifu; Hu, Liangping; Zhang, Heping

2015-01-01

Bifidobacteria are well known for their human health-promoting effects and are therefore widely applied in the food industry. Members of the Bifidobacterium genus were first identified from the human gastrointestinal tract and were then found to be widely distributed across various ecological niches. Although the genetic diversity of Bifidobacterium has been determined based on several marker genes or a few genomes, the global diversity and evolution scenario for the entire genus remain unresolved. The present study comparatively analyzed the genomes of 45 type strains. We built a robust genealogy for Bifidobacterium based on 402 core genes and defined its root according to the phylogeny of the tree of bacteria. Our results support the view that all human isolates belong to younger lineages, and although species isolated from bees dominate the more ancient lineages, the bee was not necessarily the original host for bifidobacteria. Moreover, the species isolated from different hosts are enriched with specific gene sets, suggesting host-specific adaptation. Notably, bee-specific genes are strongly associated with respiratory metabolism and may help those bacteria adapt to the oxygen-rich gut environment in bees. This study provides a snapshot of the genetic diversity and evolution of Bifidobacterium, paving the way for future studies on the taxonomy and functional genomics of the genus.
Towards Defining the Ecological Niches of Novel Coastal Gulf of Mexico Bacterial Isolates

NASA Astrophysics Data System (ADS)

Henson, M. W.; Thrash, C.; Nall, E.

2016-02-01

The study of microbial contributions to biogeochemistry is critical to understanding the cycles of fundamental compounds and to gaining predictive capabilities in a changing environment. Such study requires observation of microbial communities and genetics in nature, coupled with experimental testing of hypotheses both in situ and in laboratory settings. This study combines dilution-to-extinction based high-throughput culturing (HTC) with cultivation-independent and geochemical measurements to define potential ecological niches of novel bacterial isolates from the coastal northern Gulf of Mexico (cnGOM). Here we report findings from the first year of a three-year project. In total, 43 cultures from seven HTC experiments were capable of being repeatedly transferred. Sanger sequencing of the 16S rRNA gene identified these isolates as belonging to the phyla Gammaproteobacteria, Alphaproteobacteria, Actinobacteria, and Betaproteobacteria. Eight are being genome sequenced, with two selected for further physiological characterization due to their phylogenetic novelty and potential ecological significance. Strain LSUCC101 likely represents a novel family of Gammaproteobacteria (best BLAST hit to a cultured representative showed 91% sequence identity) and strain LSUCC96 belongs to the OM252 clade, with the Hawaiian isolate HIMB30 as its closest relative. Both are small (0.3-0.5 µm) cocci. The environmental importance of both LSUCC101 and LSUCC96 was illustrated by their presence within the top 30 OTU0.03 of cnGOM 16S rRNA gene datasets as well as within clone libraries from coastal regions around the world. Ongoing work is determining growth efficiencies, substrate utilization profiles, and metabolic potential to elucidate the roles of these organisms in the cnGOM. Comparative genomics will examine the evolutionary divergence of these organisms from their closest neighbors, and metagenomic recruitment to genomes will help identify strain-based variation from different coastal regions.
Human-Specific Duplication and Mosaic Transcripts: The Recent Paralogous Structure of Chromosome 22

PubMed Central

Bailey, Jeffrey A. ; Yavor, Amy M. ; Viggiano, Luigi ; Misceo, Doriana ; Horvath, Juliann E. ; Archidiacono, Nicoletta ; Schwartz, Stuart ; Rocchi, Mariano ; Eichler, Evan E.

2002-01-01

In recent decades, comparative chromosomal banding, chromosome painting, and gene-order studies have shown strong conservation of gross chromosome structure and gene order in mammals. However, findings from the human genome sequence suggest an unprecedented degree of recent (<35 million years ago) segmental duplication. This dynamism of segmental duplications has important implications in disease and evolution. Here we present a chromosome-wide view of the structure and evolution of the most highly homologous duplications (⩾1 kb and ⩾90%) on chromosome 22. Overall, 10.8% (3.7/33.8 Mb) of chromosome 22 is duplicated, with an average sequence identity of 95.4%. To organize the duplications into tractable units, intron-exon structure and well-defined duplication boundaries were used to define 78 duplicated modules (minimally shared evolutionary segments) with 157 copies on chromosome 22. Analysis of these modules provides evidence for the creation or modification of 11 novel transcripts. Comparative FISH analyses of human, chimpanzee, gorilla, orangutan, and macaque reveal qualitative and quantitative differences in the distribution of these duplications—consistent with their recent origin. Several duplications appear to be human specific, including a ∼400-kb duplication (99.4%–99.8% sequence identity) that transposed from chromosome 14 to the most proximal pericentromeric region of chromosome 22. Experimental and in silico data further support a pericentromeric gradient of duplications where the most recent duplications transpose adjacent to the centromere. Taken together, these data suggest that segmental duplications have been an ongoing process of primate genome evolution, contributing to recent gene innovation and the dynamic transformation of genome architecture within and among closely related species. PMID:11731936
A phased SNP-based classification of sickle cell anemia HBB haplotypes.

PubMed

Shaikho, Elmutaz M; Farrell, John J; Alsultan, Abdulrahman; Qutub, Hatem; Al-Ali, Amein K; Figueiredo, Maria Stella; Chui, David H K; Farrer, Lindsay A; Murphy, George J; Mostoslavsky, Gustavo; Sebastiani, Paola; Steinberg, Martin H

2017-08-11

Sickle cell anemia causes severe complications and premature death. Five common β-globin gene cluster haplotypes are each associated with characteristic fetal hemoglobin (HbF) levels. As HbF is the major modulator of disease severity, classifying patients according to haplotype is useful. The first method of haplotype classification used restriction fragment length polymorphisms (RFLPs) to detect single nucleotide polymorphisms (SNPs) in the β-globin gene cluster. This is labor intensive, and error prone. We used genome-wide SNP data imputed to the 1000 Genomes reference panel to obtain phased data distinguishing parental alleles. We successfully haplotyped 813 sickle cell anemia patients previously classified by RFLPs with a concordance >98%. Four SNPs (rs3834466, rs28440105, rs10128556, and rs968857) marking four different restriction enzyme sites unequivocally defined most haplotypes. We were able to assign a haplotype to 86% of samples that were either partially or misclassified using RFLPs. Phased data using only four SNPs allowed unequivocal assignment of a haplotype that was not always possible using a larger number of RFLPs. Given the availability of genome-wide SNP data, our method is rapid and does not require high computational resources.
Genome amplification and promoter mutation expand the range of csgD-dependent biofilm responses in an STEC population.

PubMed

Uhlich, Gaylen A; Chen, Chin-Yi; Cottrell, Bryan J; Andreozzi, Elisa; Irwin, Peter L; Nguyen, Ly-Huong

2017-04-01

Expression of the major biofilm components of E. coli, curli fimbriae and cellulose, requires the CsgD transcription factor. A complex regulatory network allows environmental control of csgD transcription and biofilm formation. However, most clinical serotype O157:H7 strains contain prophage insertions in the csgD regulator, mlrA, or mutations in other regulators that restrict csgD expression. These barriers can be circumvented by certain compensating mutations that restore higher csgD expression. One mechanism is via csgD promoter mutations that switch sigma factor utilization. Biofilm-forming variants utilizing RpoD rather than RpoS have been identified in glycerol freezer stocks of the non-biofilm-forming food-borne outbreak strain, ATCC 43894. In this study we used whole genome sequencing and RNA-seq to study genotypic and transcriptomic differences between those strains. In addition to defining the consequences of the csgD promoter switch and identifying new csgD-controlled genes, we discovered a region of genome amplification in our laboratory stock of 43894 (designated 43894OW) that contributed to the regulation of csgD-dependent properties.
Virion Architecture Unifies Globally Distributed Pleolipoviruses Infecting Halophilic Archaea

PubMed Central

Pietilä, Maija K.; Atanasova, Nina S.; Manole, Violeta; Liljeroos, Lassi; Butcher, Sarah J.; Oksanen, Hanna M.

2012-01-01

Our understanding of the third domain of life, Archaea, has greatly increased since its establishment some 20 years ago. The increasing information on archaea has also brought their viruses into the limelight. Today, about 100 archaeal viruses are known, which is a low number compared to the numbers of characterized bacterial or eukaryotic viruses. Here, we have performed a comparative biological and structural study of seven pleomorphic viruses infecting extremely halophilic archaea. The pleomorphic nature of this novel virion type was established by sedimentation analysis and cryo-electron microscopy. These nonlytic viruses form virions characterized by a lipid vesicle enclosing the genome, without any nucleoproteins. The viral lipids are unselectively acquired from host cell membranes. The virions contain two to three major structural proteins, which either are embedded in the membrane or form spikes distributed randomly on the external membrane surface. Thus, the most important step during virion assembly is most likely the interaction of the membrane proteins with the genome. The interaction can be driven by single-stranded or double-stranded DNA, resulting in the virions having similar architectures but different genome types. Based on our comparative study, these viruses probably form a novel group, which we define as pleolipoviruses. PMID:22357279

Integrative genomic profiling of large-cell neuroendocrine carcinomas reveals distinct subtypes of high-grade neuroendocrine lung tumors.

PubMed

George, Julie; Walter, Vonn; Peifer, Martin; Alexandrov, Ludmil B; Seidel, Danila; Leenders, Frauke; Maas, Lukas; Müller, Christian; Dahmen, Ilona; Delhomme, Tiffany M; Ardin, Maude; Leblay, Noemie; Byrnes, Graham; Sun, Ruping; De Reynies, Aurélien; McLeer-Florin, Anne; Bosco, Graziella; Malchers, Florian; Menon, Roopika; Altmüller, Janine; Becker, Christian; Nürnberg, Peter; Achter, Viktor; Lang, Ulrich; Schneider, Peter M; Bogus, Magdalena; Soloway, Matthew G; Wilkerson, Matthew D; Cun, Yupeng; McKay, James D; Moro-Sibilot, Denis; Brambilla, Christian G; Lantuejoul, Sylvie; Lemaitre, Nicolas; Soltermann, Alex; Weder, Walter; Tischler, Verena; Brustugun, Odd Terje; Lund-Iversen, Marius; Helland, Åslaug; Solberg, Steinar; Ansén, Sascha; Wright, Gavin; Solomon, Benjamin; Roz, Luca; Pastorino, Ugo; Petersen, Iver; Clement, Joachim H; Sänger, Jörg; Wolf, Jürgen; Vingron, Martin; Zander, Thomas; Perner, Sven; Travis, William D; Haas, Stefan A; Olivier, Magali; Foll, Matthieu; Büttner, Reinhard; Hayes, David Neil; Brambilla, Elisabeth; Fernandez-Cuesta, Lynnette; Thomas, Roman K

2018-03-13

Pulmonary large-cell neuroendocrine carcinomas (LCNECs) have similarities with other lung cancers, but their precise relationship has remained unclear. Here we perform a comprehensive genomic (n = 60) and transcriptomic (n = 69) analysis of 75 LCNECs and identify two molecular subgroups: "type I LCNECs" with bi-allelic TP53 and STK11/KEAP1 alterations (37%), and "type II LCNECs" enriched for bi-allelic inactivation of TP53 and RB1 (42%). Despite sharing genomic alterations with adenocarcinomas and squamous cell carcinomas, no transcriptional relationship was found; instead LCNECs form distinct transcriptional subgroups with closest similarity to SCLC. While type I LCNECs and SCLCs exhibit a neuroendocrine profile with ASCL1-high/DLL3-high/NOTCH-low, type II LCNECs bear TP53 and RB1 alterations and differ from most SCLC tumors with reduced neuroendocrine markers, a pattern of ASCL1-low/DLL3-low/NOTCH-high, and an upregulation of immune-related pathways. In conclusion, LCNECs comprise two molecularly defined subgroups, and distinguishing them from SCLC may allow stratified targeted treatment of high-grade neuroendocrine lung tumors.
Statistical methods for detecting periodic fragments in DNA sequence data

PubMed Central

2011-01-01

Background: Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results: We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~21% and ~19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions: For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the presence of eroded periodicity. The autocorrelation method was identified as poorly suited for use with the blockwise bootstrap. Application of our methods to the genomes of two model organisms revealed a striking proportion of the yeast and mouse genomes are spanned by NPS. Despite their markedly different sizes, roughly equivalent proportions (19-21%) of the genomes lie within period-10 spans of the NPS dinucleotides {AA, TT, TA}. The biological significance of these regions remains to be demonstrated. To facilitate this, the genomic coordinates are available as Additional files 1, 2, and 3 in a format suitable for visualisation as tracks on popular genome browsers. Reviewers: This article was reviewed by Prof Tomas Radivoyevitch, Dr Vsevolod Makeev (nominated by Dr Mikhail Gelfand), and Dr Rob D Knight. PMID:21527008
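The idea of testing a sequence for an a priori specified period can be illustrated with a small sketch. It marks where the NPS dinucleotides {AA, TT, TA} start and evaluates the Fourier sum of that indicator at frequency 1/period. This is only an assumption about the general shape of an integer-period measure: the function names are hypothetical, and the paper's exact IPDFT definition, its Hybrid measure, and the blockwise bootstrap are not reproduced here.

```python
import cmath

NPS_DINUCLEOTIDES = {"AA", "TT", "TA"}  # the set named in the abstract


def dinucleotide_indicator(seq, targets=NPS_DINUCLEOTIDES):
    """Return 1 at each position where a target dinucleotide starts, else 0."""
    seq = seq.upper()
    return [1 if seq[i:i + 2] in targets else 0 for i in range(len(seq) - 1)]


def period_magnitude(signal, period):
    """Magnitude of the Fourier sum of signal at frequency 1/period.

    Illustrative only: no mean subtraction or normalisation is applied.
    """
    total = sum(x * cmath.exp(-2j * cmath.pi * n / period)
                for n, x in enumerate(signal))
    return abs(total)
```

On a synthetic sequence in which AA recurs exactly every 10 bases, the magnitude at period 10 grows with the number of repeats, while the magnitude at an unrelated period such as 7 stays small.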
Comparative Genomics in Homo sapiens.

PubMed

Oti, Martin; Sammeth, Michael

2018-01-01

Genomes can be compared at different levels of divergence, either between species or within species. Within species genomes can be compared between different subpopulations, such as human subpopulations from different continents. Investigating the genomic differences between different human subpopulations is important when studying complex diseases that are affected by many genetic variants, as the variants involved can differ between populations. The 1000 Genomes Project collected genome-scale variation data for 2504 human individuals from 26 different populations, enabling a systematic comparison of variation between human subpopulations. In this chapter, we present step-by-step a basic protocol for the identification of population-specific variants employing the 1000 Genomes data. These variants are subsequently further investigated for those that affect the proteome or RNA splice sites, to investigate potentially biologically relevant differences between the populations.
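One simple way to read "population-specific variants" is as variants that are reasonably common in one population and absent from all others. The sketch below applies that reading to a table of allele frequencies; the thresholds, the function name, and the input layout are assumptions, and the chapter's actual protocol may define specificity differently.

```python
def population_specific(freqs, min_in=0.05, max_out=0.0):
    """Return {variant_id: population} for variants found in exactly one population.

    freqs maps variant_id -> {population: alternate allele frequency}.
    min_in and max_out are illustrative thresholds, not values from the chapter.
    """
    hits = {}
    for variant, by_pop in freqs.items():
        # Populations in which the variant exceeds the "absent" threshold.
        carriers = [pop for pop, freq in by_pop.items() if freq > max_out]
        if len(carriers) == 1 and by_pop[carriers[0]] >= min_in:
            hits[variant] = carriers[0]
    return hits
```

In practice the frequency table would be derived from the 1000 Genomes variant calls; the parsing step is omitted here because the abstract does not describe the tools used.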
Tailoring Medulloblastoma Treatment Through Genomics: Making a Change, One Subgroup at a Time.

PubMed

Holgado, Borja L; Guerreiro Stucklin, Ana; Garzia, Livia; Daniels, Craig; Taylor, Michael D

2017-08-31

After more than a decade of genomic studies in medulloblastoma, the time has come to capitalize on the knowledge gained and use it to directly improve patient care. Although metastatic and relapsed disease remain poorly understood, much has changed in how we define medulloblastoma, and it has become evident that with conventional therapies, specific groups of patients are currently under- or overtreated. In this review, we summarize the latest insights into medulloblastoma biology, focusing on how genomics is affecting patient stratification, informing preclinical studies of targeted therapies, and shaping the new generation of clinical trials.
Genomic imprinting—an epigenetic gene-regulatory model

PubMed Central

Koerner, Martha V; Barlow, Denise P

2010-01-01

Epigenetic mechanisms (Box 1) are considered to play major gene-regulatory roles in development, differentiation and disease. However, the relative importance of epigenetics in defining the mammalian transcriptome in normal and disease states is unknown. The mammalian genome contains only a few model systems where epigenetic gene regulation has been shown to play a major role in transcriptional control. These model systems are important not only to investigate the biological function of known epigenetic modifications but also to identify new and unexpected epigenetic mechanisms in the mammalian genome. Here we review recent progress in understanding how epigenetic mechanisms control imprinted gene expression. PMID:20153958
An Inherited Efficiencies Model of Non-Genomic Evolution

NASA Technical Reports Server (NTRS)

New, Michael H.; Pohorille, Andrew

1999-01-01

A model for the evolution of biological systems in the absence of a nucleic acid-like genome is proposed and applied to model the earliest living organisms -- protocells composed of membrane encapsulated peptides. Assuming that the peptides can make and break bonds between amino acids, and bonds in non-functional peptides are more likely to be destroyed than in functional peptides, it is demonstrated that the catalytic capabilities of the system as a whole can increase. This increase is defined to be non-genomic evolution. The relationship between the proposed mechanism for evolution and recent experiments on self-replicating peptides is discussed.
Surviving an Identity Crisis: A Revised View of Chromatin Insulators in the Genomics Era

PubMed Central

Matzat, Leah H.; Lei, Elissa P.

2013-01-01

The control of complex, developmentally regulated loci and partitioning of the genome into active and silent domains is in part accomplished through the activity of DNA-protein complexes termed chromatin insulators. Together, the multiple, well-studied classes of insulators in Drosophila melanogaster appear to be generally functionally conserved. In this review, we discuss recent genomic-scale experiments and attempt to reconcile these newer findings in the context of previously defined insulator characteristics based on classical genetic analyses and transgenic approaches. Finally, we discuss the emerging understanding of mechanisms of chromatin insulator regulation. PMID:24189492
Genetics and Genomics of Endometriosis

PubMed Central

Hansen, Keith A.; Eyster, Kathleen M.

2015-01-01

Endometriosis is a common cause of morbidity in women with an unknown etiology. Studies have demonstrated the familial nature of endometriosis and suggest that inheritance occurs in a polygenic/multifactorial fashion. Studies have attempted to define the gene or genes responsible for endometriosis through association or linkage studies with candidate genes or DNA mapping technology. A number of genomics studies have demonstrated significant alterations in gene expression in endometriosis. A more thorough understanding of the genetics and genomics of endometriosis will facilitate understanding the basic biology of the disease and open new inroads to diagnosis and treatment of this enigmatic condition. PMID:20436317
Accuracy of predicting genomic breeding values for residual feed intake in Angus and Charolais beef cattle.

PubMed

Chen, L; Schenkel, F; Vinsky, M; Crews, D H; Li, C

2013-10-01

In beef cattle, phenotypic data that are difficult and/or costly to measure, such as feed efficiency, and DNA marker genotypes are usually available on a small number of animals of different breeds or populations. To achieve a maximal accuracy of genomic prediction using the phenotype and genotype data, strategies for forming a training population to predict genomic breeding values (GEBV) of the selection candidates need to be evaluated. In this study, we examined the accuracy of predicting GEBV for residual feed intake (RFI) based on 522 Angus and 395 Charolais steers genotyped on SNP with the Illumina Bovine SNP50 Beadchip for 3 training population forming strategies: within breed, across breed, and by pooling data from the 2 breeds (i.e., combined). Two other scenarios with the training and validation data split by birth year and by sire family within a breed were also investigated to assess the impact of genetic relationships on the accuracy of genomic prediction. Three statistical methods including the best linear unbiased prediction with the relationship matrix defined based on the pedigree (PBLUP), based on the SNP genotypes (GBLUP), and a Bayesian method (BayesB) were used to predict the GEBV. The results showed that the accuracy of the GEBV prediction was the highest when the prediction was within breed and when the validation population had greater genetic relationships with the training population, with a maximum of 0.58 for Angus and 0.64 for Charolais. The within-breed prediction accuracies dropped to 0.29 and 0.38, respectively, when the validation populations had a minimal pedigree link with the training population. When the training population of a different breed was used to predict the GEBV of the validation population, that is, across-breed genomic prediction, the accuracies were further reduced to 0.10 to 0.22, depending on the prediction method used. 
Pooling data from the 2 breeds to form the training population resulted in accuracies increased to 0.31 and 0.43, respectively, for the Angus and Charolais validation populations. The results suggested that the genetic relationship of selection candidates with the training population has a greater impact on the accuracy of GEBV using the Illumina Bovine SNP50 Beadchip. Pooling data from different breeds to form the training population will improve the accuracy of across breed genomic prediction for RFI in beef cattle.
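GBLUP replaces the pedigree relationship matrix with one computed from SNP genotypes. A minimal sketch of one common construction (VanRaden's first method) follows; the abstract does not say which construction was used, so this choice and the function name are assumptions.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = Z Z' / (2 * sum(p * (1 - p))) from 0/1/2 allele counts.

    genotypes: list of rows (animals), each a list of allele counts per SNP.
    Assumes at least one SNP is polymorphic, otherwise the scale is zero.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n_animals)
             for j in range(n_snps)]
    # Center each genotype by its expected value 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(n_snps)]
                for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centered]
            for zi in centered]
```

Animals with opposite genotypes at every SNP receive a negative relationship, which is one way to see why a training population unrelated to the validation animals contributes little to prediction accuracy.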
Heritabilities and genetic correlations in the same traits across different strata of herds created according to continuous genomic, genetic, and phenotypic descriptors.

PubMed

Yin, Tong; König, Sven

2018-03-01

The most common approach in dairy cattle to prove genotype by environment interactions is a multiple-trait model application, and considering the same traits in different environments as different traits. We enhanced such concepts by defining continuous phenotypic, genetic, and genomic herd descriptors, and applying random regression sire models. Traits of interest were test-day traits for milk yield, fat percentage, protein percentage, and somatic cell score, considering 267,393 records from 32,707 first-lactation Holstein cows. Cows were born in the years 2010 to 2013, and kept in 52 large-scale herds from 2 federal states of north-east Germany. The average number of genotyped cows per herd (45,613 single nucleotide polymorphism markers per cow) was 133.5 (range: 45 to 415 genotyped cows). Genomic herd descriptors were (1) the level of linkage disequilibrium (r²) within specific chromosome segments, and (2) the average allele frequency for single nucleotide polymorphisms in close distance to a functional mutation. Genetic herd descriptors were the (1) intra-herd inbreeding coefficient, and (2) the percentage of daughters from foreign sires. Phenotypic herd descriptors were (1) herd size, and (2) the herd mean for nonreturn rate. Most correlations among herd descriptors were close to 0, indicating independence of genomic, genetic, and phenotypic characteristics. Heritabilities for milk yield increased with increasing intra-herd linkage disequilibrium, inbreeding, and herd size. Genetic correlations in same traits between adjacent levels of herd descriptors were close to 1, but declined for descriptor levels in greater distance. Genetic correlation declines were more obvious for somatic cell score, compared with test-day traits with larger heritabilities (fat percentage and protein percentage). Also, for milk yield, alterations of herd descriptor levels had an obvious effect on heritabilities and genetic correlations.
By trend, multiple trait model results (based on created discrete herd classes) confirmed the random regression estimates. Identified alterations of breeding values in dependency of herd descriptors suggest utilization of specific sires for specific herd structures, offering new possibilities to improve sire selection strategies. Regarding genomic selection designs and genetic gain transfer into commercial herds, cow herds for the utilization in cow training sets should reflect the genomic, genetic, and phenotypic pattern of the broad population. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
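The linkage disequilibrium measure used as a genomic herd descriptor, r², can be computed for a pair of biallelic loci as D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The sketch below assumes phased haplotypes coded 0/1; how the study estimated r² from its genotype data is not stated, and the function name is hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B), each 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom
```

Averaging such pairwise values over the marker pairs in a chromosome segment would give a per-herd descriptor of the kind the abstract mentions.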
The Complete Genome Phylogeny of Geographically Distinct Dengue Virus Serotype 2 Isolates (1944-2013) Supports Further Groupings within the Cosmopolitan Genotype

PubMed Central

Ali, Akhtar; Ali, Ijaz

2015-01-01

Dengue virus serotype 2 (DENV-2) isolates have been implicated in deadly outbreaks of dengue fever (DF) and dengue hemorrhagic fever (DHF) in several regions of the world. Phylogenetic analysis of DENV-2 isolates collected from particular countries has been performed using partial or individual genes but only a few studies have examined complete whole-genome sequences collected worldwide. Herein, 50 complete genome sequences of DENV-2 isolates, reported over the past 70 years from 19 different countries, were downloaded from GenBank. Phylogenetic analysis was conducted and evolutionary distances of the 50 DENV-2 isolates were determined using maximum likelihood (ML) trees or Bayesian phylogenetic analysis created from complete genome nucleotide (nt) and amino acid (aa) sequences or individual gene sequences. The results showed that all DENV-2 isolates fell into seven main groups containing five previously defined genotypes. A Cosmopolitan genotype showed further division into three groups (C-I, C-II, and C-III) with the C-I group containing two subgroups (C-IA and C-IB). Comparison of the aa sequences showed specific mutations among the various groups of DENV-2 isolates. A maximum number of aa mutations was observed in the NS5 gene, followed by the NS2A, NS3 and NS1 genes, while the smallest number of aa substitutions was recorded in the capsid gene, followed by the PrM/M, NS4A, and NS4B genes. Maximum evolutionary distances were found in the NS2A gene, followed by the NS4A and NS4B genes. Based on these results, we propose that genotyping of DENV-2 isolates in future studies should be performed on entire genome sequences in order to gain a complete understanding of the evolution of various isolates reported from different geographical locations around the world. PMID:26414178
Mitotic Evolution of Plasmodium falciparum Shows a Stable Core Genome but Recombination in Antigen Families

PubMed Central

Bopp, Selina E. R.; Manary, Micah J.; Bright, A. Taylor; Johnston, Geoffrey L.; Dharia, Neekesh V.; Luna, Fabio L.; McCormack, Susan; Plouffe, David; McNamara, Case W.; Walker, John R.; Fidock, David A.; Denchi, Eros Lazzerini; Winzeler, Elizabeth A.

2013-01-01

Malaria parasites elude eradication attempts both within the human host and across nations. At the individual level, parasites evade the host immune responses through antigenic variation. At the global level, parasites escape drug pressure through single nucleotide variants and gene copy amplification events conferring drug resistance. Despite their importance to global health, the rates at which these genomic alterations emerge have not been determined. We studied the complete genomes of different Plasmodium falciparum clones that had been propagated asexually over one year in the presence and absence of drug pressure. A combination of whole-genome microarray analysis and next-generation deep resequencing (totaling 14 terabases) revealed a stable core genome with only 38 novel single nucleotide variants appearing in seventeen evolved clones (avg. 5.4 per clone). In clones exposed to atovaquone, we found cytochrome b mutations as well as an amplification event encompassing the P. falciparum multidrug resistance associated protein (mrp1) on chromosome 1. We observed 18 large-scale (>1 kb on average) deletions of telomere-proximal regions encoding multigene families, involved in immune evasion (9.5×10⁻⁶ structural variants per base pair per generation). Six of these deletions were associated with chromosomal crossovers generated during mitosis. We found only minor differences in rates between genetically distinct strains and between parasites cultured in the presence or absence of drug. Using these derived mutation rates for P. falciparum (1.0–9.7×10⁻⁹ mutations per base pair per generation), we can now model the frequency at which drug or immune resistance alleles will emerge under a well-defined set of assumptions. Further, the detection of mitotic recombination events in var gene families illustrates how multigene families can arise and change over time in P. falciparum. These results will help improve our understanding of how P. falciparum evolves to evade control efforts within both the individual hosts and large populations. PMID:23408914
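A rate expressed "per base pair per generation" is the number of observed events divided by the product of the bases surveyed and the generations elapsed, and it can be turned around to give an expected event count. The sketch below shows that arithmetic only; the function names are hypothetical, and the genome size and generation counts used in the study are not given in the abstract, so none are assumed here.

```python
def events_per_bp_per_generation(n_events, surveyed_bp, generations):
    """Observed events divided by (bases surveyed x generations elapsed)."""
    if surveyed_bp <= 0 or generations <= 0:
        raise ValueError("surveyed_bp and generations must be positive")
    return n_events / (surveyed_bp * generations)


def expected_events(rate, target_bp, generations, n_genomes=1):
    """Expected number of events in a target region across n_genomes lineages.

    Assumes a constant rate and independent sites, which is a simplification.
    """
    return rate * target_bp * generations * n_genomes
```

For example, with purely illustrative inputs, 5 events over 1,000,000 surveyed bases and 100 generations give a rate of 5×10⁻⁸ per base pair per generation.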
Implementing genomics and pharmacogenomics in the clinic: The National Human Genome Research Institute’s genomic medicine portfolio

PubMed Central

Manolio, Teri A.

2016-01-01

Increasing knowledge about the influence of genetic variation on human health and growing availability of reliable, cost-effective genetic testing have spurred the implementation of genomic medicine in the clinic. As defined by the National Human Genome Research Institute (NHGRI), genomic medicine uses an individual’s genetic information in his or her clinical care, and has begun to be applied effectively in areas such as cancer genomics, pharmacogenomics, and rare and undiagnosed diseases. In 2011 NHGRI published its strategic vision for the future of genomic research, including an ambitious research agenda to facilitate and promote the implementation of genomic medicine. To realize this agenda, NHGRI is consulting and facilitating collaborations with the external research community through a series of “Genomic Medicine Meetings,” under the guidance and leadership of the National Advisory Council on Human Genome Research. These meetings have identified and begun to address significant obstacles to implementation, such as lack of evidence of efficacy, limited availability of genomics expertise and testing, lack of standards, and difficulties in integrating genomic results into electronic medical records. The six research and dissemination initiatives comprising NHGRI’s genomic research portfolio are designed to speed the evaluation and incorporation, where appropriate, of genomic technologies and findings into routine clinical care. Actual adoption of successful approaches in clinical care will depend upon the willingness, interest, and energy of professional societies, practitioners, patients, and payers to promote their responsible use and share their experiences in doing so. PMID:27612677
Implementing genomics and pharmacogenomics in the clinic: The National Human Genome Research Institute's genomic medicine portfolio.

PubMed

Manolio, Teri A

2016-10-01

Increasing knowledge about the influence of genetic variation on human health and growing availability of reliable, cost-effective genetic testing have spurred the implementation of genomic medicine in the clinic. As defined by the National Human Genome Research Institute (NHGRI), genomic medicine uses an individual's genetic information in his or her clinical care, and has begun to be applied effectively in areas such as cancer genomics, pharmacogenomics, and rare and undiagnosed diseases. In 2011 NHGRI published its strategic vision for the future of genomic research, including an ambitious research agenda to facilitate and promote the implementation of genomic medicine. To realize this agenda, NHGRI is consulting and facilitating collaborations with the external research community through a series of "Genomic Medicine Meetings," under the guidance and leadership of the National Advisory Council on Human Genome Research. These meetings have identified and begun to address significant obstacles to implementation, such as lack of evidence of efficacy, limited availability of genomics expertise and testing, lack of standards, and difficulties in integrating genomic results into electronic medical records. The six research and dissemination initiatives comprising NHGRI's genomic research portfolio are designed to speed the evaluation and incorporation, where appropriate, of genomic technologies and findings into routine clinical care. Actual adoption of successful approaches in clinical care will depend upon the willingness, interest, and energy of professional societies, practitioners, patients, and payers to promote their responsible use and share their experiences in doing so. Published by Elsevier Ireland Ltd.
WhopGenome: high-speed access to whole-genome variation and sequence data in R.

PubMed

Wittelsbürger, Ulrich; Pfeifer, Bastian; Lercher, Martin J

2015-02-01

The statistical programming language R has become a de facto standard for the analysis of many types of biological data, and is well suited for the rapid development of new algorithms. However, variant call data from population-scale resequencing projects are typically too large to be read and processed efficiently with R's built-in I/O capabilities. WhopGenome can efficiently read whole-genome variation data stored in the widely used variant call format (VCF) into several R data types. VCF files can be accessed either on local hard drives or on remote servers. WhopGenome can associate variants with annotations such as those available from the UCSC genome browser, and can accelerate the reading process by filtering loci according to user-defined criteria. WhopGenome can also read other Tabix-indexed files and create indices to allow fast selective access to FASTA-formatted sequence files. The WhopGenome R package is available on CRAN at http://cran.r-project.org/web/packages/WhopGenome/. A Bioconductor package has been submitted. Contact: lercher@cs.uni-duesseldorf.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
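To make the idea of filtering loci while reading concrete, the following is a minimal Python sketch of streaming through VCF text and keeping only variants in a region above a quality threshold. It is not WhopGenome's API and does not use Tabix indexing; the names `Variant`, `read_variants`, `filter_region` and the example records are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int               # 1-based position, as in the VCF POS column
    ref: str
    alt: str
    qual: Optional[float]  # None when the QUAL column is "."


def read_variants(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per VCF data line, skipping blank and '#' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual = fields[:6]
        yield Variant(chrom, int(pos), ref, alt,
                      None if qual == "." else float(qual))


def filter_region(variants: Iterable[Variant], chrom: str, start: int,
                  end: int, min_qual: float = 0.0) -> Iterator[Variant]:
    """Keep variants on `chrom` with start <= pos <= end and QUAL >= min_qual.

    Variants with a missing QUAL are kept (an assumption of this sketch).
    """
    for v in variants:
        if v.chrom != chrom or not (start <= v.pos <= end):
            continue
        if v.qual is not None and v.qual < min_qual:
            continue
        yield v


# Hypothetical example records (tab-separated, first eight VCF columns only).
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "1\t250\t.\tC\tT\t10\tPASS\t.\n",
    "2\t120\t.\tG\tA\t99\tPASS\t.\n",
]

selected = list(filter_region(read_variants(EXAMPLE_VCF), "1", 1, 300, min_qual=20))
```

Because both functions are generators, records are processed one at a time and the whole file never has to be held in memory, which is the general motivation for filtering during the read rather than afterwards.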
Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis.

PubMed

Loots, Gabriela G

2008-01-01

Despite remarkable recent advances in genomics that have enabled us to identify most of the genes in the human genome, comparable efforts to define transcriptional cis-regulatory elements that control gene expression are lagging behind. The difficulty of this task stems from two equally important problems: our knowledge of how regulatory elements are encoded in genomes remains elementary, and there is a vast genomic search space for regulatory elements, since most of the mammalian genome is noncoding. Comparative genomic approaches are having a remarkable impact on the study of transcriptional regulation in eukaryotes and currently represent the most efficient and reliable methods of predicting noncoding sequences likely to control the patterns of gene expression. By subjecting eukaryotic genomic sequences to computational comparisons and subsequent experimentation, we are inching our way toward a more comprehensive catalog of common regulatory motifs that lie behind fundamental biological processes. We are still far from comprehending how the transcriptional regulatory code is encrypted in the human genome and providing an initial global view of regulatory gene networks, but collectively, the continued development of comparative and experimental approaches will rapidly expand our knowledge of the transcriptional regulome.
Fold or hold: experimental evolution in vitro

PubMed Central

Collins, S; Rambaut, A; Bridgett, S J

2013-01-01

We introduce a system for experimental evolution consisting of populations of short oligonucleotides (Oli populations) evolving in a modified quantitative polymerase chain reaction (qPCR). It is tractable at the genetic, genomic, phenotypic and fitness levels. The Oli system uses DNA hairpins designed to form structures that self-prime under defined conditions. Selection acts on the phenotype of self-priming, after which differences in fitness are amplified and quantified using qPCR. We outline the methodological and bioinformatics tools for the Oli system here and demonstrate that it can be used as a conventional experimental evolution model system by test-driving it in an experiment investigating adaptive evolution under different rates of environmental change. PMID:24003997
[The great virus comeback].

PubMed

Forterre, Patrick

2013-01-01

Viruses have long been considered by-products of biological evolution. This view is now changing as a result of several recent discoveries. Viral ecologists have shown that viral particles are the most abundant biological entities on our planet, whereas metagenomic analyses have revealed an unexpected abundance and diversity of viral genes in the biosphere. Comparative genomics has highlighted the uniqueness of viral sequences, in contradiction with the traditional view of viruses as pickpockets of cellular genes. On the contrary, cellular genomes, especially eukaryotic ones, turned out to be full of genes derived from viruses or related elements (plasmids, transposons, retroelements and so on). The discovery of unusual viruses infecting archaea has shown that the viral world is much more diverse than previously thought, overturning the traditional dichotomy between bacteriophages and viruses. Finally, the discovery of giant viruses has blurred the traditional image of viruses as small entities. Furthermore, essential clues to virus history have been obtained in the last ten years. In particular, structural analyses of capsid proteins have uncovered deeply rooted homologies between viruses infecting different cellular domains, suggesting that viruses originated before the last universal common ancestor (LUCA). These studies have shown that several lineages of viruses originated independently, i.e., viruses are polyphyletic. From the time of LUCA, viruses have coevolved with their hosts, and viral lineages can be viewed as lianas wrapping around the trunk, branches and leaves of the tree of life. Although viruses are very diverse, with genomes encoding from one to more than one thousand proteins, they can all be simply defined as organisms producing virions.
Virions themselves can be defined as infectious particles made of at least one protein associated with the viral nucleic acid, endowed with the capability to protect the viral genome and ensure its delivery to the infected cell. These definitions, which clearly distinguish viruses from plasmids, suggest that infectious RNA molecules that only encode an RNA replicase, presently classified among viruses by the ICTV (International Committee on Taxonomy of Viruses) in the families Endornaviridae and Hypoviridae, are in fact RNA plasmids. Since a viral genome should encode at least one structural protein, these definitions also imply that viruses originated after the emergence of the ribosome in an RNA-protein cellular world. Although virions are the hallmarks of viruses, viruses and virions should not be confused. The infection transforms the ribocell (cell encoding ribosomes and dividing by binary fission) into a virocell (cell producing virions) or ribovirocell (cell that produces virions but can still divide by binary fission). In the ribovirocell, two different organisms, defined by their distinct evolutionary histories, coexist in symbiosis in the same cell. The virocells or ribovirocells are the living forms of the virus, which can ultimately be considered a living organism. In the virocell, the metabolism is reorganized for the production of virions, while the ability to capture and store free energy is retained, as in other cellular organisms. In the virocell, viral genomes replicate, recombine and evolve, leading to the emergence of new viral proteins and potentially novel functions. Some of these new functions can later be transferred to the cell, explaining how viruses can play a major (often underestimated) role in the evolution of cellular organisms.
The virocell concept thus helps in understanding recent hypotheses suggesting that viruses played a critical role in major evolutionary transitions, such as the origin of DNA genomes or the origin of the eukaryotic nucleus. Finally, it is increasingly recognized that viruses are the major source of variation and selection in living organisms (both viruses and cells), the two pillars of Darwinism. One can thus conclude that the continuous interaction between viruses and cells, throughout the history of life, has been, and still is, a major engine of biological evolution. © Société de Biologie, 2013.
Linking maternal and somatic 5S rRNA types with different sequence-specific non-LTR retrotransposons

PubMed Central

Pagano, Johanna F.B.; Ensink, Wim A.; van Olst, Marina; van Leeuwen, Selina; Nehrdich, Ulrike; Zhu, Kongju; Spaink, Herman P.; Girard, Geneviève; Rauwerda, Han; Jonker, Martijs J.; Dekker, Rob J.

2017-01-01

5S rRNA is a ribosomal core component, transcribed from many gene copies organized in genomic repeats. Some eukaryotic species have two 5S rRNA types defined by their predominant expression in oogenesis or adult tissue. Our next-generation sequencing study on zebrafish egg, embryo, and adult tissue identified maternal-type 5S rRNA that is exclusively accumulated during oogenesis, replaced throughout embryogenesis by a somatic type, and thus virtually absent in adult somatic tissue. The maternal-type 5S rDNA contains several thousand gene copies on chromosome 4 in tandem repeats with small intergenic regions, whereas the somatic type is present in only 12 gene copies on chromosome 18 with large intergenic regions. The nine-nucleotide variation between the two 5S rRNA types likely affects TFIII binding and riboprotein L5 binding, probably leading to storage of maternal-type rRNA. Remarkably, these sequence differences are located exactly at the sequence-specific target site for genome integration by the 5S rRNA-specific Mutsu retrotransposon family. Thus, we could define maternal- and somatic-type MutsuDr subfamilies. Furthermore, we identified four additional maternal-type and two new somatic-type MutsuDr subfamilies, each with their own target sequence. This target-site specificity, frequently intact maternal-type retrotransposon elements, plus specific presence of Mutsu retrotransposon RNA and piRNA in egg and adult tissue, suggest an involvement of retrotransposons in achieving the differential copy number of the two types of 5S rDNA loci. PMID:28003516
Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset.

PubMed

Sengupta, Dhriti; Choudhury, Ananyo; Basu, Analabha; Ramsay, Michèle

2016-12-31

Genomic variation in Indian populations is of great interest due to the diversity of ancestral components, social stratification, endogamy and complex admixture patterns. With an expanding population of 1.2 billion, India is also a treasure trove to catalogue innocuous as well as clinically relevant rare mutations. Recent studies have revealed four dominant ancestries in populations from mainland India: Ancestral North-Indian (ANI), Ancestral South-Indian (ASI), Ancestral Tibeto-Burman (ATB) and Ancestral Austro-Asiatic (AAA). The 1000 Genomes Project (KGP) Phase-3 data include about 500 genomes from five linguistically defined Indian-Subcontinent (IS) populations (Punjabi, Gujarati, Bengali, Telugu and Tamil), some of whom are recent migrants to the USA or UK. Comparative analyses show that despite the distinct geographic origins of the KGP-IS populations, the ANI component is predominantly represented in this dataset. Previous studies demonstrated population substructure in the HapMap Gujarati population, and we found evidence for additional substructure in the Punjabi and Telugu populations. These substructured populations have characteristic/significant differences in heterozygosity and inbreeding coefficients. Moreover, we demonstrate that the substructure is better explained by factors like differences in proportion of ancestral components and endogamy-driven social structure than by invoking a novel ancestral component. Therefore, using language and/or geography as a proxy for an ethnic unit is inadequate for many of the IS populations. This highlights the necessity for more nuanced sampling strategies or corrective statistical approaches, particularly for biomedical and population genetics research in India. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
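As an illustration of the heterozygosity and inbreeding-coefficient comparison mentioned above, here is a minimal Python sketch of the standard textbook estimate for a single biallelic locus, F = 1 - H_obs / H_exp, where H_exp = 2p(1 - p) under Hardy-Weinberg proportions. This is an assumption about the kind of statistic involved, not the authors' actual pipeline; the function name and the genotype counts are hypothetical.

```python
def inbreeding_coefficient(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """Return (H_obs, H_exp, F) for one biallelic locus from genotype counts.

    H_obs is the observed fraction of heterozygotes, H_exp = 2p(1 - p) is the
    Hardy-Weinberg expectation, and F = 1 - H_obs / H_exp.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Frequency of the reference allele among the 2n sampled chromosomes.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    h_obs = n_het / n
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    return h_obs, h_exp, 1 - h_obs / h_exp


# Hypothetical counts: a deficit of heterozygotes gives F > 0.
h_obs, h_exp, f_deficit = inbreeding_coefficient(50, 20, 30)

# Hypothetical counts in exact Hardy-Weinberg proportions give F close to 0.
_, _, f_hwe = inbreeding_coefficient(36, 48, 16)
```

A positive F at many loci is the pattern expected under endogamy or unrecognized substructure, which is why these two summaries are often reported together; in practice such estimates are averaged over many loci rather than read from one.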
