Science.gov

Sample records for resolution genomic analysis

  1. Bisulfite-free and Base-resolution Analysis of 5-formylcytosine at Whole-genome Scale

    PubMed Central

    Xia, Bo; Han, Dali; Lu, Xingyu; Sun, Zhaozhu; Zhou, Ankun; Yin, Qiangzong; Zeng, Hu; Liu, Menghao; Jiang, Xiang; Xie, Wei; He, Chuan; Yi, Chengqi

    2015-01-01

    Active DNA demethylation in mammals involves TET-mediated oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxycytosine (5caC). However, genome-wide detection of 5fC at single-base resolution remains challenging. Here we present a bisulfite-free method for whole-genome analysis of 5fC, based on selective chemical labeling of 5fC and subsequent C-to-T transition during PCR. Base-resolution 5fC maps reveal limited overlap with 5hmC, with 5fC-marked regions more active than 5hmC-marked ones. PMID:26344045
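
    The C-to-T readout described in this abstract can be illustrated with a toy pileup. The sketch below is not the authors' pipeline: the function name `c_to_t_sites`, the gapless forward-strand read format, and the `min_fraction` threshold are all assumptions chosen for illustration.

```python
def c_to_t_sites(reference, reads, min_fraction=0.5):
    """Return 0-based reference positions where a C is read as T.

    Hypothetical helper: `reads` is a list of (start, sequence) pairs,
    each a gapless forward-strand alignment. A site is reported when the
    fraction of covering reads showing T is at least `min_fraction`.
    """
    coverage = {}   # reads covering each reference C with a C or T call
    converted = {}  # reads showing T at each reference C
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue
            if base not in ("C", "T"):
                continue  # ignore other mismatches in this toy example
            coverage[pos] = coverage.get(pos, 0) + 1
            if base == "T":
                converted[pos] = converted.get(pos, 0) + 1
    return sorted(
        pos for pos, n in coverage.items()
        if converted.get(pos, 0) / n >= min_fraction
    )


reference = "ACGTCCA"
reads = [(0, "ACGTTCA"), (2, "GTTCA"), (3, "TCCA")]
sites = c_to_t_sites(reference, reads)
```

    In this toy input only position 4 is read as T by most covering reads. A real analysis would also need a control library, base-quality filtering, and handling of both strands.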

  2. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    PubMed

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily
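
    The conversion between genomic and feature-centric coordinates mentioned above can be sketched in plain Python. This does not use or reproduce Plastid's API: the function `genomic_to_transcript` and the 0-based, half-open exon convention are assumptions made for this example.

```python
def genomic_to_transcript(pos, exons, strand="+"):
    """Map a genomic position to a transcript-relative position.

    Hypothetical helper: `exons` is a list of (start, end) genomic
    intervals, 0-based and half-open, sorted ascending and
    non-overlapping. Introns are skipped; on the minus strand the
    coordinate is counted from the other end of the transcript.
    """
    total = sum(end - start for start, end in exons)
    offset = 0  # spliced length of the exons already passed
    for start, end in exons:
        if start <= pos < end:
            plus_coord = offset + (pos - start)
            return plus_coord if strand == "+" else total - 1 - plus_coord
        offset += end - start
    raise ValueError(f"position {pos} is not inside any exon")


exons = [(100, 110), (200, 205)]  # a two-exon transcript, 15 nt spliced
plus_coord = genomic_to_transcript(202, exons, "+")
minus_coord = genomic_to_transcript(202, exons, "-")
```

    Keeping the splicing and strand arithmetic in one function is the kind of bookkeeping the abstract says the library takes off the user's hands.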

  3. Base-Resolution Analysis of Cisplatin–DNA Adducts at the Genome Scale

    PubMed Central

    Shu, Xiaoting; Xiong, Xushen; Song, Jinghui; He, Chuan; Yi, Chengqi

    2016-01-01

    Cisplatin, one of the most widely used anticancer drugs, crosslinks DNA and ultimately induces cell death. However, the genomic pattern of cisplatin–DNA adducts has remained unknown owing to the lack of a reliable and sensitive genome-wide method. Herein we present “cisplatin-seq” to identify genome-wide cisplatin crosslinking sites at base resolution. Cisplatin-seq reveals that mitochondrial DNA is a preferred target of cisplatin. For nuclear genomes, cisplatin–DNA adducts are enriched within promoters and regions harboring transcription termination sites. While the density of GG dinucleotides determines the initial crosslinking of cisplatin, binding of proteins to the genome largely contributes to the accumulative pattern of cisplatin–DNA adducts. PMID:27736024

  4. Analysis of Complete Chloroplast Genome Sequences Improves Phylogenetic Resolution in Paris (Melanthiaceae)

    PubMed Central

    Huang, Yuling; Li, Xiaojuan; Yang, Zhenyan; Yang, Chengjin; Yang, Junbo; Ji, Yunheng

    2016-01-01

    The genus Paris in the broad concept is an economically important group in the monocotyledonous family Melanthiaceae (tribe Parideae). The phylogeny of Paris was controversial in previous morphology-based classification and molecular phylogeny. Here, the complete cp genomes of eleven Paris taxa were sequenced, to better understand the evolutionary relationships among these plants and the mutation patterns in their chloroplast (cp) genomes. Comparative analyses indicated that the overall cp genome structure among the Paris taxa is quite similar. The triplication of trnI-CAU was found only in the cp genomes of P. quadrifolia and P. verticillata. Phylogenetic analyses based on the complete cp genomes did not resolve Paris as a monophyletic group, instead providing evidence supporting division of the twelve taxa into two segregate genera: Paris sensu stricto and Daiswa. The sister relationship between Daiswa and Trillium was well supported. We recovered two fully supported lineages with divergent distribution in Daiswa; however, none of the previously recognized sections in Daiswa was resolved as monophyletic using plastome data, suggesting that the infrageneric relationships and biogeography of Daiswa species require further investigation. Ten highly divergent DNA regions, suitable for species identification, were detected among the 12 cp genomes. This study is the first successful attempt to provide well-supported evolutionary relationships in Paris based on phylogenomic analyses. The findings highlight the potential of the whole cp genomes for improving resolution in phylogeny as well as species identification in phylogenetically and taxonomically difficult plant genera. PMID:27965698

  5. High-Throughput Genome Editing and Phenotyping Facilitated by High Resolution Melting Curve Analysis

    PubMed Central

    Thomas, Holly R.; Percival, Stefanie M.; Yoder, Bradley K.; Parant, John M.

    2014-01-01

    With the goal to generate and characterize the phenotypes of null alleles in all genes within an organism and the recent advances in custom nucleases, genome editing limitations have moved from mutation generation to mutation detection. We previously demonstrated that High Resolution Melting (HRM) analysis is a rapid and efficient means of genotyping known zebrafish mutants. Here we establish optimized conditions for HRM based detection of novel mutant alleles. Using these conditions, we demonstrate that HRM is highly efficient at mutation detection across multiple genome editing platforms (ZFNs, TALENs, and CRISPRs); we observed nuclease generated HRM positive targeting in 1 of 6 (16%) open pool derived ZFNs, 14 of 23 (60%) TALENs, and 58 of 77 (75%) CRISPR nucleases. Successful targeting, based on HRM of G0 embryos correlates well with successful germline transmission (46 of 47 nucleases); yet, surprisingly mutations in the somatic tail DNA weakly correlate with mutations in the germline F1 progeny DNA. This suggests that analysis of G0 tail DNA is a good indicator of the efficiency of the nuclease, but not necessarily a good indicator of germline alleles that will be present in the F1s. However, we demonstrate that small amplicon HRM curve profiles of F1 progeny DNA can be used to differentiate between specific mutant alleles, facilitating rare allele identification and isolation; and that HRM is a powerful technique for screening possible off-target mutations that may be generated by the nucleases. Our data suggest that micro-homology based alternative NHEJ repair is primarily utilized in the generation of CRISPR mutant alleles and allows us to predict likelihood of generating a null allele. Lastly, we demonstrate that HRM can be used to quickly distinguish genotype-phenotype correlations within F1 embryos derived from G0 intercrosses. Together these data indicate that custom nucleases, in conjunction with the ease and speed of HRM, will facilitate future high

  6. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution

    PubMed Central

    Liu, Chang; Wang, Congmao; Wang, George; Becker, Claude; Zaidem, Maricris; Weigel, Detlef

    2016-01-01

    The three-dimensional packing of the genome plays an important role in regulating gene expression. We have used Hi-C, a genome-wide chromatin conformation capture (3C) method, to analyze Arabidopsis thaliana chromosomes dissected into subkilobase segments, which is required for gene-level resolution in this species with a gene-dense genome. We found that the repressive H3K27me3 histone mark is overrepresented in the promoter regions of genes that are in conformational linkage over long distances. In line with the globally dispersed distribution of RNA polymerase II in A. thaliana nuclear space, actively transcribed genes do not show a strong tendency to associate with each other. In general, there are often contacts between 5′ and 3′ ends of genes, forming local chromatin loops. Such self-loop structures of genes are more likely to occur in more highly expressed genes, although they can also be found in silent genes. Silent genes with local chromatin loops are highly enriched for the histone variant H3.3 at their 5′ and 3′ ends but depleted of repressive marks such as heterochromatic histone modifications and DNA methylation in flanking regions. Our results suggest that, different from animals, a major theme of genome folding in A. thaliana is the formation of structural units that correspond to gene bodies. PMID:27225844

  7. Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution

    PubMed Central

    Willerslev, Eske; Gilbert, M Thomas P; Binladen, Jonas; Ho, Simon YW; Campos, Paula F; Ratan, Aakrosh; Tomsho, Lynn P; da Fonseca, Rute R; Sher, Andrei; Kuznetsova, Tatanya V; Nowak-Kemp, Malgosia; Roth, Terri L; Miller, Webb; Schuster, Stephan C

    2009-01-01

    Background: The scientific literature contains many examples where DNA sequence analyses have been used to provide definitive answers to phylogenetic problems that traditional (non-DNA based) approaches alone have failed to resolve. One notable example concerns the rhinoceroses, a group for which several contradictory phylogenies were proposed on the basis of morphology, then apparently resolved using mitochondrial DNA fragments. Results: In this study we report the first complete mitochondrial genome sequences of the extinct ice-age woolly rhinoceros (Coelodonta antiquitatis), and the threatened Javan (Rhinoceros sondaicus), Sumatran (Dicerorhinus sumatrensis), and black (Diceros bicornis) rhinoceroses. In combination with the previously published mitochondrial genomes of the white (Ceratotherium simum) and Indian (Rhinoceros unicornis) rhinoceroses, this data set putatively enables reconstruction of the rhinoceros phylogeny. While the six species cluster into three strongly supported sister-pairings: (i) the black/white, (ii) the woolly/Sumatran, and (iii) the Javan/Indian, resolution of the higher-level relationships has no statistical support. The phylogenetic signal from individual genes is highly diffuse, with mixed topological support from different genes. Furthermore, the choice of outgroup (horse vs tapir) has considerable effect on reconstruction of the phylogeny. The lack of resolution is suggestive of a hard polytomy at the base of crown-group Rhinocerotidae, and this is supported by an investigation of the relative branch lengths. Conclusion: Satisfactory resolution of the rhinoceros phylogeny may not be achievable without additional analyses of substantial amounts of nuclear DNA. This study provides a compelling demonstration that, in spite of substantial sequence length, there are significant limitations with single-locus phylogenetics. We expect further examples of this to appear as next-generation, large-scale sequencing of complete mitochondrial

  8. Genome-wide analysis of genetic alterations in testicular primary seminoma using high resolution single nucleotide polymorphism arrays.

    PubMed

    LeBron, Cynthia; Pal, Prodipto; Brait, Mariana; Dasgupta, Santanu; Guerrero-Preston, Rafael; Looijenga, Leendert H J; Kowalski, Jeanne; Netto, George; Hoque, Mohammad O

    2011-06-01

    Testicular germ cell tumors (TGCT) represent the most common malignancy among young males. To our knowledge no comprehensive Copy Number Variation (CNV) studies of TGCT using high-resolution Single Nucleotide Polymorphism (SNP) arrays have been performed. By a genome-wide analysis of CNV and loss of heterozygosity (LOH) in 25 primary seminomas, we confirmed several previously reported genomic alterations and discovered eight novel genomic alterations including amplifications and homozygous deletions. Moreover, a comparison of genomic alterations of early and late stage seminoma identified CNVs that correlate with progression, which included deletions in chromosomes 4q, 5p, 9q, 13q and 20p and amplifications in chromosomes 9q and 13q. We compared previously performed Affymetrix expression analysis in a subset of samples and found robust correlation between expression and genomic alterations. Furthermore, high correlations (40-75%) were observed between CNV by SNP analysis and quantitative PCR. Our findings may lead to better understanding of TGCT's pathogenesis. Copyright © 2011. Published by Elsevier Inc.

  9. High-throughput engineering of the mouse genome coupled with high-resolution expression analysis.

    PubMed

    Valenzuela, David M; Murphy, Andrew J; Frendewey, David; Gale, Nicholas W; Economides, Aris N; Auerbach, Wojtek; Poueymirou, William T; Adams, Niels C; Rojas, Jose; Yasenchak, Jason; Chernomorsky, Rostislav; Boucher, Marylene; Elsasser, Andrea L; Esau, Lakeisha; Zheng, Jenny; Griffiths, Jennifer A; Wang, Xiaorong; Su, Hong; Xue, Yingzi; Dominguez, Melissa G; Noguera, Irene; Torres, Richard; Macdonald, Lynn E; Stewart, A Francis; DeChiara, Thomas M; Yancopoulos, George D

    2003-06-01

    One of the most effective approaches for determining gene function involves engineering mice with mutations or deletions in endogenous genes of interest. Historically, this approach has been limited by the difficulty and time required to generate such mice. We describe the development of a high-throughput and largely automated process, termed VelociGene, that uses targeting vectors based on bacterial artificial chromosomes (BACs). VelociGene permits genetic alteration with nucleotide precision, is not limited by the size of desired deletions, does not depend on isogenicity or on positive-negative selection, and can precisely replace the gene of interest with a reporter that allows for high-resolution localization of target-gene expression. We describe custom genetic alterations for hundreds of genes, corresponding to about 0.5-1.0% of the entire genome. We also provide dozens of informative expression patterns involving cells in the nervous system, immune system, vasculature, skeleton, fat and other tissues.

  10. Genome-wide and fine-resolution association analysis of malaria in West Africa.

    PubMed

    Jallow, Muminatou; Teo, Yik Ying; Small, Kerrin S; Rockett, Kirk A; Deloukas, Panos; Clark, Taane G; Kivinen, Katja; Bojang, Kalifa A; Conway, David J; Pinder, Margaret; Sirugo, Giorgio; Sisay-Joof, Fatou; Usen, Stanley; Auburn, Sarah; Bumpstead, Suzannah J; Campino, Susana; Coffey, Alison; Dunham, Andrew; Fry, Andrew E; Green, Angela; Gwilliam, Rhian; Hunt, Sarah E; Inouye, Michael; Jeffreys, Anna E; Mendy, Alieu; Palotie, Aarno; Potter, Simon; Ragoussis, Jiannis; Rogers, Jane; Rowlands, Kate; Somaskantharajah, Elilan; Whittaker, Pamela; Widden, Claire; Donnelly, Peter; Howie, Bryan; Marchini, Jonathan; Morris, Andrew; SanJoaquin, Miguel; Achidi, Eric Akum; Agbenyega, Tsiri; Allen, Angela; Amodu, Olukemi; Corran, Patrick; Djimde, Abdoulaye; Dolo, Amagana; Doumbo, Ogobara K; Drakeley, Chris; Dunstan, Sarah; Evans, Jennifer; Farrar, Jeremy; Fernando, Deepika; Hien, Tran Tinh; Horstmann, Rolf D; Ibrahim, Muntaser; Karunaweera, Nadira; Kokwaro, Gilbert; Koram, Kwadwo A; Lemnge, Martha; Makani, Julie; Marsh, Kevin; Michon, Pascal; Modiano, David; Molyneux, Malcolm E; Mueller, Ivo; Parker, Michael; Peshu, Norbert; Plowe, Christopher V; Puijalon, Odile; Reeder, John; Reyburn, Hugh; Riley, Eleanor M; Sakuntabhai, Anavaj; Singhasivanon, Pratap; Sirima, Sodiomon; Tall, Adama; Taylor, Terrie E; Thera, Mahamadou; Troye-Blomberg, Marita; Williams, Thomas N; Wilson, Michael; Kwiatkowski, Dominic P

    2009-06-01

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10⁻⁷ to P = 4 × 10⁻¹⁴, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.

  11. High-Resolution Genome-Wide Analysis of Irradiated (UV and γ-Rays) Diploid Yeast Cells Reveals a High Frequency of Genomic Loss of Heterozygosity (LOH) Events

    PubMed Central

    St. Charles, Jordan; Hazkani-Covo, Einat; Yin, Yi; Andersen, Sabrina L.; Dietrich, Fred S.; Greenwell, Patricia W.; Malc, Ewa; Mieczkowski, Piotr; Petes, Thomas D.

    2012-01-01

    In diploid eukaryotes, repair of double-stranded DNA breaks by homologous recombination often leads to loss of heterozygosity (LOH). Most previous studies of mitotic recombination in Saccharomyces cerevisiae have focused on a single chromosome or a single region of one chromosome at which LOH events can be selected. In this study, we used two techniques (single-nucleotide polymorphism microarrays and high-throughput DNA sequencing) to examine genome-wide LOH in a diploid yeast strain at a resolution averaging 1 kb. We examined both selected LOH events on chromosome V and unselected events throughout the genome in untreated cells and in cells treated with either γ-radiation or ultraviolet (UV) radiation. Our analysis shows the following: (1) spontaneous and damage-induced mitotic gene conversion tracts are more than three times larger than meiotic conversion tracts, and conversion tracts associated with crossovers are usually longer and more complex than those unassociated with crossovers; (2) most of the crossovers and conversions reflect the repair of two sister chromatids broken at the same position; and (3) both UV and γ-radiation efficiently induce LOH at doses of radiation that cause no significant loss of viability. Using high-throughput DNA sequencing, we also detected new mutations induced by γ-rays and UV. To our knowledge, our study represents the first high-resolution genome-wide analysis of DNA damage-induced LOH events performed in any eukaryote. PMID:22267500

  12. High-resolution genome-wide analysis of irradiated (UV and γ-rays) diploid yeast cells reveals a high frequency of genomic loss of heterozygosity (LOH) events.

    PubMed

    St Charles, Jordan; Hazkani-Covo, Einat; Yin, Yi; Andersen, Sabrina L; Dietrich, Fred S; Greenwell, Patricia W; Malc, Ewa; Mieczkowski, Piotr; Petes, Thomas D

    2012-04-01

    In diploid eukaryotes, repair of double-stranded DNA breaks by homologous recombination often leads to loss of heterozygosity (LOH). Most previous studies of mitotic recombination in Saccharomyces cerevisiae have focused on a single chromosome or a single region of one chromosome at which LOH events can be selected. In this study, we used two techniques (single-nucleotide polymorphism microarrays and high-throughput DNA sequencing) to examine genome-wide LOH in a diploid yeast strain at a resolution averaging 1 kb. We examined both selected LOH events on chromosome V and unselected events throughout the genome in untreated cells and in cells treated with either γ-radiation or ultraviolet (UV) radiation. Our analysis shows the following: (1) spontaneous and damage-induced mitotic gene conversion tracts are more than three times larger than meiotic conversion tracts, and conversion tracts associated with crossovers are usually longer and more complex than those unassociated with crossovers; (2) most of the crossovers and conversions reflect the repair of two sister chromatids broken at the same position; and (3) both UV and γ-radiation efficiently induce LOH at doses of radiation that cause no significant loss of viability. Using high-throughput DNA sequencing, we also detected new mutations induced by γ-rays and UV. To our knowledge, our study represents the first high-resolution genome-wide analysis of DNA damage-induced LOH events performed in any eukaryote.

  13. The interaction of high-resolution electrophoresis and computational analysis in genome mapping

    SciTech Connect

    Carrano, A.V.; Branscomb, E.W.; de Jong, P.J.; Mohrenweiser, H.; Olsen, A.; Slezak, T.

    1990-07-26

    The construction of physical maps and the determination of the DNA sequence of chromosome-size segments of the human genome is a complex, multidisciplinary undertaking. The approach we have taken to construct a physical map and sequence of human chromosome 19 typifies these interactions. We exploit the power of both acrylamide and agarose gel electrophoresis to provide a simple and versatile method for DNA fingerprinting and the creation of contigs, or sets of overlapping genomic clones. Cosmid libraries are constructed from yeast artificial chromosome (YAC) clones or from flow-sorted chromosomes. Cosmid DNA isolated from the screened library array is cut with a combination of five restriction enzymes and the fragment ends are labeled with one of four different fluorochromes. Our approach to contig construction uses a robotic system to label restriction fragments from cosmids with fluorochromes; uses an automated DNA sequencer to capture fragment mobility data in a high-resolution multiplex mode; processes the mobility data to determine fragment length and provide a statistical measure of overlap among cosmids; and displays the contigs and underlying cosmids for operator interaction and access to a database. Data analyses and interactions are conducted over a network of SUN workstations using a set of software tools that we developed and coupled to a commercially available database. Applying these methods, we have analyzed 5154 cosmid clones and assembled 515 contigs for chromosome 19. Some of these contigs have been identified with known genes and many have been mapped to the chromosome by fluorescence in situ hybridization. Existing contigs are being extended by a combination of walking and fingerprinting. 21 refs., 2 figs.
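
    The idea of scoring overlap between cosmid fingerprints can be illustrated with a small sketch. The abstract does not specify the statistical measure used, so the score below (the fraction of shared fragment lengths within a size tolerance) is a stand-in assumption, and the names `shared_fragments`, `overlap_score`, and `tolerance` are hypothetical.

```python
def shared_fragments(a, b, tolerance=2):
    """Count fragments of two fingerprints that match within `tolerance` bp.

    Hypothetical helper: `a` and `b` are lists of restriction-fragment
    lengths. Both are sorted and walked in parallel, and each fragment
    is matched at most once.
    """
    a, b = sorted(a), sorted(b)
    i = j = matched = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matched


def overlap_score(a, b, tolerance=2):
    """Shared fragments as a fraction of the smaller fingerprint (assumed metric)."""
    if not a or not b:
        return 0.0
    return shared_fragments(a, b, tolerance) / min(len(a), len(b))


cosmid_a = [120, 340, 560, 900]
cosmid_b = [121, 339, 700, 901, 1500]
score = overlap_score(cosmid_a, cosmid_b)
```

    Here three of the four fragments in the smaller fingerprint have a partner, giving a score of 0.75. A production system would replace this fraction with a probability that accounts for chance matches and measurement error.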

  14. High Resolution Melt (HRM) analysis is an efficient tool to genotype EMS mutants in complex crop genomes

    PubMed Central

    2011-01-01

    Background: Targeted Induced Local Lesions IN Genomes (TILLING) is increasingly being used to generate and identify mutations in target genes of crop genomes. TILLING populations of several thousand lines have been generated in a number of crop species including Brassica rapa. Genetic analysis of mutants identified by TILLING requires an efficient, high-throughput and cost effective genotyping method to track the mutations through numerous generations. High resolution melt (HRM) analysis has been used in a number of systems to identify single nucleotide polymorphisms (SNPs) and insertion/deletions (IN/DELs) enabling the genotyping of different types of samples. HRM is ideally suited to high-throughput genotyping of multiple TILLING mutants in complex crop genomes. To date it has been used to identify mutants and genotype single mutations. The aim of this study was to determine if HRM can facilitate downstream analysis of multiple mutant lines identified by TILLING in order to characterise allelic series of EMS induced mutations in target genes across a number of generations in complex crop genomes. Results: We demonstrate that HRM can be used to genotype allelic series of mutations in two genes, BraA.CAX1.a and BraA.MET1.a in Brassica rapa. We analysed 12 mutations in BraA.CAX1.a and five in BraA.MET1.a over two generations including a back-cross to the wild-type. Using a commercially available HRM kit and the Lightscanner™ system we were able to detect mutations in heterozygous and homozygous states for both genes. Conclusions: Using HRM genotyping on TILLING derived mutants, it is possible to generate an allelic series of mutations within multiple target genes rapidly. Lines suitable for phenotypic analysis can be isolated approximately 8-9 months (3 generations) from receiving M3 seed of Brassica rapa from the RevGenUK TILLING service. PMID:22152063

  15. A 1.5-Mb-resolution radiation hybrid map of the cat genome and comparative analysis with the canine and human genomes.

    PubMed

    Murphy, William J; Davis, Brian; David, Victor A; Agarwala, Richa; Schäffer, Alejandro A; Pearks Wilkerson, Alison J; Neelam, Beena; O'Brien, Stephen J; Menotti-Raymond, Marilyn

    2007-02-01

    We report the construction of a 1.5-Mb-resolution radiation hybrid map of the domestic cat genome. This new map includes novel microsatellite loci and markers derived from the 2X genome sequence that target previous gaps in the feline-human comparative map. Ninety-six percent of the 1793 cat markers we mapped have identifiable orthologues in the canine and human genome sequences. The updated autosomal and X-chromosome comparative maps identify 152 cat-human and 134 cat-dog homologous synteny blocks. Comparative analysis shows the marked change in chromosomal evolution in the canid lineage relative to the felid lineage since divergence from their carnivoran ancestor. The canid lineage has a 30-fold difference in the number of interchromosomal rearrangements relative to felids, while the felid lineage has primarily undergone intrachromosomal rearrangements. We have also refined the pseudoautosomal region and boundary in the cat and show that it is markedly longer than those of human or mouse. This improved RH comparative map provides a useful tool to facilitate positional cloning studies in the feline model.

  16. A high-resolution genomic analysis of multidrug-resistant hospital outbreaks of Klebsiella pneumoniae

    PubMed Central

    Chung The, Hao; Karkey, Abhilasha; Pham Thanh, Duy; Boinett, Christine J; Cain, Amy K; Ellington, Matthew; Baker, Kate S; Dongol, Sabina; Thompson, Corinne; Harris, Simon R; Jombart, Thibaut; Le Thi Phuong, Tu; Tran Do Hoang, Nhu; Ha Thanh, Tuyen; Shretha, Shrijana; Joshi, Suchita; Basnyat, Buddha; Thwaites, Guy; Thomson, Nicholas R; Rabaa, Maia A; Baker, Stephen

    2015-01-01

    Multidrug-resistant (MDR) Klebsiella pneumoniae has become a leading cause of nosocomial infections worldwide. Despite its prominence, little is known about the genetic diversity of K. pneumoniae in resource-poor hospital settings. Through whole-genome sequencing (WGS), we reconstructed an outbreak of MDR K. pneumoniae occurring on high-dependency wards in a hospital in Kathmandu during 2012 with a case-fatality rate of 75%. The WGS analysis permitted the identification of two MDR K. pneumoniae lineages causing distinct outbreaks within the complex endemic K. pneumoniae. Using phylogenetic reconstruction and lineage-specific PCR, our data predicted a scenario in which K. pneumoniae, circulating for 6 months before the outbreak, underwent a series of ward-specific clonal expansions after the acquisition of genes facilitating virulence and MDR. We suggest that the early detection of a specific NDM-1 containing lineage in 2011 would have alerted the high-dependency ward staff to intervene. We argue that some form of real-time genetic characterisation, alongside clade-specific PCR during an outbreak, should be factored into future healthcare infection control practices in both high- and low-income settings. PMID:25712531

  17. A high-resolution genomic analysis of multidrug-resistant hospital outbreaks of Klebsiella pneumoniae.

    PubMed

    Chung The, Hao; Karkey, Abhilasha; Pham Thanh, Duy; Boinett, Christine J; Cain, Amy K; Ellington, Matthew; Baker, Kate S; Dongol, Sabina; Thompson, Corinne; Harris, Simon R; Jombart, Thibaut; Le Thi Phuong, Tu; Tran Do Hoang, Nhu; Ha Thanh, Tuyen; Shretha, Shrijana; Joshi, Suchita; Basnyat, Buddha; Thwaites, Guy; Thomson, Nicholas R; Rabaa, Maia A; Baker, Stephen

    2015-03-01

    Multidrug-resistant (MDR) Klebsiella pneumoniae has become a leading cause of nosocomial infections worldwide. Despite its prominence, little is known about the genetic diversity of K. pneumoniae in resource-poor hospital settings. Through whole-genome sequencing (WGS), we reconstructed an outbreak of MDR K. pneumoniae occurring on high-dependency wards in a hospital in Kathmandu during 2012 with a case-fatality rate of 75%. The WGS analysis permitted the identification of two MDR K. pneumoniae lineages causing distinct outbreaks within the complex endemic K. pneumoniae. Using phylogenetic reconstruction and lineage-specific PCR, our data predicted a scenario in which K. pneumoniae, circulating for 6 months before the outbreak, underwent a series of ward-specific clonal expansions after the acquisition of genes facilitating virulence and MDR. We suggest that the early detection of a specific NDM-1 containing lineage in 2011 would have alerted the high-dependency ward staff to intervene. We argue that some form of real-time genetic characterisation, alongside clade-specific PCR during an outbreak, should be factored into future healthcare infection control practices in both high- and low-income settings.

  18. High-resolution genomic analysis suggests the absence of recurrent genomic alterations other than SMARCB1 aberrations in atypical teratoid/rhabdoid tumors.

    PubMed

    Hasselblatt, Martin; Isken, Sarah; Linge, Anna; Eikmeier, Kristin; Jeibmann, Astrid; Oyen, Florian; Nagel, Inga; Richter, Julia; Bartelheim, Kerstin; Kordes, Uwe; Schneppenheim, Reinhard; Frühwald, Michael; Siebert, Reiner; Paulus, Werner

    2013-02-01

    Atypical teratoid/rhabdoid tumor (AT/RT) is a rare malignant pediatric brain tumor characterized by genetic alterations affecting the SMARCB1 (hSNF5/INI1) locus in chromosome band 22q11.2. To identify potential additional genetic alterations, high-resolution genome-wide analysis was performed using a molecular inversion probe single-nucleotide polymorphism (MIP SNP) assay (Affymetrix OncoScan formalin-fixed paraffin-embedded express) on DNA isolated from 18 formalin-fixed paraffin-embedded archival samples. Alterations affecting the SMARCB1 locus could be demonstrated by MIP SNP in 15 out of 16 evaluable cases (94%). These comprised five tumors with homozygous deletions, six tumors with heterozygous deletions, and four tumors with copy number neutral loss of heterozygosity (LOH) involving chromosome band 22q11.2. Remarkably, MIP SNP analysis did not yield any further recurrent chromosomal gains, losses, or copy neutral LOH. On MIP SNP screening for somatic mutations, the presence of a SMARCB1 mutation (c.472C>T p.R158X) was confirmed, but no recurrent mutations of other cancer relevant genes could be identified. Results of fluorescence in situ hybridization, multiplex ligation-dependent probe amplification, and SMARCB1 sequencing were highly congruent with those of the MIP SNP assay. In conclusion, these data further suggest the absence of recurrent genomic alterations other than SMARCB1 in AT/RT. Copyright © 2012 Wiley Periodicals, Inc.

  19. Analysis of complete mitochondrial genome sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that experienced rapid speciation.

    PubMed

    Yu, Li; Li, Yi-Wei; Ryder, Oliver A; Zhang, Ya-Ping

    2007-10-24

    Despite the small number of ursid species, bear phylogeny has long been a focus of study due to their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family represents a typical example of rapid evolutionary radiation. Previous analyses with a single mitochondrial (mt) gene or a small number of mt genes either provide weak support or a large unresolved polytomy for ursids. We revisit the contentious relationships within Ursidae by analyzing complete mt genome sequences and evaluating the performance of both entire mt genomes and constituent mtDNA genes in recovering a phylogeny of extremely recent speciation events. This mitochondrial genome-based phylogeny provides strong evidence that the spectacled bear diverged first, while within the genus Ursus, the sloth bear is the sister taxon of all the other five ursines. The latter group is divided into the brown bear/polar bear and the two black bears/sun bear assemblages. These findings resolve the previous conflicts between trees using partial mt genes. The ability of different categories of mt protein coding genes to recover the correct phylogeny is concordant with previous analyses for taxa with deep divergence times. This study provides a robust Ursidae phylogenetic framework for future validation by additional independent evidence, and also has significant implications for assisting in the resolution of other similarly difficult phylogenetic investigations. Identification of base composition bias and utilization of the combined data of whole mitochondrial genome sequences has allowed recovery of a strongly supported phylogeny that is upheld when using multiple alternative outgroups for the Ursidae, a mammalian family that underwent a rapid radiation since the mid- to late Pliocene. It remains to be seen if the reliability of mt genome analysis will hold up in studies of other difficult phylogenetic issues. Although the whole

  20. Analysis of complete mitochondrial genome sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that experienced rapid speciation

    PubMed Central

    Yu, Li; Li, Yi-Wei; Ryder, Oliver A; Zhang, Ya-Ping

    2007-01-01

    Background: Despite the small number of ursid species, bear phylogeny has long been a focus of study due to their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family represents a typical example of rapid evolutionary radiation. Previous analyses with a single mitochondrial (mt) gene or a small number of mt genes either provide weak support or a large unresolved polytomy for ursids. We revisit the contentious relationships within Ursidae by analyzing complete mt genome sequences and evaluating the performance of both entire mt genomes and constituent mtDNA genes in recovering a phylogeny of extremely recent speciation events. Results: This mitochondrial genome-based phylogeny provides strong evidence that the spectacled bear diverged first, while within the genus Ursus, the sloth bear is the sister taxon of all the other five ursines. The latter group is divided into the brown bear/polar bear and the two black bears/sun bear assemblages. These findings resolve the previous conflicts between trees using partial mt genes. The ability of different categories of mt protein coding genes to recover the correct phylogeny is concordant with previous analyses for taxa with deep divergence times. This study provides a robust Ursidae phylogenetic framework for future validation by additional independent evidence, and also has significant implications for assisting in the resolution of other similarly difficult phylogenetic investigations. Conclusion: Identification of base composition bias and utilization of the combined data of whole mitochondrial genome sequences has allowed recovery of a strongly supported phylogeny that is upheld when using multiple alternative outgroups for the Ursidae, a mammalian family that underwent a rapid radiation since the mid- to late Pliocene. It remains to be seen if the reliability of mt genome analysis will hold up in studies of other difficult phylogenetic

  1. Functionally-focused algorithmic analysis of high resolution microarray-CGH genomic landscapes demonstrates comparable genomic copy number aberrations in MSI and MSS sporadic colorectal cancer

    PubMed Central

    Ali, Hamad; Bitar, Milad S.; Al Madhoun, Ashraf; Marafie, Makia; Al-Mulla, Fahd

    2017-01-01

    Array-based comparative genomic hybridization (aCGH) emerged as a powerful technology for studying copy number variations at higher resolution in many cancers including colorectal cancer. However, the lack of standardized systematic protocols including bioinformatic algorithms to obtain and analyze genomic data resulted in significant variation in the reported copy number aberration (CNA) data. Here, we present genomic aCGH data obtained using highly stringent and functionally relevant statistical algorithms from 116 well-defined microsatellite-instable (MSI) and microsatellite-stable (MSS) colorectal cancers. We utilized aCGH to characterize genomic CNAs in 116 well-defined sets of colorectal cancer (CRC) cases. We further applied the significance testing for aberrant copy number (STAC) and Genomic Identification of Significant Targets in Cancer (GISTIC) algorithms to identify functionally relevant (nonrandom) chromosomal aberrations in the analyzed colorectal cancer samples. Our results produced high resolution genomic landscapes of both MSI and MSS sporadic CRC. We found that CNAs in MSI and MSS CRCs are heterogeneous in nature but may be divided into 3 distinct genomic patterns. Moreover, we show that although CNAs in MSI and MSS CRCs differ with respect to their size, number and chromosomal distribution, the functional copy number aberrations obtained from MSI and MSS CRCs were in fact comparable but not identical. These unifying CNAs were verified by MLPA tumor-loss gene panel, which spans 15 different chromosomal locations and contains 50 probes for at least 20 tumor suppressor genes. Consistently, deletions/amplifications in these frequently cancer-altered genes were identical in MSS and MSI CRCs. Our results suggest that MSI and MSS copy number aberrations driving CRC may be functionally comparable. PMID:28231327

  2. Functionally-focused algorithmic analysis of high resolution microarray-CGH genomic landscapes demonstrates comparable genomic copy number aberrations in MSI and MSS sporadic colorectal cancer.

    PubMed

    Ali, Hamad; Bitar, Milad S; Al Madhoun, Ashraf; Marafie, Makia; Al-Mulla, Fahd

    2017-01-01

    Array-based comparative genomic hybridization (aCGH) emerged as a powerful technology for studying copy number variations at higher resolution in many cancers including colorectal cancer. However, the lack of standardized systematic protocols including bioinformatic algorithms to obtain and analyze genomic data resulted in significant variation in the reported copy number aberration (CNA) data. Here, we present genomic aCGH data obtained using highly stringent and functionally relevant statistical algorithms from 116 well-defined microsatellite-instable (MSI) and microsatellite-stable (MSS) colorectal cancers. We utilized aCGH to characterize genomic CNAs in 116 well-defined sets of colorectal cancer (CRC) cases. We further applied the significance testing for aberrant copy number (STAC) and Genomic Identification of Significant Targets in Cancer (GISTIC) algorithms to identify functionally relevant (nonrandom) chromosomal aberrations in the analyzed colorectal cancer samples. Our results produced high resolution genomic landscapes of both MSI and MSS sporadic CRC. We found that CNAs in MSI and MSS CRCs are heterogeneous in nature but may be divided into 3 distinct genomic patterns. Moreover, we show that although CNAs in MSI and MSS CRCs differ with respect to their size, number and chromosomal distribution, the functional copy number aberrations obtained from MSI and MSS CRCs were in fact comparable but not identical. These unifying CNAs were verified by MLPA tumor-loss gene panel, which spans 15 different chromosomal locations and contains 50 probes for at least 20 tumor suppressor genes. Consistently, deletions/amplifications in these frequently cancer-altered genes were identical in MSS and MSI CRCs. Our results suggest that MSI and MSS copy number aberrations driving CRC may be functionally comparable.

  3. Xenopus tropicalis Genome Re-Scaffolding and Re-Annotation Reach the Resolution Required for In Vivo ChIA-PET Analysis

    PubMed Central

    Buisine, Nicolas; Ruan, Xiaoan; Bilesimo, Patrice; Grimaldi, Alexis; Alfama, Gladys; Ariyaratne, Pramila; Mulawadi, Fabianus; Chen, Jieqi; Sung, Wing-Kin; Liu, Edison T.; Demeneix, Barbara A.; Ruan, Yijun; Sachs, Laurent M.

    2015-01-01

    Genome-wide functional analyses require high-resolution genome assembly and annotation. We applied ChIA-PET to analyze gene regulatory networks, including 3D chromosome interactions, underlying thyroid hormone (TH) signaling in the frog Xenopus tropicalis. As the available versions of Xenopus tropicalis assembly and annotation lacked the resolution required for ChIA-PET, we improved the genome assembly version 4.1 and annotations using data derived from the paired end tag (PET) sequencing technologies and approaches (e.g., DNA-PET [gPET], RNA-PET etc.). The large insert (~10 Kb, ~17 Kb) paired end DNA-PET with high throughput NGS sequencing not only significantly improved genome assembly quality, but also strongly reduced genome “fragmentation”, reducing total scaffold numbers by ~60%. Next, RNA-PET technology, designed and developed for the detection of full-length transcripts and fusion mRNA in whole transcriptome studies (ENCODE consortia), was applied to capture the 5' and 3' ends of transcripts. These amendments in assembly and annotation were essential prerequisites for the ChIA-PET analysis of TH transcription regulation. Their application revealed complex regulatory configurations of target genes and the structures of the regulatory networks underlying physiological responses. Our work allowed us to improve the quality of Xenopus tropicalis genomic resources, reaching the standard required for ChIA-PET analysis of transcriptional networks. We consider that the workflow proposed offers useful conceptual and methodological guidance and can readily be applied to other non-conventional models that have low-resolution genome data. PMID:26348928

  4. Xenopus tropicalis Genome Re-Scaffolding and Re-Annotation Reach the Resolution Required for In Vivo ChIA-PET Analysis.

    PubMed

    Buisine, Nicolas; Ruan, Xiaoan; Bilesimo, Patrice; Grimaldi, Alexis; Alfama, Gladys; Ariyaratne, Pramila; Mulawadi, Fabianus; Chen, Jieqi; Sung, Wing-Kin; Liu, Edison T; Demeneix, Barbara A; Ruan, Yijun; Sachs, Laurent M

    2015-01-01

    Genome-wide functional analyses require high-resolution genome assembly and annotation. We applied ChIA-PET to analyze gene regulatory networks, including 3D chromosome interactions, underlying thyroid hormone (TH) signaling in the frog Xenopus tropicalis. As the available versions of Xenopus tropicalis assembly and annotation lacked the resolution required for ChIA-PET, we improved the genome assembly version 4.1 and annotations using data derived from the paired end tag (PET) sequencing technologies and approaches (e.g., DNA-PET [gPET], RNA-PET etc.). The large insert (~10 Kb, ~17 Kb) paired end DNA-PET with high throughput NGS sequencing not only significantly improved genome assembly quality, but also strongly reduced genome "fragmentation", reducing total scaffold numbers by ~60%. Next, RNA-PET technology, designed and developed for the detection of full-length transcripts and fusion mRNA in whole transcriptome studies (ENCODE consortia), was applied to capture the 5' and 3' ends of transcripts. These amendments in assembly and annotation were essential prerequisites for the ChIA-PET analysis of TH transcription regulation. Their application revealed complex regulatory configurations of target genes and the structures of the regulatory networks underlying physiological responses. Our work allowed us to improve the quality of Xenopus tropicalis genomic resources, reaching the standard required for ChIA-PET analysis of transcriptional networks. We consider that the workflow proposed offers useful conceptual and methodological guidance and can readily be applied to other non-conventional models that have low-resolution genome data.

  5. Combined Analysis of Variation in Core, Accessory and Regulatory Genome Regions Provides a Super-Resolution View into the Evolution of Bacterial Populations

    PubMed Central

    McNally, Alan; Oren, Yaara; Kelly, Darren; Sreecharan, Tristan; Vehkala, Minna; Välimäki, Niko; Prentice, Michael B.; Ashour, Amgad; Avram, Oren; Pupko, Tal; Literak, Ivan; Guenther, Sebastian; Schaufler, Katharina; Wieler, Lothar H.; Zhiyong, Zong; Sheppard, Samuel K.; Corander, Jukka

    2016-01-01

    The use of whole-genome phylogenetic analysis has revolutionized our understanding of the evolution and spread of many important bacterial pathogens due to the high resolution view it provides. However, the majority of such analyses do not consider the potential role of accessory genes when inferring evolutionary trajectories. Moreover, the recently discovered importance of the switching of gene regulatory elements suggests that an exhaustive analysis, combining information from core and accessory genes with regulatory elements could provide unparalleled detail of the evolution of a bacterial population. Here we demonstrate this principle by applying it to a worldwide multi-host sample of the important pathogenic E. coli lineage ST131. Our approach reveals the existence of multiple circulating subtypes of the major drug–resistant clade of ST131 and provides the first ever population level evidence of core genome substitutions in gene regulatory regions associated with the acquisition and maintenance of different accessory genome elements. PMID:27618184

  6. Genome-wide SNP identification for the construction of a high-resolution genetic map of Japanese flounder (Paralichthys olivaceus): applications to QTL mapping of Vibrio anguillarum disease resistance and comparative genomic analysis

    PubMed Central

    Shao, Changwei; Niu, Yongchao; Rastas, Pasi; Liu, Yang; Xie, Zhiyuan; Li, Hengde; Wang, Lei; Jiang, Yong; Tai, Shuaishuai; Tian, Yongsheng; Sakamoto, Takashi; Chen, Songlin

    2015-01-01

    High-resolution genetic maps are essential for fine mapping of complex traits, genome assembly, and comparative genomic analysis. Single-nucleotide polymorphisms (SNPs) are the primary molecular markers used for genetic map construction. In this study, we identified 13,362 SNPs evenly distributed across the Japanese flounder (Paralichthys olivaceus) genome. Of these SNPs, 12,712 high-confidence SNPs were subjected to high-throughput genotyping and assigned to 24 consensus linkage groups (LGs). The total length of the genetic linkage map was 3,497.29 cM with an average distance of 0.47 cM between loci, thereby representing the densest genetic map currently reported for Japanese flounder. Nine positive quantitative trait loci (QTLs) forming two main clusters for Vibrio anguillarum disease resistance were detected. All QTLs could explain 5.1–8.38% of the total phenotypic variation. Synteny analysis of the QTL regions on the genome assembly revealed 12 immune-related genes, among them 4 genes strongly associated with V. anguillarum disease resistance. In addition, 246 genome assembly scaffolds with an average size of 21.79 Mb were anchored onto the LGs; these scaffolds, comprising 522.99 Mb, represented 95.78% of assembled genomic sequences. The mapped assembly scaffolds in Japanese flounder were used for genome synteny analyses against zebrafish (Danio rerio) and medaka (Oryzias latipes). Flounder and medaka were found to possess almost one-to-one synteny, whereas flounder and zebrafish exhibited a multi-syntenic correspondence. The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species. PMID:25762582

  7. Genome-wide SNP identification for the construction of a high-resolution genetic map of Japanese flounder (Paralichthys olivaceus): applications to QTL mapping of Vibrio anguillarum disease resistance and comparative genomic analysis.

    PubMed

    Shao, Changwei; Niu, Yongchao; Rastas, Pasi; Liu, Yang; Xie, Zhiyuan; Li, Hengde; Wang, Lei; Jiang, Yong; Tai, Shuaishuai; Tian, Yongsheng; Sakamoto, Takashi; Chen, Songlin

    2015-04-01

    High-resolution genetic maps are essential for fine mapping of complex traits, genome assembly, and comparative genomic analysis. Single-nucleotide polymorphisms (SNPs) are the primary molecular markers used for genetic map construction. In this study, we identified 13,362 SNPs evenly distributed across the Japanese flounder (Paralichthys olivaceus) genome. Of these SNPs, 12,712 high-confidence SNPs were subjected to high-throughput genotyping and assigned to 24 consensus linkage groups (LGs). The total length of the genetic linkage map was 3,497.29 cM with an average distance of 0.47 cM between loci, thereby representing the densest genetic map currently reported for Japanese flounder. Nine positive quantitative trait loci (QTLs) forming two main clusters for Vibrio anguillarum disease resistance were detected. All QTLs could explain 5.1-8.38% of the total phenotypic variation. Synteny analysis of the QTL regions on the genome assembly revealed 12 immune-related genes, among them 4 genes strongly associated with V. anguillarum disease resistance. In addition, 246 genome assembly scaffolds with an average size of 21.79 Mb were anchored onto the LGs; these scaffolds, comprising 522.99 Mb, represented 95.78% of assembled genomic sequences. The mapped assembly scaffolds in Japanese flounder were used for genome synteny analyses against zebrafish (Danio rerio) and medaka (Oryzias latipes). Flounder and medaka were found to possess almost one-to-one synteny, whereas flounder and zebrafish exhibited a multi-syntenic correspondence. The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  8. Ultra High-Resolution Gene Centric Genomic Structural Analysis of a Non-Syndromic Congenital Heart Defect, Tetralogy of Fallot

    PubMed Central

    Bittel, Douglas C.; Zhou, Xin-Gang; Kibiryeva, Nataliya; Fiedler, Stephanie; O’Brien, James E.; Marshall, Jennifer; Yu, Shihui; Liu, Hong-Yu

    2014-01-01

    Tetralogy of Fallot (TOF) is one of the most common severe congenital heart malformations. Great progress has been made in identifying key genes that regulate heart development, yet approximately 70% of TOF cases are sporadic and nonsyndromic with no known genetic cause. We created an ultra high-resolution gene centric comparative genomic hybridization (gcCGH) microarray based on 591 genes with a validated association with cardiovascular development or function. We used our gcCGH array to analyze the genomic structure of 34 infants with sporadic TOF without a deletion on chromosome 22q11.2 (n male = 20; n female = 14; age range of 2 to 10 months). Using our custom-made gcCGH microarray platform, we identified a total of 613 copy number variations (CNVs) ranging in size from 78 base pairs to 19.5 Mb. We identified 16 subjects with 33 CNVs that contained 13 different genes which are known to be directly associated with heart development. Additionally, there were 79 genes from the broader list of genes that were partially or completely contained in a CNV. All 34 individuals examined had at least one CNV involving these 79 genes. Furthermore, we had available whole genome exon arrays from right ventricular tissue in 13 of our subjects. We analyzed these for correlations between copy number and gene expression level. Surprisingly, we could detect only one clear association between CNVs and expression (GSTT1) for any of the 591 focal genes on the gcCGH array. The expression levels of GSTT1 were correlated with copy number in all cases examined (r = 0.95, p = 0.001). We identified a large number of small CNVs in genes with varying associations with heart development. Our results illustrate the complexity of human genome structural variation and underscore the need for multifactorial assessment of potential genetic/genomic factors that contribute to congenital heart defects. PMID:24498113

  9. Construction of high-resolution genetic maps of Zoysia matrella (L.) Merrill and applications to comparative genomic analysis and QTL mapping of resistance to fall armyworm.

    PubMed

    Huang, Xiaoen; Wang, Fangfang; Singh, Ratnesh; Reinert, James A; Engelke, M C; Genovesi, Anthony D; Chandra, Ambika; Yu, Qingyi

    2016-08-08

    Zoysia matrella, widely used in lawns and sports fields, is of great economic and ecological value. Z. matrella is an allotetraploid species (2n = 4x = 40) in the genus Zoysia under the subfamily Chloridoideae. Despite its ecological impacts and economic importance, the subfamily Chloridoideae has received little attention in genomics studies. As a result, limited genetic and genomic information is available for this subfamily, which has impeded progress in understanding the evolutionary history of grasses in this important lineage. The lack of a high-resolution genetic map has hampered efforts to improve zoysiagrass using molecular genetic tools. We used a restriction site-associated DNA sequencing (RADSeq) approach and a segregating population developed from the cross between Z. matrella cultivars 'Diamond' and 'Cavalier' to construct high-resolution genetic maps of Z. matrella. The genetic map of Diamond consists of 2,375 Single Nucleotide Polymorphism (SNP) markers mapped on 20 linkage groups (LGs) with a total length of 1754.48 cM and an average distance between adjacent markers of 0.74 cM. The genetic map of Cavalier contains 3,563 SNP markers on 20 LGs, covering 1824.92 cM, with an average distance between adjacent markers of 0.51 cM. A higher level of genome collinearity between Z. matrella and rice than that between Z. matrella and sorghum was revealed by comparative genomic analysis. Pairwise comparison revealed that two independent nested chromosome fusion events occurred after Z. matrella and sorghum split from a common ancestor. The high-resolution linkage maps were applied to mapping QTLs associated with fall armyworm (FAW) resistance, and six loci located on LGs 8 and 20 were detected to be significantly associated with FAW resistance. The high-resolution linkage maps provide anchor points for comparative genomics analysis between Z. matrella and other grass species. Our comparative genomic analysis suggested that the chromosome number

  10. Whole-genome analysis of 5-hydroxymethylcytosine and 5-methylcytosine at base resolution in the human brain

    PubMed Central

    2014-01-01

    Background: 5-methylcytosine (mC) can be oxidized by the tet methylcytosine dioxygenase (Tet) family of enzymes to 5-hydroxymethylcytosine (hmC), which is an intermediate of mC demethylation and may also be a stable epigenetic modification that influences chromatin structure. hmC is particularly abundant in mammalian brains but its function is currently unknown. A high-resolution hydroxymethylome map is required to fully understand the function of hmC in the human brain. Results: We present genome-wide and single-base resolution maps of hmC and mC in the human brain by combined application of Tet-assisted bisulfite sequencing and bisulfite sequencing. We demonstrate that hmCs increase markedly from the fetal to the adult stage, and in the adult brain, 13% of all CpGs are highly hydroxymethylated with strong enrichment at genic regions and distal regulatory elements. Notably, hmC peaks are identified at the 5′ splicing sites at the exon-intron boundary, suggesting a mechanistic link between hmC and splicing. We report a surprising transcription-correlated hmC bias toward the sense strand and an mC bias toward the antisense strand of gene bodies. Furthermore, hmC is negatively correlated with H3K27me3-marked and H3K9me3-marked repressive genomic regions, and is more enriched at poised enhancers than active enhancers. Conclusions: We provide single-base resolution hmC and mC maps in the human brain and our data imply novel roles of hmC in regulating splicing and gene expression. Hydroxymethylation is the main modification status for a large portion of CpGs situated at poised enhancers and actively transcribed regions, suggesting its roles in epigenetic tuning at these regions. PMID:24594098

  11. Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer.

    PubMed

    Ha, Gavin; Roth, Andrew; Lai, Daniel; Bashashati, Ali; Ding, Jiarui; Goya, Rodrigo; Giuliany, Ryan; Rosner, Jamie; Oloumi, Arusha; Shumansky, Karey; Chin, Suet-Feung; Turashvili, Gulisa; Hirst, Martin; Caldas, Carlos; Marra, Marco A; Aparicio, Samuel; Shah, Sohrab P

    2012-10-01

    Loss of heterozygosity (LOH) and copy number alteration (CNA) feature prominently in the somatic genomic landscape of tumors. As such, karyotypic aberrations in cancer genomes have been studied extensively to discover novel oncogenes and tumor-suppressor genes. Advances in sequencing technology have enabled the cost-effective detection of tumor genome and transcriptome mutation events at single-base-pair resolution; however, computational methods for predicting segmental regions of LOH in this context are not yet fully explored. Consequently, whole transcriptome, nucleotide-level resolution analysis of monoallelic expression patterns associated with LOH has not yet been undertaken in cancer. We developed a novel approach for inference of LOH from paired tumor/normal sequence data and applied it to a cohort of 23 triple-negative breast cancer (TNBC) genomes. Following extensive benchmarking experiments, we describe the nucleotide-resolution landscape of LOH in TNBC and assess the consequent effect of LOH on the transcriptomes of these tumors using RNA-seq-derived measurements of allele-specific expression. We show that the majority of monoallelic expression in the transcriptomes of triple-negative breast cancer can be explained by genomic regions of LOH and establish an upper bound for monoallelic expression that may be explained by other tumor-specific modifications such as epigenetics or mutations. Monoallelically expressed genes associated with LOH reveal that cell cycle, homologous recombination and actin-cytoskeletal functions are putatively disrupted by LOH in TNBC. Finally, we show how inference of LOH can be used to interpret allele frequencies of somatic mutations and postulate on temporal ordering of mutations in the evolutionary history of these tumors.

  12. Integrative analysis of genome-wide loss of heterozygosity and monoallelic expression at nucleotide resolution reveals disrupted pathways in triple-negative breast cancer

    PubMed Central

    Ha, Gavin; Roth, Andrew; Lai, Daniel; Bashashati, Ali; Ding, Jiarui; Goya, Rodrigo; Giuliany, Ryan; Rosner, Jamie; Oloumi, Arusha; Shumansky, Karey; Chin, Suet-Feung; Turashvili, Gulisa; Hirst, Martin; Caldas, Carlos; Marra, Marco A.; Aparicio, Samuel; Shah, Sohrab P.

    2012-01-01

    Loss of heterozygosity (LOH) and copy number alteration (CNA) feature prominently in the somatic genomic landscape of tumors. As such, karyotypic aberrations in cancer genomes have been studied extensively to discover novel oncogenes and tumor-suppressor genes. Advances in sequencing technology have enabled the cost-effective detection of tumor genome and transcriptome mutation events at single-base-pair resolution; however, computational methods for predicting segmental regions of LOH in this context are not yet fully explored. Consequently, whole transcriptome, nucleotide-level resolution analysis of monoallelic expression patterns associated with LOH has not yet been undertaken in cancer. We developed a novel approach for inference of LOH from paired tumor/normal sequence data and applied it to a cohort of 23 triple-negative breast cancer (TNBC) genomes. Following extensive benchmarking experiments, we describe the nucleotide-resolution landscape of LOH in TNBC and assess the consequent effect of LOH on the transcriptomes of these tumors using RNA-seq-derived measurements of allele-specific expression. We show that the majority of monoallelic expression in the transcriptomes of triple-negative breast cancer can be explained by genomic regions of LOH and establish an upper bound for monoallelic expression that may be explained by other tumor-specific modifications such as epigenetics or mutations. Monoallelically expressed genes associated with LOH reveal that cell cycle, homologous recombination and actin-cytoskeletal functions are putatively disrupted by LOH in TNBC. Finally, we show how inference of LOH can be used to interpret allele frequencies of somatic mutations and postulate on temporal ordering of mutations in the evolutionary history of these tumors. PMID:22637570

  13. Identification of genomic aberrations associated with disease transformation by means of high-resolution SNP array analysis in patients with myeloproliferative neoplasm.

    PubMed

    Rumi, Elisa; Harutyunyan, Ashot; Elena, Chiara; Pietra, Daniela; Klampfl, Thorsten; Bagienski, Klaudia; Berg, Tiina; Casetti, Ilaria; Pascutto, Cristiana; Passamonti, Francesco; Kralovics, Robert; Cazzola, Mario

    2011-12-01

    Myeloproliferative neoplasms (MPN) include polycythemia vera (PV), essential thrombocythemia (ET), and primary myelofibrosis (PMF). These disorders may undergo phenotypic shifts, and may specifically evolve into secondary myelofibrosis (MF) or acute myeloid leukemia (AML). We studied genomic changes associated with these transformations in 29 patients who had serial samples collected in different phases of disease. Genomic DNA from granulocytes, i.e., the myeloproliferative genome, was processed and hybridized to genome-wide human SNP 6.0 arrays. Most patients in chronic phase had chromosomal regions with uniparental disomy (UPD) and/or copy number changes. Disease progression to secondary MF or AML was associated with the acquisition of additional chromosomal aberrations in granulocytes (P = 0.002). A close relationship was observed between aberrations of chromosome 9p (UPD and/or gain) and progression from PV to post-PV MF (P = 0.002). The acquisition of one or more aberrations involving chromosome 5, 7, or 17p was specifically associated with progression to AML (OR 5.9, 95% CI 1.2-27.7, P = 0.006), and significantly affected overall survival (HR 18, 95% CI 1.9-164, P = 0.01). These observations indicate that disease progression from chronic-phase MPN to secondary MF or AML is associated with specific chromosomal aberrations that can be detected by means of high-resolution SNP array analysis of granulocyte DNA.

  14. High-resolution genomic analysis does not qualify atypical plexus papilloma as a separate entity among choroid plexus tumors.

    PubMed

    Japp, Anna Sophia; Gessi, Marco; Messing-Jünger, Martina; Denkhaus, Dorota; Zur Mühlen, Anja; Wolff, Johannes Ernst; Hartung, Stefan; Kordes, Uwe; Klein-Hitpass, Ludger; Pietsch, Torsten

    2015-02-01

    Choroid plexus tumors are rare neoplasms that mainly affect children. They include papillomas, atypical papillomas, and carcinomas. Detailed genetic studies are rare, and information about their molecular pathogenesis is limited. Molecular inversion probe analysis is a hybridization-based method that represents a reliable tool for the analysis of highly fragmented formalin-fixed paraffin-embedded tissue-derived DNA. Here, analysis of 62 cases showed frequent hyperdiploidy in papillomas and atypical papillomas that appeared very similar in their cytogenetic profiles. In contrast, carcinomas showed mainly losses of chromosomes. Besides recurrent focal chromosomal gains common to all choroid plexus tumors, including chromosome 14q21-q22 (harboring OTX2), chromosome 7q22 (LAMB1), and chromosome 9q21.12 (TRPM3), Genomic Identification of Significant Targets in Cancer analysis uncovered focal alterations specific for papillomas and atypical papillomas (e.g. 7p21.3 [ARL4A]) and for carcinomas (16p13.3 [RBFOX1] and 6p21 [POLH, GTPBP2, RSPH9, and VEGFA]). Additional RNA expression profiling and gene set enrichment analysis revealed greater expression of cell cycle-related genes in atypical papillomas in comparison with that in papillomas. These findings suggest that atypical papillomas represent an immature variant of papillomas characterized by increased proliferative activity, whereas carcinomas seem to represent a genetically distinct tumor group.

  15. A whole-genome mouse BAC microarray with 1-Mb resolution for analysis of DNA copy number changes by array comparative genomic hybridization.

    PubMed

    Chung, Yeun-Jun; Jonkers, Jos; Kitson, Hannah; Fiegler, Heike; Humphray, Sean; Scott, Carol; Hunt, Sarah; Yu, Yuejin; Nishijima, Ichiko; Velds, Arno; Holstege, Henne; Carter, Nigel; Bradley, Allan

    2004-01-01

    Microarray-based comparative genomic hybridization (CGH) has become a powerful method for the genome-wide detection of chromosomal imbalances. Although BAC microarrays have been used for mouse CGH studies, the resolving power of these analyses was limited because high-density whole-genome mouse BAC microarrays were not available. We therefore developed a mouse BAC microarray containing 2803 unique BAC clones from mouse genomic libraries at 1-Mb intervals. For the general amplification of BAC clone DNA prior to spotting, we designed a set of three novel degenerate oligonucleotide-primed (DOP) PCR primers that preferentially amplify mouse genomic sequences while minimizing unwanted amplification of contaminating Escherichia coli DNA. The resulting 3K mouse BAC microarrays reproducibly identified DNA copy number alterations in cell lines and primary tumors, such as single-copy deletions, regional amplifications, and aneuploidy.

  16. An object model for genome information at all levels of resolution

    SciTech Connect

    Honda, S.; Parrott, N.W.; Smith, R.; Lawrence, C.

    1993-12-31

    An object model for genome data at all levels of resolution is described. The model was derived by considering the requirements for representing genome related objects in three application domains: genome maps, large-scale DNA sequencing, and exploring functional information in gene and protein sequences. The methodology used for the object-oriented analysis is also described.

  17. Super-Resolution Genome Mapping in Silicon Nanochannels.

    PubMed

    Jeffet, Jonathan; Kobo, Asaf; Su, Tianxiang; Grunwald, Assaf; Green, Ori; Nilsson, Adam N; Eisenberg, Eli; Ambjörnsson, Tobias; Westerlund, Fredrik; Weinhold, Elmar; Shabat, Doron; Purohit, Prashant K; Ebenstein, Yuval

    2016-11-22

    Optical genome mapping in nanochannels is a powerful genetic analysis method, complementary to deoxyribonucleic acid (DNA) sequencing. The method is based on detecting a pattern of fluorescent labels attached along individual DNA molecules. When such molecules are extended in nanochannels, the labels create a fluorescent genetic barcode that is used for mapping the DNA molecule to its genomic locus and identifying large-scale variation from the genome reference. Mapping resolution is currently limited by two main factors: the optical diffraction limit and the thermal fluctuations of DNA molecules suspended in the nanochannels. Here, we utilize single-molecule tracking and super-resolution localization in order to improve the mapping accuracy and resolving power of this genome mapping technique and achieve a 15-fold increase in resolving power compared to currently practiced methods. We took advantage of a naturally occurring genetic repeat array and labeled each repeat with custom-designed Trolox conjugated fluorophores for enhanced photostability. This model system allowed us to acquire extremely long image sequences of the equally spaced fluorescent markers along DNA molecules, enabling detailed characterization of nanoconfined DNA dynamics and quantitative comparison to the Odijk theory for confined polymer chains. We present a simple method to overcome the thermal fluctuations in the nanochannels and exploit single-step photobleaching to resolve subdiffraction spaced fluorescent markers along fluctuating DNA molecules with ∼100 bp resolution. In addition, we show how time-averaging over just ∼50 frames of 40 ms enhances mapping accuracy, improves mapping P-value scores by 3 orders of magnitude compared to nonaveraged alignment, and provides a significant advantage for analyzing structural variations between DNA molecules with similar sequence composition.
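    The time-averaging step mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names `time_average` and `acquisition_time_s`, and the input layout (one list of label positions per frame, with labels already tracked across frames), are assumptions made for this example.

```python
from statistics import fmean


def time_average(frames):
    """Average each tracked label's position over all frames.

    `frames` is a list of frames; each frame is a list of positions,
    one per fluorescent label, in the same label order (assumed layout).
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_labels = len(frames[0])
    if any(len(frame) != n_labels for frame in frames):
        raise ValueError("every frame must contain the same labels")
    return [fmean([frame[i] for frame in frames]) for i in range(n_labels)]


def acquisition_time_s(n_frames, frame_ms):
    """Total acquisition time in seconds for n_frames of frame_ms each."""
    return n_frames * frame_ms / 1000.0
```

    With the figures quoted in the abstract, 50 frames of 40 ms correspond to about 2 s of acquisition per molecule.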

  18. Genetic architecture of cold tolerance in rice (Oryza sativa) determined through high resolution genome-wide analysis

    PubMed Central

    Shakiba, Ehsan; Edwards, Jeremy D.; Jodari, Farman; Duke, Sara E.; Baldo, Angela M.; Korniliev, Pavel; McCouch, Susan R.; Eizenga, Georgia C.

    2017-01-01

    Cold temperature is an important abiotic stress which negatively affects morphological development and seed production in rice (Oryza sativa L.). At the seedling stage, cold stress causes poor germination, seedling injury and poor stand establishment; and at the reproductive stage cold decreases seed yield. The Rice Diversity Panel 1 (RDP1) is a global collection of over 400 O. sativa accessions representing the five major subpopulations from the INDICA and JAPONICA varietal groups, with a genotypic dataset consisting of 700,000 SNP markers. The objectives of this study were to evaluate the RDP1 accessions for the complex, quantitatively inherited cold tolerance traits at the germination and reproductive stages, and to conduct genome-wide association (GWA) mapping to identify SNPs and candidate genes associated with cold stress at these stages. GWA mapping of the germination index (calculated as percent germination in cold divided by warm treatment) revealed 42 quantitative trait loci (QTLs) associated with cold tolerance at the seedling stage, including 18 in the panel as a whole, seven in temperate japonica, six in tropical japonica, 14 in JAPONICA, and nine in INDICA, with five shared across all subpopulations. Twenty-two of these QTLs co-localized with 32 previously reported cold tolerance QTLs. GWA mapping of cold tolerance at the reproductive stage detected 29 QTLs, including seven associated with percent sterility, ten with seed weight per panicle, 14 with seed weight per plant and one region overlapping for two traits. Fifteen co-localized with previously reported QTLs for cold tolerance or yield components. Candidate gene ontology searches revealed these QTLs were associated with significant enrichment for genes related to lipid metabolism, response to stimuli, response to biotic stimuli (suggesting cross-talk between biotic and abiotic stresses), and oxygen binding. Overall the JAPONICA accessions were more tolerant to cold stress than INDICA
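    The germination index defined in the abstract (percent germination in cold divided by percent germination in the warm treatment) can be written as a one-line calculation. The function name and the example values below are hypothetical and are not taken from the study.

```python
def germination_index(pct_cold, pct_warm):
    """Ratio of percent germination under cold to that under warm treatment."""
    if pct_warm <= 0:
        raise ValueError("warm-treatment germination must be positive")
    return pct_cold / pct_warm


# Hypothetical accession: 45% germination in cold, 90% in warm.
example_index = germination_index(45.0, 90.0)
```

    An index near 1 would indicate that cold treatment barely reduced germination for that accession, while values near 0 would indicate strong sensitivity.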

  19. High-Resolution Analysis by Whole-Genome Sequencing of an International Lineage (Sequence Type 111) of Pseudomonas aeruginosa Associated with Metallo-Carbapenemases in the United Kingdom

    PubMed Central

    Wright, Laura; Underwood, Anthony; Witney, Adam A.; Chan, Yuen-Ting; Al-Shahib, Ali; Arnold, Catherine; Doumith, Michel; Patel, Bharat; Planche, Timothy D.; Green, Jonathan; Holliman, Richard; Woodford, Neil

    2015-01-01

    Whole-genome sequencing (WGS) was carried out on 87 isolates of sequence type 111 (ST-111) of Pseudomonas aeruginosa collected between 2005 and 2014 from 65 patients and 12 environmental isolates from 24 hospital laboratories across the United Kingdom on an Illumina HiSeq instrument. Most isolates (73) carried VIM-2, but others carried IMP-1 or IMP-13 (5) or NDM-1 (1); one isolate had VIM-2 and IMP-18, and 7 carried no metallo-beta-lactamase (MBL) gene. Single nucleotide polymorphism analysis divided the isolates into distinct clusters; the NDM-1 isolate was an outlier, and the IMP isolates and 6/7 MBL-negative isolates clustered separately from the main set of 73 VIM-2 isolates. Within the VIM-2 set, there were at least 3 distinct clusters, including a tightly clustered set of isolates from 3 hospital laboratories consistent with an outbreak from a single introduction that was quickly brought under control and a much broader set dominated by isolates from a long-running outbreak in a London hospital likely seeded from an environmental source, requiring different control measures; isolates from 7 other hospital laboratories in London and southeast England were also included. Bayesian evolutionary analysis indicated that all the isolates shared a common ancestor dating back ∼50 years (1960s), with the main VIM-2 set separating approximately 20 to 30 years ago. Accessory gene profiling revealed blocks of genes associated with particular clusters, with some having high similarity (≥95%) to bacteriophage genes. WGS of widely found international lineages such as ST-111 provides the necessary resolution to inform epidemiological investigations and intervention policies. PMID:26041902

  20. High resolution comparative genomic hybridisation in clinical cytogenetics

    PubMed Central

    Kirchhoff, M.; Rose, H.; Lundsteen, C.

    2001-01-01

    High resolution comparative genomic hybridisation (HR-CGH) is a diagnostic tool in our clinical cytogenetics laboratory. The present survey reports the results of 253 clinical cases in which 47 abnormalities were detected. Among 144 dysmorphic and mentally retarded subjects with a normal conventional karyotype, 15 (10%) had small deletions or duplications, of which 11 were interstitial. In addition, a case of mosaic trisomy 9 was detected. Among 25 dysmorphic and mentally retarded subjects carrying apparently balanced de novo translocations, four had deletions at translocation breakpoints and two had deletions elsewhere in the genome. Seventeen of 19 complex rearrangements were clarified by HR-CGH. A small supernumerary marker chromosome occurring with low frequency and the breakpoint of a mosaic r(18) case could not be clarified. Three of 19 other abnormalities could not be confirmed by HR-CGH. One was a Williams syndrome deletion and two were DiGeorge syndrome deletions, which were apparently below the resolution of HR-CGH. However, we were able to confirm Angelman and Prader-Willi syndrome deletions, which are about 3-5 Mb. We conclude that HR-CGH should be used for the evaluation of (1) dysmorphic and mentally retarded subjects where normal karyotyping has failed to show abnormalities, (2) dysmorphic and mentally retarded subjects carrying apparently balanced de novo translocations, (3) apparently balanced de novo translocations detected prenatally, and (4) for clarification of complex structural rearrangements.


    Keywords: comparative genomic hybridisation; chromosome analysis; chromosome aberrations; dysmorphism PMID:11694545

  1. Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays.

    PubMed

    Guttman, Mitchell; Mies, Carolyn; Dudycz-Sulicz, Katarzyna; Diskin, Sharon J; Baldwin, Don A; Stoeckert, Christian J; Grant, Gregory R

    2007-08-01

    Genomic aberrations recurrent in a particular cancer type can be important prognostic markers for tumor progression. Typically in early tumorigenesis, cells incur a breakdown of the DNA replication machinery that results in an accumulation of genomic aberrations in the form of duplications, deletions, translocations, and other genomic alterations. Microarray methods allow for finer mapping of these aberrations than has previously been possible; however, data processing and analysis methods have not taken full advantage of this higher resolution. Attention has primarily been given to analysis on the single sample level, where multiple adjacent probes are necessarily used as replicates for the local region containing their target sequences. However, regions of concordant aberration can be short enough to be detected by only one, or very few, array elements. We describe a method called Multiple Sample Analysis for assessing the significance of concordant genomic aberrations across multiple experiments that does not require a-priori definition of aberration calls for each sample. If there are multiple samples, representing a class, then by exploiting the replication across samples our method can detect concordant aberrations at much higher resolution than can be derived from current single sample approaches. Additionally, this method provides a meaningful approach to addressing population-based questions such as determining important regions for a cancer subtype of interest or determining regions of copy number variation in a population. Multiple Sample Analysis also provides single sample aberration calls in the locations of significant concordance, producing high resolution calls per sample, in concordant regions. The approach is demonstrated on a dataset representing a challenging but important resource: breast tumors that have been formalin-fixed, paraffin-embedded, archived, and subsequently UV-laser capture microdissected and hybridized to two-channel BAC arrays

  2. High Resolution QTL Map Of Body Conformation Traits From Genome-Wide Association Analysis In Contemporary U.S. Holstein Cows

    USDA-ARS's Scientific Manuscript database

    A QTL map of 1,005 SNP markers affecting 18 body conformation traits (top 100 effects per trait) was constructed based on a genome-wide association analysis of 1,654 contemporary U.S. Holstein cows genotyped with the BovineSNP50 (45,878 SNPs). The top 100 effects for each trait explained 38-56% of t...

  3. High Resolution QTL Map Of Net Merit Component Traits And Calving Traits From Genome-Wide Association Analysis In Contemporary U.S. Holstein Cows

    USDA-ARS's Scientific Manuscript database

    A QTL map of 725 SNPs affecting 13 dairy traits (top 100 effects per trait) was constructed based on a genome-wide association analysis of 1,654 contemporary U.S. Holstein cows genotyped with 45,878 SNPs. The 13 traits were net merit (NM$), its 8 component traits and 4 calving traits. The top 100 ef...

  4. Genome Survey Sequencing of Luffa Cylindrica L. and Microsatellite High Resolution Melting (SSR-HRM) Analysis for Genetic Relationship of Luffa Genotypes.

    PubMed

    An, Jianyu; Yin, Mengqi; Zhang, Qin; Gong, Dongting; Jia, Xiaowen; Guan, Yajing; Hu, Jin

    2017-09-11

    Luffa cylindrica (L.) Roem. is an economically important vegetable crop in China. However, the genomic information on this species is currently unknown. In this study, for the first time, a genome survey of L. cylindrica was carried out using next-generation sequencing (NGS) technology. In total, 43.40 Gb sequence data of L. cylindrica, about 54.94× coverage of the estimated genome size of 789.97 Mb, were obtained from HiSeq 2500 sequencing, in which the guanine plus cytosine (GC) content was calculated to be 37.90%. The heterozygosity of genome sequences was only 0.24%. In total, 1,913,731 contigs (>200 bp) with 525 bp N50 length and 1,410,117 scaffolds (>200 bp) with 885.01 Mb total length were obtained. From the initial assembled L. cylindrica genome, 431,234 microsatellites (SSRs) (≥5 repeats) were identified. The motif types of SSR repeats included 62.88% di-nucleotide, 31.03% tri-nucleotide, 4.59% tetra-nucleotide, 0.96% penta-nucleotide and 0.54% hexa-nucleotide. Eighty genomic SSR markers were developed, and 51/80 primers could be used in both "Zheda 23" and "Zheda 83". Nineteen SSRs were used to investigate the genetic diversity among 32 accessions through SSR-HRM analysis. The unweighted pair group method analysis (UPGMA) dendrogram tree was built by calculating the SSR-HRM raw data. SSR-HRM could be effectively used for genotype relationship analysis of Luffa species.
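    Two of the quantities in this abstract lend themselves to a short sketch: the sequencing depth (total bases divided by estimated genome size) and the detection of microsatellites with at least five repeats of a di- to hexa-nucleotide motif. The code below is an illustrative assumption, not the software used in the study; `sequencing_depth`, `is_primitive`, and `find_ssrs` are hypothetical names, and the regular-expression approach is one common way to locate perfect tandem repeats.

```python
import re


def sequencing_depth(total_bases, genome_size):
    """Average coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    for k in range(1, n):
        if n % k == 0 and motif[:k] * (n // k) == motif:
            return False
    return True


def find_ssrs(seq, min_repeats=5, motif_lengths=range(2, 7)):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Only motifs of the given lengths (di- to hexa-nucleotide by default)
    repeated at least `min_repeats` times are reported.
    """
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Figures from the abstract: 43.40 Gb of data, 789.97 Mb estimated genome size.
depth = sequencing_depth(43.40e9, 789.97e6)
```

    Rounded to two decimals, `depth` reproduces the roughly 54.94× coverage quoted in the abstract. The primitive-motif check keeps a homopolymer run or an `ATAT` run from being reported under a longer motif length.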

  5. Integrated high-resolution array CGH and SKY analysis of homozygous deletions and other genomic alterations present in malignant mesothelioma cell lines.

    PubMed

    Klorin, Geula; Rozenblum, Ester; Glebov, Oleg; Walker, Robert L; Park, Yoonsoo; Meltzer, Paul S; Kirsch, Ilan R; Kaye, Frederic J; Roschke, Anna V

    2013-05-01

    High-resolution oligonucleotide array comparative genomic hybridization (aCGH) and spectral karyotyping (SKY) were applied to a panel of malignant mesothelioma (MMt) cell lines. SKY has not been applied to MMt before, and complete karyotypes are reported based on the integration of SKY and aCGH results. A whole genome search for homozygous deletions (HDs) produced the largest set of recurrent and non-recurrent HDs for MMt (52 recurrent HDs in 10 genomic regions; 36 non-recurrent HDs). For the first time, LINGO2, RBFOX1/A2BP1, RPL29, DUSP7, and CCSER1/FAM190A were found to be homozygously deleted in MMt, and some of these genes could be new tumor suppressor genes for MMt. Integration of SKY and aCGH data allowed reconstruction of chromosomal rearrangements that led to the formation of HDs. Our data imply that only with acquisition of structural and/or numerical karyotypic instability can MMt cells attain a complete loss of tumor suppressor genes located in 9p21.3, which is the most frequently homozygously deleted region. Tetraploidization is a late event in the karyotypic progression of MMt cells, after HDs in the 9p21.3 region have already been acquired.

  6. Integrated high-resolution array CGH and SKY analysis of homozygous deletions and other genomic alterations present in malignant mesothelioma cell lines

    PubMed Central

    Klorin, Geula; Rozenblum, Ester; Glebov, Oleg; Walker, Robert L.; Park, Yoonsoo; Meltzer, Paul S.; Kirsch, Ilan R.; Kaye, Frederic J.

    2014-01-01

    High-resolution oligonucleotide array comparative genomic hybridization (aCGH) and spectral karyotyping (SKY) were applied to a panel of malignant mesothelioma (MMt) cell lines. SKY has not been applied to MMt before, and complete karyotypes are reported based on the integration of SKY and aCGH results. A whole genome search for homozygous deletions (HDs) produced the largest set of recurrent and non-recurrent HDs for MMt (52 recurrent HDs in 10 genomic regions; 36 non-recurrent HDs). For the first time, LINGO2, RBFOX1/A2BP1, RPL29, DUSP7, and CCSER1/FAM190A were found to be homozygously deleted in MMt, and some of these genes could be new tumor suppressor genes for MMt. Integration of SKY and aCGH data allowed reconstruction of chromosomal rearrangements that led to the formation of HDs. Our data imply that only with acquisition of structural and/or numerical karyotypic instability can MMt cells attain a complete loss of tumor suppressor genes located in 9p21.3, which is the most frequently homozygously deleted region. Tetraploidization is a late event in the karyotypic progression of MMt cells, after HDs in the 9p21.3 region have already been acquired. PMID:23830731

  7. Genome-Wide Analysis of Nucleosome Positions, Occupancy, and Accessibility in Yeast: Nucleosome Mapping, High-Resolution Histone ChIP, and NCAM.

    PubMed

    Rodriguez, Jairo; McKnight, Jeffrey N; Tsukiyama, Toshio

    2014-10-01

    Because histones bind DNA very tightly, the location on DNA and the level of occupancy of a given DNA sequence by nucleosomes can profoundly affect accessibility of non-histone proteins to chromatin, affecting virtually all DNA-dependent processes, such as transcription, DNA repair, DNA replication and recombination. Therefore, it is often necessary to determine positions and occupancy of nucleosomes to understand how DNA-dependent processes are regulated. Recent technological advances made such analyses feasible on a genome-wide scale at high resolution. In addition, we have recently developed a method to measure nuclease accessibility of nucleosomes on a global scale. This unit describes methods to map nucleosome positions, to determine nucleosome density, and to determine nuclease accessibility of nucleosomes using deep sequencing.

  8. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution

    PubMed Central

    Hu, Jinchuan; Adar, Sheera; Selby, Christopher P.

    2015-01-01

    We developed a method for genome-wide mapping of DNA excision repair named XR-seq (excision repair sequencing). Human nucleotide excision repair generates two incisions surrounding the site of damage, creating an ∼30-mer. In XR-seq, this fragment is isolated and subjected to high-throughput sequencing. We used XR-seq to produce stranded, nucleotide-resolution maps of repair of two UV-induced DNA damages in human cells: cyclobutane pyrimidine dimers (CPDs) and (6-4) pyrimidine–pyrimidone photoproducts [(6-4)PPs]. In wild-type cells, CPD repair was highly associated with transcription, specifically with the template strand. Experiments in cells defective in either transcription-coupled excision repair or general excision repair isolated the contribution of each pathway to the overall repair pattern and showed that transcription-coupled repair of both photoproducts occurs exclusively on the template strand. XR-seq maps capture transcription-coupled repair at sites of divergent gene promoters and bidirectional enhancer RNA (eRNA) production at enhancers. XR-seq data also uncovered the repair characteristics and novel sequence preferences of CPDs and (6-4)PPs. XR-seq and the resulting repair maps will facilitate studies of the effects of genomic location, chromatin context, transcription, and replication on DNA repair in human cells. PMID:25934506

  9. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid on comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set

  10. GenomeFingerprinter: The Genome Fingerprint and the Universal Genome Fingerprint Analysis for Systematic Comparative Genomics

    PubMed Central

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    Background: No attention has been paid on comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. Results: First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. Conclusions: We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the
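    The abstract names a geometric center and a Euclidean distance as the basis for comparing two genome fingerprints but does not give the formulas. The sketch below therefore rests on an assumption: the geometric center is taken to be the arithmetic mean of a fingerprint's three-dimensional coordinates, and two fingerprints are compared by the distance between their centers. The function names are hypothetical, and the differentiate rate and weighted differentiate rate are not reproduced because the abstract does not define them.

```python
import math


def geometric_center(points):
    """Mean x, y, z of a fingerprint's 3D coordinates (assumed definition)."""
    n = len(points)
    if n == 0:
        raise ValueError("fingerprint has no points")
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def fingerprint_distance(points_a, points_b):
    """Euclidean distance between the geometric centers of two fingerprints."""
    return math.dist(geometric_center(points_a), geometric_center(points_b))
```

    Under this assumed definition, two fingerprints whose centers coincide have distance 0, and the value grows as the centers move apart.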

  11. Superfine resolution acoustooptic spectrum analysis

    NASA Technical Reports Server (NTRS)

    Ansari, Homayoon; Lesh, James R.

    1991-01-01

    High resolution spectrum analysis of RF signals is required in applications such as the search for extraterrestrial intelligence, RF interference monitoring, or general purpose decomposition of signals. Sub-Hertz resolution in three-dimensional acoustooptic spectrum analysis is theoretically and experimentally demonstrated. The operation of a two-dimensional acoustooptic spectrum analyzer is extended to include time integration over a sequence of CCD frames.

  12. Superfine resolution acoustooptic spectrum analysis

    NASA Technical Reports Server (NTRS)

    Ansari, Homayoon; Lesh, James R.

    1991-01-01

    High resolution spectrum analysis of RF signals is required in applications such as the search for extraterrestrial intelligence, RF interference monitoring, or general purpose decomposition of signals. Sub-Hertz resolution in three-dimensional acoustooptic spectrum analysis is theoretically and experimentally demonstrated. The operation of a two-dimensional acoustooptic spectrum analyzer is extended to include time integration over a sequence of CCD frames.

  13. Resolution analysis of bistatic SAR

    NASA Astrophysics Data System (ADS)

    Garza, Guillermo; Qiao, Zhijun

    2011-06-01

    In this paper, we analyze the resolution of bistatic synthetic aperture radar (BISAR) imaging for stationary objects. In particular, we analyze the resolution of images reconstructed by the method of a filtered backprojection inversion, an inversion method which is derived from a scalar wave equation model. In this context we are able to account for the effects of antenna beam patterns and arbitrary flight trajectories. The analysis is done by examining the data collection manifold for different experiment geometries and system parameters.

  14. High-resolution genomic microarrays for X-linked mental retardation.

    PubMed

    Lugtenberg, Dorien; Veltman, Joris A; van Bokhoven, Hans

    2007-09-01

    Developments in genomic microarray technology have revolutionized the study of human genomic copy number variation. This has significantly affected many areas in human genetics, including the field of X-linked mental retardation (XLMR). Chromosome X-specific bacterial artificial chromosomes microarrays have been developed to specifically test this chromosome with a resolution of approximately 100 kilobases. Application of these microarrays in X-linked mental retardation studies has resulted in the identification of novel X-linked mental retardation genes, copy number variation at known X-linked mental retardation genes, and copy number variations harboring as yet unidentified X-linked mental retardation genes. Further enhancements in genomic microarray analysis will soon allow the reliable analysis of all copy number variations throughout this chromosome at the kilobase or single exon resolution. In this review, we describe the developments in this field and specifically highlight the impact of these microarray studies in the field of X-linked mental retardation.

  15. Genome-wide single nucleotide polymorphism-based assay for high-resolution epidemiological analysis of the methicillin-resistant Staphylococcus aureus hospital clone EMRSA-15.

    PubMed

    Holmes, A; McAllister, G; McAdam, P R; Hsien Choi, S; Girvan, K; Robb, A; Edwards, G; Templeton, K; Fitzgerald, J R

    2014-02-01

    The EMRSA-15 clone is a major cause of nosocomial methicillin-resistant Staphylococcus aureus (MRSA) infections in the UK and elsewhere but existing typing methodologies have limited capacity to discriminate closely related strains, and are often poorly reproducible between laboratories. Here, we report the design, development and validation of a genome-wide single nucleotide polymorphism (SNP) typing method and compare it to established methods for typing of EMRSA-15. In order to identify discriminatory SNPs, the genomes of 17 EMRSA-15 strains, selected to represent the breadth of genotypic and phenotypic diversity of EMRSA-15 isolates in Scotland, were determined and phylogenetic reconstruction was carried out. In addition to 17 phylogenetically informative SNPs, five binary markers were included to form the basis of an EMRSA-15 genotyping assay. The SNP-based typing assay was as discriminatory as pulsed-field gel electrophoresis, and significantly more discriminatory than staphylococcal protein A (spa) typing for typing of a representative panel of diverse EMRSA-15 strains, isolates from two EMRSA-15 hospital outbreak investigations, and a panel of bacteraemia isolates obtained in healthcare facilities in the east of Scotland during a 12-month period. The assay is a rapid, and reproducible approach for epidemiological analysis of EMRSA-15 clinical isolates in Scotland. Unlike established methods the DNA sequence-based method is ideally suited for inter-laboratory comparison of identified genotypes, and its flexibility lends itself to supplementation with additional SNPs or markers for the identification of novel S. aureus strains in other regions of the world.

  16. Understanding and utilizing crop genome diversity via high-resolution genotyping.

    PubMed

    Voss-Fels, Kai; Snowdon, Rod J

    2016-04-01

    High-resolution genome analysis technologies provide an unprecedented level of insight into structural diversity across crop genomes. Low-cost discovery of sequence variation has become accessible for all crops since the development of next-generation DNA sequencing technologies, using diverse methods ranging from genome-scale resequencing or skim sequencing, reduced-representation genotyping-by-sequencing, transcriptome sequencing or sequence capture approaches. High-density, high-throughput genotyping arrays generated using the resulting sequence data are today available for the assessment of genomewide single nucleotide polymorphisms in all major crop species. Besides their application in genetic mapping or genomewide association studies for dissection of complex agronomic traits, high-density genotyping arrays are highly suitable for genomic selection strategies. They also enable description of crop diversity at an unprecedented chromosome-scale resolution. Application of population genetics parameters to genomewide diversity data sets enables dissection of linkage disequilibrium to characterize loci underlying selective sweeps. High-throughput genotyping platforms simultaneously open the way for targeted diversity enrichment, allowing rejuvenation of low-diversity chromosome regions in strongly selected breeding pools to potentially reverse the influence of linkage drag. Numerous recent examples are presented which demonstrate the power of next-generation genomics for high-resolution analysis of crop diversity on a subgenomic and chromosomal scale. Such studies give deep insight into the history of crop evolution and selection, while simultaneously identifying novel diversity to improve yield and heterosis.

  17. Screening of genomic imbalances in glioblastoma multiforme using high-resolution comparative genomic hybridization.

    PubMed

    Vranová, Vladimíra; Necesalová, Eva; Kuglík, Petr; Cejpek, Pavel; Pesáková, Martina; Budínská, Eva; Relichová, Jirina; Veselská, Renata

    2007-02-01

    Comparative genomic hybridization (CGH) is a molecular cytogenetic technique that allows the genome-wide analysis of DNA sequence copy number differences. We applied conventional CGH and the recently developed high-resolution CGH (HR-CGH) to tumour samples from 18 patients with glioblastoma multiforme (GBM) in order to compare the sensitivity of CGH and HR-CGH in the screening of chromosomal abnormalities. The abnormalities were studied in topologically different central and peripheral tumour parts. A total of 78 different changes were observed using CGH (0-16 per tumour, median 3.5) and 154 using HR-CGH (0-21 per tumour, median 6). Using HR-CGH, losses were more frequent than gains. The representation of the most prominent changes revealed by both methods was similar and was comprised of the amplification of 7q12 and 12q13-q15, the gain of 7, 3q and 19, and the loss of 10, 9p, and 13q. However, HR-CGH detected certain other abnormalities (the loss of 6, 14q, 15q and 18q, and the gain of 19), which were rarely revealed by CGH. Using HR-CGH, the numbers and types of chromosomal changes detected in the central and peripheral parts of GBM were almost the same. The loss of chromosomes 10 and 9p and the gain of chromosomes 7 and 19 were the most frequent chromosomal alterations in both tumour parts. Our results from the GBM analysis show that HR-CGH technology can reveal new, recurrent genetic alterations involving the genes known to participate in tumorigenesis and in the progression of several human malignancies, thus allowing for a more accurate genetic characterization of these tumours.

  18. High-resolution, genome-wide mapping of chromatin modifications by GMAT.

    PubMed

    Roh, Tae-Young; Zhao, Keji

    2008-01-01

    One major postgenomic challenge is to characterize the epigenomes that control genome functions. The epigenomes are mainly defined by the specific association of nonhistone proteins with chromatin and the covalent modifications of chromatin, including DNA methylation and posttranslational histone modifications. The in vivo protein-binding and chromatin-modification patterns can be revealed by the chromatin immunoprecipitation assay (ChIP). By combining the ChIP assays and the serial analysis of gene expression (SAGE) protocols, we have developed an unbiased and high-resolution genome-wide mapping technique (GMAT) to determine the genome-wide protein-targeting and chromatin-modification patterns. GMAT has been successfully applied to mapping the target sites of the histone acetyltransferase, Gcn5p, in yeast and to the discovery of the histone acetylation islands as an epigenetic mark for functional regulatory elements in the human genome.

  19. Comparative analysis and visualization of multiple collinear genomes

    PubMed Central

    2012-01-01

    Background: Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results: We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions: Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897

  20. High-Resolution Fine Mapping and Fluorescence in Situ Hybridization Analysis of sun, a Locus Controlling Tomato Fruit Shape, Reveals a Region of the Tomato Genome Prone to DNA Rearrangements

    PubMed Central

    van der Knaap, E.; Sanyal, A.; Jackson, S. A.; Tanksley, S. D.

    2004-01-01

    The locus sun on the short arm of tomato chromosome 7 controls morphology of the fruit. Alleles from wild relatives impart a round shape, while alleles from certain cultivated varieties impart an oval shape typical of roma-type tomatoes. We fine mapped the locus in two populations and investigated the genome organization of the region spanning and flanking sun. The first high-resolution genetic map of the sun locus was constructed using a nearly isogenic F2 population derived from a cross between Lycopersicon pennellii introgression line IL7-4 and L. esculentum cv Sun1642. The mapping combined with results from pachytene FISH experiments demonstrated that the top of chromosome 7 is inverted in L. pennellii accession LA716. sun was located close to the chromosomal breakpoint and within the inversion, thereby precluding map-based cloning of the gene using this population. The fruit-shape locus was subsequently fine mapped in a population derived from a cross between L. esculentum Sun1642 and L. pimpinellifolium LA1589. Chromosome walking using clones identified from several large genomic insert libraries resulted in two noncontiguous contigs flanking sun. Fiber-FISH analysis showed that distance between the two contigs measured 68 kb in L. esculentum Sun1642 and 38 kb in L. pimpinellifolium LA1589, respectively. The sun locus mapped between the two contigs, suggesting that allelic variation at this locus may be due to an insertion/deletion event. The results demonstrate that sun is located in a highly dynamic region of the tomato genome. PMID:15611181

  1. High-resolution linkage and quantitative trait locus mapping aided by genome survey sequencing: building up an integrative genomic framework for a bivalve mollusc.

    PubMed

    Jiao, Wenqian; Fu, Xiaoteng; Dou, Jinzhuang; Li, Hengde; Su, Hailin; Mao, Junxia; Yu, Qian; Zhang, Lingling; Hu, Xiaoli; Huang, Xiaoting; Wang, Yangfan; Wang, Shi; Bao, Zhenmin

    2014-02-01

    Genetic linkage maps are indispensable tools in genetic and genomic studies. Recent development of genotyping-by-sequencing (GBS) methods holds great promise for constructing high-resolution linkage maps in organisms lacking extensive genomic resources. In the present study, linkage mapping was conducted for a bivalve mollusc (Chlamys farreri) using a newly developed GBS method, 2b-restriction site-associated DNA (2b-RAD). Genome survey sequencing was performed to generate a preliminary reference genome that was utilized to facilitate linkage and quantitative trait locus (QTL) mapping in C. farreri. A high-resolution linkage map was constructed with a marker density (3806) that has, to our knowledge, never been achieved in any other mollusc. The linkage map covered nearly the whole genome (99.5%) with a resolution of 0.41 cM. QTL mapping and association analysis congruously revealed two growth-related QTLs and one potential sex-determination region. An important candidate QTL gene named PROP1, which functions in the regulation of growth hormone production in vertebrates, was identified from the growth-related QTL region detected on the linkage group LG3. We demonstrate that this linkage map can serve as an important platform for improving genome assembly and unifying multiple genomic resources. Our study, therefore, exemplifies how to build up an integrative genomic framework in a non-model organism.

  2. Integration of complete chloroplast genome sequences with small amplicon datasets improves phylogenetic resolution in Acacia.

    PubMed

    Williams, Anna V; Miller, Joseph T; Small, Ian; Nevill, Paul G; Boykin, Laura M

    2016-03-01

    Combining whole genome data with previously obtained amplicon sequences has the potential to increase the resolution of phylogenetic analyses, particularly at low taxonomic levels or where recent divergence, rapid speciation or slow genome evolution has resulted in limited sequence variation. However, the integration of these types of data for large scale phylogenetic studies has rarely been investigated. Here we conduct a phylogenetic analysis of the whole chloroplast genome and two nuclear ribosomal loci for 65 Acacia species from across the most recent Acacia phylogeny. We then combine this data with previously generated amplicon sequences (four chloroplast loci and two nuclear ribosomal loci) for 508 Acacia species. We use several phylogenetic methods, including maximum likelihood bootstrapping (with and without constraint) and ExaBayes, in order to determine the success of combining a dataset of 4000bp with one of 189,000bp. The results of our study indicate that the inclusion of whole genome data gave a far better resolved and well supported representation of the phylogenetic relationships within Acacia than using only amplicon sequences, with the greatest support observed when using a whole genome phylogeny as a constraint on the amplicon sequences. Our study therefore provides methods for optimal integration of genomic and amplicon sequences.

  3. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints.

    PubMed

    Guo, Yuchun; Mahony, Shaun; Gifford, David K

    2012-01-01

    An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1. The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial

  4. Genome-wide analysis of synonymous single nucleotide polymorphisms in Mycobacterium tuberculosis complex organisms: resolution of genetic relationships among closely related microbial strains.

    PubMed Central

    Gutacker, Michaela M; Smoot, James C; Migliaccio, Cristi A Lux; Ricklefs, Stacy M; Hua, Su; Cousins, Debby V; Graviss, Edward A; Shashkina, Elena; Kreiswirth, Barry N; Musser, James M

    2002-01-01

    Several human pathogens (e.g., Bacillus anthracis, Yersinia pestis, Bordetella pertussis, Plasmodium falciparum, and Mycobacterium tuberculosis) have very restricted unselected allelic variation in structural genes, which hinders study of the genetic relationships among strains and strain-trait correlations. To address this problem in a representative pathogen, 432 M. tuberculosis complex strains from global sources were genotyped on the basis of 230 synonymous (silent) single nucleotide polymorphisms (sSNPs) identified by comparison of four genome sequences. Eight major clusters of related genotypes were identified in M. tuberculosis sensu stricto, including a single cluster representing organisms responsible for several large outbreaks in the United States and Asia. All M. tuberculosis sensu stricto isolates of previously unknown phylogenetic position could be rapidly and unambiguously assigned to one of the eight major clusters, thus providing a facile strategy for identifying organisms that are clonally related by descent. Common clones of M. tuberculosis sensu stricto and M. bovis are distinct, deeply branching genotypic complexes whose extant members did not emerge directly from one another in the recent past. sSNP genotyping rapidly delineates relationships among closely related strains of pathogenic microbes and allows construction of genetic frameworks for examining the distribution of biomedically relevant traits such as virulence, transmissibility, and host range. PMID:12524330
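
    The idea of resolving closely related strains from SNP genotypes can be illustrated with a minimal sketch. It assumes each strain is represented as a string of base calls over the same ordered SNP positions and counts pairwise differences; the strain names and profiles are hypothetical, and this is not the clustering procedure used in the study.

```python
from __future__ import annotations

from itertools import combinations


def snp_distance(profile_a: str, profile_b: str) -> int:
    """Count SNP positions at which two profiles differ.

    Positions where either profile has 'N' (missing call) are skipped.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same SNP positions")
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a != b and "N" not in (a, b)
    )


def pairwise_distances(profiles: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of strains."""
    return {
        (x, y): snp_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical six-position SNP profiles for three strains.
profiles = {
    "strain_A": "ACGTAC",
    "strain_B": "ACGTAT",
    "strain_C": "TCGAAT",
}

for pair, dist in pairwise_distances(profiles).items():
    print(pair, dist)
```

    Strains separated by few differences would fall into the same cluster under any reasonable threshold; choosing that threshold is a study-specific decision that this sketch leaves open.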

  5. Biotin-Genomic Run-On (Bio-GRO): A High-Resolution Method for the Analysis of Nascent Transcription in Yeast.

    PubMed

    Jordán-Pla, Antonio; Miguel, Ana; Serna, Eva; Pelechano, Vicent; Pérez-Ortín, José E

    2016-01-01

    Transcription is a highly complex biological process, with extensive layers of regulation, some of which remain to be fully unveiled and understood. To be able to discern the particular contributions of the several transcription steps it is crucial to understand RNA polymerase dynamics and regulation throughout the transcription cycle. Here we describe a new nonradioactive run-on based method that maps elongating RNA polymerases along the genome. In contrast with alternative methodologies for the measurement of nascent transcription, the BioGRO method is designed to minimize technical noise that arises from two of the most common sources that affect this type of strategy: contamination with mature RNA and amplification-based technical biasing. The method is strand-specific, compatible with commercial microarrays, and has been successfully applied to the yeasts Saccharomyces cerevisiae and Candida albicans. BioGRO profiling provides powerful insights not only into the biogenesis and regulation of canonical gene transcription but also into the noncoding and antisense transcriptomes.

  6. Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry.

    PubMed

    Kelkar, Dhanashree S; Kumar, Dhirendra; Kumar, Praveen; Balakrishnan, Lavanya; Muthusamy, Babylakshmi; Yadav, Amit Kumar; Shrivastava, Priyanka; Marimuthu, Arivusudar; Anand, Sridhar; Sundaram, Hema; Kingsbury, Reena; Harsha, H C; Nair, Bipin; Prasad, T S Keshava; Chauhan, Devendra Singh; Katoch, Kiran; Katoch, Vishwa Mohan; Kumar, Prahlad; Chaerkady, Raghothama; Ramachandran, Srinivasan; Dash, Debasis; Pandey, Akhilesh

    2011-12-01

    The genome sequencing of H37Rv strain of Mycobacterium tuberculosis was completed in 1998 followed by the whole genome sequencing of a clinical isolate, CDC1551 in 2002. Since then, the genomic sequences of a number of other strains have become available making it one of the better studied pathogenic bacterial species at the genomic level. However, annotation of its genome remains challenging because of high GC content and dissimilarity to other model prokaryotes. To this end, we carried out an in-depth proteogenomic analysis of the M. tuberculosis H37Rv strain using Fourier transform mass spectrometry with high resolution at both MS and tandem MS levels. In all, we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count. In addition to protein database search, we carried out a genome database search, which led to identification of ~250 novel peptides. Based on these novel genome search-specific peptides, we discovered 41 novel protein coding genes in the H37Rv genome. Using peptide evidence and alternative gene prediction tools, we also corrected 79 gene models. Finally, mass spectrometric data from N terminus-derived peptides confirmed 727 existing annotations for translational start sites while correcting those for 33 proteins. We report creation of a high confidence set of protein coding regions in Mycobacterium tuberculosis genome obtained by high resolution tandem mass-spectrometry at both precursor and fragment detection steps for the first time. This proteogenomic approach should be generally applicable to other organisms whose genomes have already been sequenced for obtaining a more accurate catalogue of protein-coding genes.

  7. High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly.

    PubMed

    Bartholomé, Jérôme; Mandrou, Eric; Mabiala, André; Jenkins, Jerry; Nabihoudine, Ibouniyamine; Klopp, Christophe; Schmutz, Jeremy; Plomion, Christophe; Gion, Jean-Marc

    2015-06-01

    Genetic maps are key tools in genetic research as they constitute the framework for many applications, such as quantitative trait locus analysis, and support the assembly of genome sequences. The resequencing of the two parents of a cross between Eucalyptus urophylla and Eucalyptus grandis was used to design a single nucleotide polymorphism (SNP) array of 6000 markers evenly distributed along the E. grandis genome. The genotyping of 1025 offspring enabled the construction of two high-resolution genetic maps containing 1832 and 1773 markers with an average marker interval of 0.45 and 0.5 cM for E. grandis and E. urophylla, respectively. The comparison between genetic maps and the reference genome highlighted 85% of collinear regions. A total of 43 noncollinear regions and 13 nonsyntenic regions were detected and corrected in the new genome assembly. This improved version contains 4943 scaffolds totalling 691.3 Mb of which 88.6% were captured by the 11 chromosomes. The mapping data were also used to investigate the effect of population size and number of markers on linkage mapping accuracy. This study provides the most reliable linkage maps for Eucalyptus and version 2.0 of the E. grandis genome.

  8. Global transcript structure resolution of high gene density genomes through multi-platform data integration

    PubMed Central

    O'Grady, Tina; Wang, Xia; Höner zu Bentrup, Kerstin; Baddoo, Melody; Concha, Monica; Flemington, Erik K.

    2016-01-01

    Annotation of herpesvirus genomes has traditionally been undertaken through the detection of open reading frames and other genomic motifs, supplemented with sequencing of individual cDNAs. Second generation sequencing and high-density microarray studies have revealed vastly greater herpesvirus transcriptome complexity than is captured by existing annotation. The pervasive nature of overlapping transcription throughout herpesvirus genomes, however, poses substantial problems in resolving transcript structures using these methods alone. We present an approach that combines the unique attributes of Pacific Biosciences Iso-Seq long-read, Illumina short-read and deepCAGE (Cap Analysis of Gene Expression) sequencing to globally resolve polyadenylated isoform structures in replicating Epstein-Barr virus (EBV). Our method, Transcriptome Resolution through Integration of Multi-platform Data (TRIMD), identifies nearly 300 novel EBV transcripts, quadrupling the size of the annotated viral transcriptome. These findings illustrate an array of mechanisms through which EBV achieves functional diversity in its relatively small, compact genome including programmed alternative splicing (e.g. across the IR1 repeats), alternative promoter usage by LMP2 and other latency-associated transcripts, intergenic splicing at the BZLF2 locus, and antisense transcription and pervasive readthrough transcription throughout the genome. PMID:27407110

  9. An Analysis of Adenovirus Genomes Using Whole Genome Software Tools

    PubMed Central

    Mahadevan, Padmanabhan

    2016-01-01

    The evolution of sequencing technology has led to an enormous increase in the number of genomes that have been sequenced. This is especially true in the field of virus genomics. In order to extract meaningful biological information from these genomes, whole genome data mining software tools must be utilized. Hundreds of tools have been developed to analyze biological sequence data. However, only some of these tools are user-friendly to biologists. Several of these tools that have been successfully used to analyze adenovirus genomes are described here. These include Artemis, EMBOSS, pDRAW, zPicture, CoreGenes, GeneOrder, and PipMaker. These tools provide functionalities such as visualization, restriction enzyme analysis, alignment, and proteome comparisons that are extremely useful in the bioinformatics analysis of adenovirus genomes. PMID:28293072

  10. Genomic Resolution of Outbreak-Associated Legionella pneumophila Serogroup 1 Isolates from New York State

    PubMed Central

    Raphael, Brian H.; Baker, Deborah J.; Nazarian, Elizabeth; Lapierre, Pascal; Bopp, Dianna; Kozak-Muiznieks, Natalia A.; Morrison, Shatavia S.; Lucas, Claressa E.; Mercante, Jeffrey W.; Musser, Kimberlee A.

    2016-01-01

    ABSTRACT A total of 30 Legionella pneumophila serogroup 1 isolates representing 10 separate legionellosis laboratory investigations (“outbreaks”) that occurred in New York State between 2004 and 2012 were selected for evaluation of whole-genome sequencing (WGS) approaches for molecular subtyping of this organism. Clinical and environmental isolates were available for each outbreak and were initially examined by pulsed-field gel electrophoresis (PFGE). Sequence-based typing alleles were extracted from WGS data yielding complete sequence types (ST) for isolates representing 8 out of the 10 outbreaks evaluated in this study. Isolates from separate outbreaks sharing the same ST also contained the fewest differences in core genome single nucleotide polymorphisms (SNPs) and the greatest proportion of identical allele sequences in a whole-genome multilocus sequence typing (wgMLST) scheme. Both core SNP and wgMLST analyses distinguished isolates from separate outbreaks, including those from two outbreaks sharing indistinguishable PFGE profiles. Isolates from a hospital-associated outbreak spanning multiple years shared indistinguishable PFGE profiles but displayed differences in their genome sequences, suggesting the presence of multiple environmental sources. Finally, the rtx gene demonstrated differences in the repeat region sequence among ST1 isolates from different outbreaks, suggesting that variation in this gene may be useful for targeted molecular subtyping approaches for L. pneumophila. This study demonstrates the utility of various genome sequence analysis approaches for L. pneumophila for environmental source attribution studies while furthering the understanding of Legionella ecology. IMPORTANCE We demonstrate that whole-genome sequencing helps to improve resolution of Legionella pneumophila isolated during laboratory investigations of legionellosis compared to traditional subtyping methods. These data can be important in confirming the environmental sources

  11. Malignant canine mammary tumours: Preliminary genomic insights using oligonucleotide array comparative genomic hybridisation analysis.

    PubMed

    Santos, Marta; Dias-Pereira, Patrícia; Williams, Christina; Lopes, Carlos; Breen, Matthew

    2017-03-28

    Neoplastic mammary disease in female dogs represents a major health concern for dog owners and veterinarians, but the genomic basis of the disease is poorly understood. In this study, we performed high resolution oligonucleotide array comparative genomic hybridisation (oaCGH) to assess genome wide DNA copy number changes in 10 malignant canine mammary tumours from seven female dogs, including multiple tumours collected at one time from each of three female dogs. In all but two tumours, genomic imbalances were detected, with losses being more common than gains. Canine chromosomes 9, 22, 26, 27, 34 and X were most frequently affected. Dissimilar oaCGH ratio profiles were observed in multiple tumours from the same dogs, providing preliminary evidence for probable independent pathogenesis. Analysis of adjacent samples of one tumour revealed regional differences in the number of genomic imbalances, suggesting heterogeneity within tumours.

  12. A dictionary based informational genome analysis

    PubMed Central

    2012-01-01

    Background: In the post-genomic era, several methods of computational genomics are emerging to understand how information is structured within whole genomes. The literature of the last five years describes several alignment-free methods that have arisen as alternative metrics for the dissimilarity of biological sequences. Among these, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results: Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying the methodology outlined here carried out some computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example, to compute over-represented functional sequences, such as promoters), was discussed and suggested a method to define synthetic genetic networks. Conclusions: We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many lines of investigation, namely exported to other contexts of computational genomics, as a basis for discrimination of genomic pathologies. PMID:22985068
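
    A genomic dictionary of k-mers and one informational index over it can be sketched in a few lines. The sketch below assumes the dictionary is the set of all overlapping k-mers with their counts and uses Shannon entropy of the empirical k-mer frequencies as an illustrative index; the abstract does not say which indexes the authors computed, so the choice of entropy and the function names are assumptions.

```python
from collections import Counter
from math import log2


def kmer_dictionary(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer (word of length k) in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def empirical_entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the empirical k-mer frequency distribution."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical toy sequence; real input would be a whole genome.
dictionary = kmer_dictionary("ACGTACGT", 2)
print(dict(dictionary))
print(len(dictionary), "distinct 2-mers")
print(round(empirical_entropy(dictionary), 3), "bits")
```

    Repeating the computation for several values of k gives dictionaries of different sizes and their frequency distributions, which is the kind of alignment-free summary the abstract describes.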

  13. BPGA- an ultra-fast pan-genome analysis pipeline.

    PubMed

    Chaudhari, Narendrakumar M; Gupta, Vinod Kumar; Dutta, Chitra

    2016-04-13

    Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics from few genome comparisons to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of the dataset, determining core (conserved), accessory (dispensable) and unique (strain-specific) gene pool of a species, tracing horizontal gene-flux across strains and providing insight into species evolution. The existing pan genome software tools suffer from various limitations like limited datasets, difficult installation/requirements, inadequate functional features etc. Here we present an ultra-fast computational pipeline BPGA (Bacterial Pan Genome Analysis tool) with seven functional modules. In addition to the routine pan genome analyses, BPGA introduces a number of novel features for downstream analyses like core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG & COG mapping of core, accessory and unique genes. Other notable features include minimum running prerequisites, freedom to select the gene clustering method, ultra-fast execution, user friendly command line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains.
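
    The split of a pan-genome into core, accessory and unique gene pools can be shown with a small sketch. It assumes gene clustering has already been done, so that each strain is described by a set of gene-family identifiers; the strain and family names are hypothetical, and the code illustrates the definitions only, not BPGA's own implementation.

```python
from __future__ import annotations

from collections import Counter


def classify_gene_families(presence: dict[str, set[str]]) -> dict[str, set[str]]:
    """Partition gene families by how many strains carry them.

    core      : present in every strain
    unique    : present in exactly one strain (when more than one strain is given)
    accessory : everything in between
    """
    n_strains = len(presence)
    counts = Counter(fam for families in presence.values() for fam in families)
    core = {fam for fam, c in counts.items() if c == n_strains}
    unique = {fam for fam, c in counts.items() if c == 1 and n_strains > 1}
    accessory = set(counts) - core - unique
    return {"core": core, "accessory": accessory, "unique": unique}


# Hypothetical presence sets for three strains.
presence = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famE"},
}

pools = classify_gene_families(presence)
for name, families in pools.items():
    print(name, sorted(families))
```

    The pan-genome is the union of the three pools; its size as strains are added one at a time is what pan-genome growth curves typically plot.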

  14. BPGA- an ultra-fast pan-genome analysis pipeline

    PubMed Central

    Chaudhari, Narendrakumar M.; Gupta, Vinod Kumar; Dutta, Chitra

    2016-01-01

    Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics from few genome comparisons to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of the dataset, determining core (conserved), accessory (dispensable) and unique (strain-specific) gene pool of a species, tracing horizontal gene-flux across strains and providing insight into species evolution. The existing pan genome software tools suffer from various limitations like limited datasets, difficult installation/requirements, inadequate functional features etc. Here we present an ultra-fast computational pipeline BPGA (Bacterial Pan Genome Analysis tool) with seven functional modules. In addition to the routine pan genome analyses, BPGA introduces a number of novel features for downstream analyses like core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG & COG mapping of core, accessory and unique genes. Other notable features include minimum running prerequisites, freedom to select the gene clustering method, ultra-fast execution, user friendly command line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains. PMID:27071527

  15. A High-Resolution Map of Segmental DNA Copy Number Variation in the Mouse Genome

    PubMed Central

    Graubert, Timothy A; Selzer, Rebecca R; Richmond, Todd A; Eis, Peggy S; Shannon, William D; Li, Xia; McLeod, Howard L; Cheverud, James M; Ley, Timothy J

    2007-01-01

    Submicroscopic (less than 2 Mb) segmental DNA copy number changes are a recently recognized source of genetic variability between individuals. The biological consequences of copy number variants (CNVs) are largely undefined. In some cases, CNVs that cause gene dosage effects have been implicated in phenotypic variation. CNVs have been detected in diverse species, including mice and humans. Published studies in mice have been limited by resolution and strain selection. We chose to study 21 well-characterized inbred mouse strains that are the focus of an international effort to measure, catalog, and disseminate phenotype data. We performed comparative genomic hybridization using long oligomer arrays to characterize CNVs in these strains. This technique increased the resolution of CNV detection by more than an order of magnitude over previous methodologies. The CNVs range in size from 21 to 2,002 kb. Clustering strains by CNV profile recapitulates aspects of the known ancestry of these strains. Most of the CNVs (77.5%) contain annotated genes, and many (47.5%) colocalize with previously mapped segmental duplications in the mouse genome. We demonstrate that this technique can identify copy number differences associated with known polymorphic traits. The phenotype of previously uncharacterized strains can be predicted based on their copy number at these loci. Annotation of CNVs in the mouse genome combined with sequence-based analysis provides an important resource that will help define the genetic basis of complex traits. PMID:17206864

  16. Competitive PCR-High Resolution Melting Analysis (C-PCR-HRMA) for large genomic rearrangements (LGRs) detection: A new approach to assess quantitative status of BRCA1 gene in a reference laboratory.

    PubMed

    Minucci, Angelo; De Paolis, Elisa; Concolino, Paola; De Bonis, Maria; Rizza, Roberta; Canu, Giulia; Scaglione, Giovanni Luca; Mignone, Flavio; Scambia, Giovanni; Zuppi, Cecilia; Capoluongo, Ettore

    2017-07-01

    Evaluation of copy number variation (CNV) in the BRCA1/2 genes due to large genomic rearrangements (LGRs) is a mandatory analysis in hereditary breast and ovarian cancer families if no pathogenic variants are found by sequencing. LGRs cannot be detected by conventional methods, and several alternative methods have been developed. Since these approaches are expensive and time-consuming, alternative screening methods for LGR detection are needed in order to streamline and optimize the diagnostic procedure. The aim of this study was to investigate Competitive PCR-High Resolution Melting Analysis (C-PCR-HRMA) as a molecular tool to detect recurrent BRCA1 LGRs. C-PCR-HRMA was performed on exons 3, 14, 18, 19, 20 and 21 of the BRCA1 gene; exons 4, 6 and 7 of the ALB gene were used as reference fragments. This study showed that it is possible to identify recurrent BRCA1 LGRs by the melting peak height ratio between target (BRCA1) and reference (ALB) fragments. Furthermore, we underline that a distinctive amplicon-melting profile is associated with a specific BRCA1 LGR. All C-PCR-HRMA results were confirmed by multiplex ligation-dependent probe amplification. C-PCR-HRMA has proved to be an innovative, efficient and fast method for BRCA1 LGR detection. Given its sensitivity, specificity and ease of use, C-PCR-HRMA can be considered an attractive and powerful alternative to other methods for BRCA1 CNV screening, improving molecular strategies for BRCA testing in the context of massively parallel sequencing. Copyright © 2017 Elsevier B.V. All rights reserved.
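
    The peak height ratio idea in this abstract can be sketched as follows. The abstract gives no numeric cut-offs, so the normalization against a normal control, the 0.25 tolerance, the function names and the peak heights below are all hypothetical choices for illustration, not the authors' published procedure.

```python
def peak_height_ratio(target_peak: float, reference_peak: float) -> float:
    """Ratio of the target (e.g. a BRCA1 exon) melting peak height to the
    reference (e.g. an ALB exon) melting peak height."""
    if reference_peak <= 0:
        raise ValueError("reference peak height must be positive")
    return target_peak / reference_peak


def call_dosage(sample_ratio: float, control_ratio: float, tolerance: float = 0.25) -> str:
    """Compare a sample's ratio with a normal control's ratio.

    `tolerance` is an assumed, illustrative threshold; a real assay would
    need empirically validated cut-offs.
    """
    normalized = sample_ratio / control_ratio
    if normalized < 1 - tolerance:
        return "possible deletion"
    if normalized > 1 + tolerance:
        return "possible duplication"
    return "no dosage change detected"


# Made-up peak heights in arbitrary fluorescence units.
control = peak_height_ratio(100.0, 100.0)
deleted = call_dosage(peak_height_ratio(52.0, 100.0), control)
duplicated = call_dosage(peak_height_ratio(148.0, 100.0), control)
normal = call_dosage(peak_height_ratio(98.0, 100.0), control)
print(deleted, duplicated, normal, sep="\n")
```

    Any call from a screen like this would still need confirmation by an independent method, as the abstract reports was done with multiplex ligation-dependent probe amplification.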

  17. Evaluation of high-resolution microarray platforms for genomic profiling of bone tumours

    PubMed Central

    2010-01-01

    Background Several high-density oligonucleotide microarray platforms are available for genome-wide single nucleotide polymorphism (SNP) detection and microarray-based comparative genomic hybridisation (array CGH), which may be used to detect copy number aberrations in human tumours. As part of the EuroBoNeT network of excellence for research on bone tumours (eurobonet.eu), we have evaluated four different commercial high-resolution microarray platforms in order to identify the most appropriate technology for mapping DNA copy number aberrations in such tumours. Findings DNA from two different cytogenetically well-characterized bone sarcoma cell lines, representing a simple and a complex karyotype, respectively, was tested in duplicate on four high-resolution microarray platforms; Affymetrix Genome-Wide Human SNP Array 6.0, Agilent Human Genome CGH 244A, Illumina HumanExon510s-duo and Nimblegen HG18 CGH 385 k WG tiling v1.0. The data was analysed using the platform-specific analysis software, as well as a platform-independent analysis algorithm. DNA copy number was measured at six specific chromosomes or chromosomal regions, and compared with the expected ratio based on available cytogenetic information. All platforms performed well in terms of reproducibility and were able to delimit and score small amplifications and deletions at similar resolution, but Agilent microarrays showed better linearity and dynamic range. The platform-specific analysis software provided with each platform identified in general correct copy numbers, whereas using a platform-independent analysis algorithm, correct copy numbers were determined mainly for Agilent and Affymetrix microarrays. Conclusions All platforms performed reasonably well, but Agilent microarrays showed better dynamic range, and like Affymetrix microarrays performed well with the platform-independent analysis software, implying more robust data. Bone tumours like osteosarcomas are heterogeneous tumours with complex

  18. HIGH-RESOLUTION GENOMIC ARRAYS FACILITATE DETECTION OF NOVEL CRYPTIC CHROMOSOMAL LESIONS IN MYELODYSPLASTIC SYNDROMES

    PubMed Central

    O’Keefe, Christine L.; Tiu, Ramon; Gondek, Lukasz P.; Powers, Jennifer; Theil, Karl S.; Kalaycio, Matt; Lichtin, Alan; Sekeres, Mikkael A.; Maciejewski, Jaroslaw P.

    2008-01-01

    Objective Unbalanced chromosomal aberrations are common in myelodysplastic syndromes, and have prognostic implications. An increased frequency of cytogenetic changes may reflect an inherent chromosomal instability due to failure of DNA repair. Therefore, it is likely that chromosomal defects in myelodysplastic syndromes may be more frequent than predicted by metaphase cytogenetics and new cryptic lesions may be revealed by precise analysis methods. Methods We used a novel high-resolution karyotyping technique, array-based comparative genomic hybridization, to investigate the frequency of cryptic chromosomal lesions in a cohort of 38 well-characterized myelodysplastic syndromes patients; results were confirmed by microsatellite quantitative PCR or single nucleotide polymorphism analysis. Results As compared to metaphase karyotyping, chromosomal abnormalities detected by array-based analysis were encountered more frequently and in a higher proportion of patients. For example, chromosomal defects were found in patients with a normal karyotype by traditional cytogenetics. In addition to verifying common abnormalities, previously cryptic defects were found in new regions of the genome. Cryptic changes often overlapped chromosomes and regions frequently identified as abnormal by metaphase cytogenetics. Conclusion The results underscore the instability of the myelodysplastic syndromes genome and highlight the utility of array-based karyotyping to study cryptic chromosomal changes which may provide new diagnostic information. PMID:17258073

  19. Toward high-resolution population genomics using archaeological samples

    PubMed Central

    Morozova, Irina; Flegontov, Pavel; Mikheyev, Alexander S.; Bruskin, Sergey; Asgharian, Hosseinali; Ponomarenko, Petr; Klyuchnikov, Vladimir; ArunKumar, GaneshPrasad; Prokhortchouk, Egor; Gankin, Yuriy; Rogaev, Evgeny; Nikolsky, Yuri; Baranova, Ancha; Elhaik, Eran; Tatarinova, Tatiana V.

    2016-01-01

    The term ‘ancient DNA’ (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of ‘molecular paleontology’. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research. PMID:27436340

  20. Toward high-resolution population genomics using archaeological samples.

    PubMed

    Morozova, Irina; Flegontov, Pavel; Mikheyev, Alexander S; Bruskin, Sergey; Asgharian, Hosseinali; Ponomarenko, Petr; Klyuchnikov, Vladimir; ArunKumar, GaneshPrasad; Prokhortchouk, Egor; Gankin, Yuriy; Rogaev, Evgeny; Nikolsky, Yuri; Baranova, Ancha; Elhaik, Eran; Tatarinova, Tatiana V

    2016-08-01

    The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research.

  1. High resolution genetic mapping by genome sequencing reveals genome duplication and tetraploid genetic structure of the diploid Miscanthus sinensis.

    PubMed

    Ma, Xue-Feng; Jensen, Elaine; Alexandrov, Nickolai; Troukhan, Maxim; Zhang, Liping; Thomas-Jones, Sian; Farrar, Kerrie; Clifton-Brown, John; Donnison, Iain; Swaller, Timothy; Flavell, Richard

    2012-01-01

    We have created a high-resolution linkage map of Miscanthus sinensis using genotyping-by-sequencing (GBS), identifying all 19 linkage groups for the first time. The result is technically significant because Miscanthus has a very large and highly heterozygous genome, yet little or no genomic information has been available for it to date. The composite linkage map, containing markers from both parental linkage maps, is composed of 3,745 SNP markers spanning 2,396 cM on 19 linkage groups with a 0.64 cM average resolution. Comparative genomics analyses of the M. sinensis composite linkage map against the genomes of sorghum, maize, rice, and Brachypodium distachyon indicate that sorghum has the closest syntenic relationship to Miscanthus among these species. The comparative results revealed that each pair of the 19 M. sinensis linkage groups aligned to one sorghum chromosome, except for LG8, which mapped to two sorghum chromosomes (4 and 7), presumably due to a chromosome fusion event after genome duplication. The data also revealed several other chromosome rearrangements relative to sorghum, including two telomere-centromere inversions of the sorghum syntenic chromosome 7 in LG8 of M. sinensis and two paracentric inversions of sorghum syntenic chromosome 4 in LG7 and LG8 of M. sinensis. The results clearly demonstrate, for the first time, that the diploid M. sinensis is of tetraploid origin, consisting of two sub-genomes. This complete and high-resolution composite linkage map will not only serve as a useful resource for novel QTL discoveries, but also enable informed deployment of the wealth of existing genomics resources of other species to the improvement of Miscanthus as a high-biomass energy crop. In addition, it has utility as a reference for genome sequence assembly for the forthcoming whole-genome sequencing of the Miscanthus genus.
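
    The 0.64 cM average resolution follows from the reported map length and marker count. The short Python check below uses only figures stated in the abstract; the per-interval variant, which subtracts one marker per linkage group, is an alternative convention added here as an assumption, not something the abstract specifies.

```python
total_map_length_cm = 2396   # composite map length reported in the abstract
n_markers = 3745             # SNP markers reported in the abstract
n_linkage_groups = 19        # linkage groups reported in the abstract

# Simple average spacing: map length divided by the number of markers.
per_marker = total_map_length_cm / n_markers

# Alternative convention (an assumption): each linkage group with k markers
# has k - 1 intervals, so the map has n_markers - n_linkage_groups intervals.
per_interval = total_map_length_cm / (n_markers - n_linkage_groups)

print(f"per marker:   {per_marker:.2f} cM")
print(f"per interval: {per_interval:.2f} cM")
```

    Both conventions round to the same two-decimal value, so the reported figure does not depend on which one was used.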

  2. High-Resolution Comparative Genomic Hybridization of Inflammatory Breast Cancer and Identification of Candidate Genes

    PubMed Central

    Adelaïde, José; Ferrari, Anthony; Tarpin, Carole; Charafe-Jauffret, Emmanuelle; Charpin, Colette; Houvenaeghel, Gilles; Jacquemier, Jocelyne; Bidaut, Ghislain; Birnbaum, Daniel; Viens, Patrice; Chaffanet, Max; Bertucci, François

    2011-01-01

    Background Inflammatory breast cancer (IBC) is an aggressive form of BC poorly defined at the molecular level. We compared the molecular portraits of 63 IBC and 134 non-IBC (nIBC) clinical samples. Methodology/Findings Genomic imbalances of 49 IBCs and 124 nIBCs were determined using high-resolution array-comparative genomic hybridization, and mRNA expression profiles of 197 samples using whole-genome microarrays. Genomic profiles of IBCs were as heterogeneous as those of nIBCs, and globally relatively close. However, IBCs showed more frequent “complex” patterns and a higher percentage of genes with CNAs per sample. The number of altered regions was similar in both types, although some regions were altered more frequently and/or with higher amplitude in IBCs. Many genes were similarly altered in both types; however, more genes displayed recurrent amplifications in IBCs. The percentage of genes whose mRNA expression correlated with CNAs was similar in both types for the gained genes, but ∼7-fold lower in IBCs for the lost genes. Integrated analysis identified 24 potential candidate IBC-specific genes. Their combined expression accurately distinguished IBCs and nIBCs in an independent validation set, and retained an independent prognostic value in a series of 1,781 nIBCs, reinforcing the hypothesis for a link with IBC aggressiveness. Consistent with the hyperproliferative and invasive phenotype of IBC, these genes are notably involved in protein translation, cell cycle, RNA processing and transcription, metabolism, and cell migration. Conclusions Our results suggest a higher genomic instability of IBC. We established the first repertory of DNA copy number alterations in this tumor, and provided a list of genes that may contribute to its aggressiveness and represent novel therapeutic targets. PMID:21339811

  3. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

    PubMed Central

    2009-01-01

    Background Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? Results We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2), highlighting their unusual evolutionary properties. Conclusion Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling, such as phylogeographic
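
    The two headline proportions in this abstract can be checked with a few lines of Python. Only the counts stated in the abstract are used; the variable names are illustrative.

```python
resolved_nodes = 30        # ingroup nodes with >= 95% bootstrap support
ingroup_nodes = 33         # total ingroup nodes
mean_assembled_kb = 109    # average assembled length per chloroplast genome
plastome_kb = 120          # approximate full chloroplast genome length

# Share of ingroup nodes reaching the 95% bootstrap threshold.
supported_pct = round(100 * resolved_nodes / ingroup_nodes, 1)

# Approximate share of the plastome recovered per sample.
coverage_pct = round(100 * mean_assembled_kb / plastome_kb, 1)

print(f"nodes with >= 95% support: {supported_pct}%")
print(f"average plastome coverage: {coverage_pct}%")
```

    Both values come out near 91%. The coverage figure is only approximate because the abstract gives the genome length as roughly 120 kilobases.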

  4. Chromosomes in the flow to simplify genome analysis.

    PubMed

    Doležel, Jaroslav; Vrána, Jan; Safář, Jan; Bartoš, Jan; Kubaláková, Marie; Simková, Hana

    2012-08-01

    Nuclear genomes of humans, animals, and plants are organized into subunits called chromosomes. When isolated into aqueous suspension, mitotic chromosomes can be classified using flow cytometry according to light scatter and fluorescence parameters. Chromosomes of interest can be purified by flow sorting if they can be resolved from other chromosomes in a karyotype. The analysis and sorting are carried out at rates of 10^2-10^4 chromosomes per second, and for complex genomes such as wheat the flow sorting technology has been ground-breaking in reducing genome complexity for genome sequencing. The high sample rate provides an attractive approach for karyotype analysis (flow karyotyping) and the purification of chromosomes in large numbers. In characterizing the chromosome complement of an organism, the high number that can be studied using flow cytometry allows for a statistically accurate analysis. Chromosome sorting plays a particularly important role in the analysis of nuclear genome structure and the analysis of particular and aberrant chromosomes. Other attractive but not well-explored features include the analysis of chromosomal proteins, chromosome ultrastructure, and high-resolution mapping using FISH. Recent results demonstrate that chromosome flow sorting can be coupled seamlessly with DNA array and next-generation sequencing technologies for high-throughput analyses. The main advantages are targeting the analysis to a genome region of interest and a significant reduction in sample complexity. As flow sorters can also sort single copies of chromosomes, shotgun sequencing DNA amplified from them enables the production of haplotype-resolved genome sequences. This review explains the principles of flow cytometric chromosome analysis and sorting (flow cytogenetics), discusses the major uses of this technology in genome analysis, and outlines future directions.

  5. High-resolution genome-wide mapping of histone modifications.

    PubMed

    Roh, Tae-young; Ngau, Wing Chi; Cui, Kairong; Landsman, David; Zhao, Keji

    2004-08-01

    The expression patterns of eukaryotic genomes are controlled by their chromatin structure, consisting of nucleosome subunits in which DNA of approximately 146 bp is wrapped around a core of 8 histone molecules. Post-translational histone modifications play an essential role in modifying chromatin structure. Here we apply a combination of SAGE and chromatin immunoprecipitation (ChIP) protocols to determine the distribution of hyperacetylated histones H3 and H4 in the Saccharomyces cerevisiae genome. We call this approach genome-wide mapping technique (GMAT). Using GMAT, we find that the highest acetylation levels are detected in the 5' end of a gene's coding region, but not in the promoter. Furthermore, we show that the histone acetyltransferase, GCN5p, regulates H3 acetylation in the promoter and 5' end of the coding regions. These findings indicate that GMAT should find valuable applications in mapping target sites of chromatin-modifying enzymes.

  6. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny.

    PubMed

    Schreeg, Megan E; Marr, Henry S; Tarigo, Jaime L; Cohn, Leah A; Bird, David M; Scholl, Elizabeth H; Levy, Michael G; Wiegmann, Brian M; Birkenheuer, Adam J

    2016-01-01

    The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon) are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi) with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis) and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae) while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward understanding the

  7. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny

    PubMed Central

    Schreeg, Megan E.; Marr, Henry S.; Tarigo, Jaime L.; Cohn, Leah A.; Bird, David M.; Scholl, Elizabeth H.; Levy, Michael G.; Wiegmann, Brian M.; Birkenheuer, Adam J.

    2016-01-01

    The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon) are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi) with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis) and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae) while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward understanding the

  8. Large-scale genomic analysis of ovarian carcinomas.

    PubMed

    Gorringe, Kylie L; Campbell, Ian G

    2009-04-01

    Epithelial ovarian cancers are typified by frequent genomic aberrations that have been difficult to unravel. Recently, high-resolution array technologies have provided the first glimpse of the remarkable complexity of these aberrations with some ovarian cancers containing hundreds of copy number breakpoints, micro-deletions and amplifications. Many of these alterations contain cancer-related genes suggesting that the majority is disease-associated and not just the product of random genomic instability. Future developments such as next-generation sequencing and integrated analysis of data from multiple array platforms on large numbers of samples are poised to revolutionize our understanding of this complex disease.

  9. Genomic location analysis by ChIP-Seq

    PubMed Central

    Barski, Artem; Zhao, Keji

    2013-01-01

    The interaction of a multitude of transcription factors and other chromatin proteins with the genome can influence gene expression and subsequently cell differentiation and function. Thus systematic identification of binding targets of transcription factors is key to unraveling gene regulation networks. The recent development of ChIP-Seq has revolutionized mapping of DNA-protein interactions. Now protein binding can be mapped in a truly genome-wide manner with extremely high resolution. This review discusses ChIP-Seq technology, its possible pitfalls, data analysis and several early applications of the ChIP-Seq technology. PMID:19173299

  10. Bridging the Resolution Gap in Structural Modeling of 3D Genome Organization

    PubMed Central

    Marti-Renom, Marc A.; Mirny, Leonid A.

    2011-01-01

    Over the last decade, and especially after the advent of fluorescent in situ hybridization imaging and chromosome conformation capture methods, the availability of experimental data on genome three-dimensional organization has dramatically increased. We now have access to unprecedented details of how genomes organize within the interphase nucleus. Development of new computational approaches to leverage this data has already resulted in the first three-dimensional structures of genomic domains and genomes. Such approaches expand our knowledge of the chromatin folding principles, which has been classically studied using polymer physics and molecular simulations. Our outlook describes computational approaches for integrating experimental data with polymer physics, thereby bridging the resolution gap for structural determination of genomes and genomic domains. PMID:21779160

  11. High-resolution genomic profiling of human papillomavirus-associated vulval neoplasia

    PubMed Central

    Purdie, K J; Harwood, C A; Gibbon, K; Chaplin, T; Young, B D; Cazier, J B; Singh, N; Leigh, I M; Proby, C M

    2010-01-01

    Background: The incidence of human papillomavirus-associated vulval neoplasia is increasing worldwide; yet the associated genetic changes remain poorly understood. Methods: We have used single-nucleotide polymorphism microarray analysis to perform the first high-resolution investigation of genome-wide allelic imbalance in vulval neoplasia. Our sample series comprised 21 high-grade vulval intraepithelial neoplasia and 6 vulval squamous cell carcinomas, with paired non-lesional samples used to adjust for normal copy number variation. Results: Overall the most common recurrent aberrations were gains at 1p and 20, with the most frequent deletions observed at 2q, 3p and 10. Copy-neutral loss of heterozygosity at 6p was a recurrent event in vulval intraepithelial neoplasia. The pattern of genetic alterations differed from the characteristic changes we previously identified in cutaneous squamous cell carcinomas. Vulval neoplasia samples did not exhibit gain at 5p, a frequent recurrent aberration in a series of cervical tumours analysed elsewhere using an identical protocol. Conclusion: This series of 27 vulval samples comprises the largest systematic genome-wide analysis of vulval neoplasia performed to date. Despite shared papillomavirus status and regional proximity, our data suggest that the frequency of certain genetic alterations may differ in vulval and cervical tumours. PMID:20234371

  12. High-resolution analysis of Mammalian DNA replication units.

    PubMed

    Chagin, Vadim O; Reinhart, Marius; Cardoso, M Cristina

    2015-01-01

    Genomic DNA of a eukaryotic cell is replicated once during the S-phase of the cell cycle to precisely maintain the complete genetic information. In the course of S-phase, semiconservative DNA synthesis is sequentially initiated and performed at thousands of discrete patches of the DNA helix termed replicons. At any given moment of S-phase, multiple replicons are active in parallel in different parts of the genome. In the last decades, tools and methods to visualize DNA synthesis inside cells have been developed. Pulse labeling with nucleotides as well as detecting components of the replication machinery yielded an overall picture of multiple discrete sites of active DNA synthesis termed replication foci (RFi) and forming spatiotemporal patterns within the cell nucleus. Recent advances in fluorescence microscopy and digital imaging in combination with computational image analysis allow a comprehensive quantitative analysis of RFi and provide valuable insights into the organization of the genomic DNA replication process and also of the genome itself. In this chapter, we describe in detail protocols for the visualization and quantification of RFi at different levels of optical and physical resolution.

  13. Comparative genomic analysis of esophageal cancers.

    PubMed

    Caygill, Christine P J; Gatenby, Piers A C; Herceg, Zdenko; Lima, Sheila C S; Pinto, Luis F R; Watson, Anthony; Wu, Ming-Shiang

    2014-09-01

    The following, from the 12th OESO World Conference: Cancers of the Esophagus, includes commentaries on comparative genomic analysis of esophageal cancers: genomic polymorphisms, the genetic and epigenetic drivers in esophageal cancers, and the collection of data in the UK Barrett's Oesophagus Registry.

  14. Applications of the polymerase chain reaction to genome analysis

    SciTech Connect

    Rose, E.A.

    1991-01-01

    The objectives of the Human Genome Project are to create high-resolution genetic and physical maps, and ultimately to determine the complete nucleotide sequence of the human genome. The result of this initiative will be to localize the estimated 50,000-100,000 human genes, and acquire information that will enable development of a better understanding of the relationship between genome structure and function. To achieve these goals, new methodologies that provide more rapid, efficient, and cost effective means of genomic analysis will be required. From both conceptual and practical perspectives, the polymerase chain reaction (PCR) represents a fundamental technology for genome mapping and sequencing. The availability of PCR has allowed definition of a technically credible form that the final composite map of the human genome will take, as described in the sequence-tagged site proposal. Moreover, applications of PCR have provided efficient approaches for identifying, isolating, mapping, and sequencing DNA, many of which are amenable to automation. The versatility and power provided by PCR have encouraged its involvement in almost every aspect of human genome research, with new applications of PCR being developed on a continual basis.

  15. High-resolution genomic profiling of chronic lymphocytic leukemia reveals new recurrent genomic alterations.

    PubMed

    Edelmann, Jennifer; Holzmann, Karlheinz; Miller, Florian; Winkler, Dirk; Bühler, Andreas; Zenz, Thorsten; Bullinger, Lars; Kühn, Michael W M; Gerhardinger, Andreas; Bloehdorn, Johannes; Radtke, Ina; Su, Xiaoping; Ma, Jing; Pounds, Stanley; Hallek, Michael; Lichter, Peter; Korbel, Jan; Busch, Raymonde; Mertens, Daniel; Downing, James R; Stilgenbauer, Stephan; Döhner, Hartmut

    2012-12-06

    To identify genomic alterations in chronic lymphocytic leukemia (CLL), we performed single-nucleotide polymorphism-array analysis using Affymetrix Version 6.0 on 353 samples from untreated patients entered in the CLL8 treatment trial. Based on paired-sample analysis (n = 144), a mean of 1.8 copy number alterations per patient were identified; approximately 60% of patients carried no copy number alterations other than those detected by fluorescence in situ hybridization analysis. Copy-neutral loss-of-heterozygosity was detected in 6% of CLL patients and was found most frequently on 13q, 17p, and 11q. Minimally deleted regions were refined on 13q14 (deleted in 61% of patients) to the DLEU1 and DLEU2 genes, on 11q22.3 (27% of patients) to ATM, on 2p16.1-2p15 (gained in 7% of patients) to a 1.9-Mb fragment containing 9 genes, and on 8q24.21 (5% of patients) to a segment 486 kb proximal to the MYC locus. 13q deletions exhibited proximal and distal breakpoint cluster regions. Among the most common novel lesions were deletions at 15q15.1 (4% of patients), with the smallest deletion (70.48 kb) found in the MGA locus. Sequence analysis of MGA in 59 samples revealed a truncating mutation in one CLL patient lacking a 15q deletion. MNT at 17p13.3, which in addition to MGA and MYC encodes for the network of MAX-interacting proteins, was also deleted recurrently.

  16. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  17. High-resolution SAR ATR performance analysis

    NASA Astrophysics Data System (ADS)

    Douglas, Joel; Burke, Monica; Ettinger, Gil J.

    2004-09-01

    High resolution Synthetic Aperture Radar (SAR) imagery (e.g., four inch or better resolution) contains features not seen in one foot or lower resolution imagery, due to the isolation of the scatterers into separate resolution cells. These features provide the potential for additional discrimination power for Automatic Target Recognition (ATR) systems. In this paper, we analyze the performance of the Real-Time MSTAR (RT-MSTAR) system as a function of image resolution. Performance is measured both in terms of the probability of correct identification on military targets, and also in terms of confuser rejection. The analysis demonstrates two factors that significantly enhance performance. First, use of the high resolution imagery results in much higher probability of correct identification, as demonstrated using Lynx SAR imagery at 4" and 12". Second, incorporating models of the confusers, when available, greatly reduces false alarms, even at higher resolutions. Several new areas of work emerge, including making use of higher-level feature information available in the imagery, and rapid creation of models for vehicles that pose particular confuser rejection challenges.

  18. Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions

    PubMed Central

    Sims, Gregory E.; Jun, Se-Ran; Wu, Guohong A.; Kim, Sung-Hou

    2009-01-01

    For comparison of whole-genome (genic + nongenic) sequences, multiple sequence alignment of a few selected genes is not appropriate. One approach is to use an alignment-free method in which feature (or l-mer) frequency profiles (FFP) of whole genomes are used for comparison—a variation of a text or book comparison method, using word frequency profiles. In this approach it is critical to identify the optimal resolution range of l-mers for the given set of genomes compared. The optimum FFP method is applicable for comparing whole genomes or large genomic regions even when there are no common genes with high homology. We outline the method in 3 stages: (i) We first show how the optimal resolution range can be determined with English books which have been transformed into long character strings by removing all punctuation and spaces. (ii) Next, we test the robustness of the optimized FFP method at the nucleotide level, using a mutation model with a wide range of base substitutions and rearrangements. (iii) Finally, to illustrate the utility of the method, phylogenies are reconstructed from concatenated mammalian intronic genomes; the FFP derived intronic genome topologies for each l within the optimal range are all very similar. The topology agrees with the established mammalian phylogeny revealing that intron regions contain a similar level of phylogenic signal as do coding regions. PMID:19188606
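    The l-mer profile comparison described in this record can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (ffp, js_divergence, distances_by_resolution) are hypothetical, and the use of the Jensen-Shannon divergence as the distance between profiles is an assumption, since the abstract does not name a distance. Scanning several values of l is meant to mirror the idea of looking for an optimal resolution range.

```python
from collections import Counter
from math import log2


def ffp(sequence, l):
    """Normalized l-mer frequency profile of a sequence, as {l-mer: frequency}."""
    counts = Counter(sequence[i:i + l] for i in range(len(sequence) - l + 1))
    total = sum(counts.values())
    return {lmer: n / total for lmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two profiles; lies in [0, 1].

    Assumption: the abstract does not specify the distance, so this choice
    is for illustration only.
    """
    # Mixture distribution over the union of l-mers seen in either profile.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}

    def kl(a):
        # Kullback-Leibler divergence of profile a from the mixture m.
        return sum(v * log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def distances_by_resolution(seq_a, seq_b, l_values):
    """Distance between two sequences at each feature length l."""
    return {l: js_divergence(ffp(seq_a, l), ffp(seq_b, l)) for l in l_values}
```

    Calling distances_by_resolution(genome_a, genome_b, range(6, 13)) would return one distance per feature length; a distance matrix built this way for a set of genomes could then be passed to any standard tree-building method.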

  19. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  20. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  1. Comparative Genome Analysis of Enterobacter cloacae

    PubMed Central

    Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching

    2013-01-01

    The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314

  2. Zooming in on the human-mouse comparative map: genome conservation re-examined on a high-resolution scale.

    PubMed

    Carver, E A; Stubbs, L

    1997-12-01

    Over the past decade, conservation of genetic linkage groups has been shown in mammals and used to great advantage, fueling significant exchanges of gene mapping and functional information especially between the genomes of humans and mice. As human physical maps increase in resolution from chromosome bands to nucleotide sequence, comparative alignments of mouse and human regions have revealed striking similarities and surprising differences between the genomes of these two best-mapped mammalian species. Whereas, at present, very few mouse and human regions have been compared on the physical level, existing studies provide intriguing insights to genome evolution, including the observation of recent duplications and deletions of genes that may play significant roles in defining some of the biological differences between the two species. Although high-resolution conserved marker-based maps are currently available only for human and mouse, a variety of new methods and resources are speeding the development of comparative maps of additional organisms. These advances mark the first step toward establishment of the human genome as a reference map for vertebrate species, providing evolutionary and functional annotation to human sequence and vast new resources for genetic analysis of a variety of commercially, medically, and ecologically important animal models.

  3. A high-resolution radiation hybrid map of the bovine genome

    USDA-ARS's Scientific Manuscript database

    We are building high-resolution radiation hybrid maps of all 29 bovine autosomes and chromosome X, using a 58,000-marker genotyping assay, and a 12,000-rad whole-genome radiation hybrid (RH) panel. To accommodate the large number of markers, and to automate the map building procedure, a software pip...

  4. Comparative analysis of methods for genome-wide nucleosome cartography.

    PubMed

    Quintales, Luis; Vázquez, Enrique; Antequera, Francisco

    2015-07-01

    Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding the genome regulation. In recent years, several experimental protocols have been developed for this purpose that include the enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  5. A full-coverage, high-resolution human chromosome 22 genomic microarray for clinical and research applications.

    PubMed

    Buckley, Patrick G; Mantripragada, Kiran K; Benetkiewicz, Magdalena; Tapia-Páez, Isabel; Diaz De Ståhl, Teresita; Rosenquist, Magnus; Ali, Haider; Jarbo, Caroline; De Bustos, Cecilía; Hirvelä, Carina; Sinder Wilén, Birgitta; Fransson, Ingegerd; Thyr, Charlotte; Johnsson, Britt-Inger; Bruder, Carl E G; Menzel, Uwe; Hergersberg, Martin; Mandahl, Nils; Blennow, Elisabeth; Wedell, Anna; Beare, David M; Collins, John E; Dunham, Ian; Albertson, Donna; Pinkel, Daniel; Bastian, Boris C; Faruqi, A Fawad; Lasken, Roger S; Ichimura, Koichi; Collins, V Peter; Dumanski, Jan P

    2002-12-01

    We have constructed the first comprehensive microarray representing a human chromosome for analysis of DNA copy number variation. This chromosome 22 array covers 34.7 Mb, representing 1.1% of the genome, with an average resolution of 75 kb. To demonstrate the utility of the array, we have applied it to profile acral melanoma, dermatofibrosarcoma, DiGeorge syndrome and neurofibromatosis 2. We accurately diagnosed homozygous/heterozygous deletions, amplifications/gains, IGLV/IGLC locus instability, and breakpoints of an imbalanced translocation. We further identified the 14-3-3 eta isoform as a candidate tumor suppressor in glioblastoma. Two significant methodological advances in array construction were also developed and validated. These include a strictly sequence defined, repeat-free, and non-redundant strategy for array preparation. This approach allows an increase in array resolution and analysis of any locus; disregarding common repeats, genomic clone availability and sequence redundancy. In addition, we report that the application of phi29 DNA polymerase is advantageous in microarray preparation. A broad spectrum of issues in medical research and diagnostics can be approached using the array. This well annotated and gene-rich autosome contains numerous uncharacterized disease genes. It is therefore crucial to associate these genes to specific 22q-related conditions and this array will be instrumental towards this goal. Furthermore, comprehensive epigenetic profiling of 22q-located genes and high-resolution analysis of replication timing across the entire chromosome can be studied using our array.

  6. High-Resolution Genetic Map for Understanding the Effect of Genome-Wide Recombination Rate on Nucleotide Diversity in Watermelon

    PubMed Central

    Reddy, Umesh K.; Nimmakayala, Padma; Levi, Amnon; Abburi, Venkata Lakshmi; Saminathan, Thangasamy; Tomason, Yan. R.; Vajja, Gopinath; Reddy, Rishi; Abburi, Lavanya; Wehner, Todd C.; Ronin, Yefim; Karol, Abraham

    2014-01-01

    We used genotyping by sequencing to identify a set of 10,480 single nucleotide polymorphism (SNP) markers for constructing a high-resolution genetic map of 1096 cM for watermelon. We assessed the genome-wide variation in recombination rate (GWRR) across the map and found an association between GWRR and genome-wide nucleotide diversity. Collinearity between the map and the genome-wide reference sequence for watermelon was studied to identify inconsistency and chromosome rearrangements. We assessed genome-wide nucleotide diversity, linkage disequilibrium (LD), and selective sweep for wild, semi-wild, and domesticated accessions of Citrullus lanatus var. lanatus to track signals of domestication. Principal component analysis combined with chromosome-wide phylogenetic study based on 1563 SNPs obtained after LD pruning with minor allele frequency of 0.05 resolved the differences between semi-wild and wild accessions as well as relationships among worldwide sweet watermelon. Population structure analysis revealed predominant ancestries for wild, semi-wild, and domesticated watermelons as well as admixture of various ancestries that were important for domestication. Sliding window analysis of Tajima’s D across various chromosomes was used to resolve selective sweep. LD decay was estimated for various chromosomes. We identified a strong selective sweep on chromosome 3 consisting of important genes that might have had a role in sweet watermelon domestication. PMID:25227227
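    The sliding-window Tajima's D analysis mentioned in this record can be illustrated with the standard published formula for the statistic. The sketch below is not the authors' pipeline: the function names, the window and step parameters, and the representation of the data as equal-length haplotype strings are all assumptions made for illustration, and missing data are not handled.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(pi, s, n):
    """Tajima's D from mean pairwise differences (pi), segregating sites (s)
    and sample size (n). Returns None when the statistic is undefined."""
    if s == 0:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (pi - s / a1) / sqrt(variance)


def window_stats(haplotypes, start, end):
    """Return (pi, s) for alignment columns [start, end); needs >= 2 haplotypes."""
    n = len(haplotypes)
    n_pairs = n * (n - 1) / 2
    s = 0
    total_diffs = 0
    for i in range(start, end):
        column = [h[i] for h in haplotypes]
        if len(set(column)) > 1:  # site is segregating
            s += 1
            total_diffs += sum(x != y for x, y in combinations(column, 2))
    return total_diffs / n_pairs, s


def sliding_tajimas_d(haplotypes, window, step):
    """List of (window start, D) pairs along the alignment."""
    n = len(haplotypes)
    length = len(haplotypes[0])
    results = []
    for start in range(0, length - window + 1, step):
        pi, s = window_stats(haplotypes, start, start + window)
        results.append((start, tajimas_d(pi, s, n)))
    return results
```

    Runs of strongly negative D values across adjacent windows are the kind of signal usually examined as a candidate selective sweep, although demographic history can produce similar patterns.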

  7. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    PubMed Central

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  8. A Distance Measure for Genome Phylogenetic Analysis

    NASA Astrophysics Data System (ADS)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
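    The general idea of a compression-based distance can be illustrated with the normalized compression distance (NCD). Note the substitution: the record describes a measure built on the expert model, a compressor designed for biological sequences, whereas the sketch below swaps in the general-purpose zlib compressor from the Python standard library purely for illustration. The function names and the simulated sequences are hypothetical.

```python
import random
import zlib


def compressed_size(sequence):
    """Size in bytes of the zlib-compressed sequence (stand-in compressor)."""
    return len(zlib.compress(sequence.encode("ascii"), 9))


def ncd(x, y):
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


def mutate(sequence, rate, rng):
    """Substitute each base with a different base with probability `rate`."""
    bases = "ACGT"
    return "".join(
        rng.choice(bases.replace(b, "")) if rng.random() < rate else b
        for b in sequence
    )


# Simulated data for illustration only: a random "ancestor", a lightly
# mutated relative, and an unrelated random sequence.
rng = random.Random(0)
ancestor = "".join(rng.choice("ACGT") for _ in range(2000))
relative = mutate(ancestor, 0.01, rng)
unrelated = "".join(rng.choice("ACGT") for _ in range(2000))

print("ancestor vs relative: ", ncd(ancestor, relative))
print("ancestor vs unrelated:", ncd(ancestor, unrelated))
```

    A general-purpose compressor such as zlib models DNA poorly and has a limited window, so this sketch only conveys the principle; a pairwise matrix of such distances is what a distance-based tree method would take as input.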

  9. Multiplexed chromosome conformation capture sequencing for rapid genome-scale high-resolution detection of long-range chromatin interactions.

    PubMed

    Stadhouders, Ralph; Kolovos, Petros; Brouwer, Rutger; Zuin, Jessica; van den Heuvel, Anita; Kockx, Christel; Palstra, Robert-Jan; Wendt, Kerstin S; Grosveld, Frank; van Ijcken, Wilfred; Soler, Eric

    2013-03-01

    Chromosome conformation capture (3C) technology is a powerful and increasingly popular tool for analyzing the spatial organization of genomes. Several 3C variants have been developed (e.g., 4C, 5C, ChIA-PET, Hi-C), allowing large-scale mapping of long-range genomic interactions. Here we describe multiplexed 3C sequencing (3C-seq), a 4C variant coupled to next-generation sequencing, allowing genome-scale detection of long-range interactions with candidate regions. Compared with several other available techniques, 3C-seq offers a superior resolution (typically single restriction fragment resolution; approximately 1-8 kb on average) and can be applied in a semi-high-throughput fashion. It allows the assessment of long-range interactions of up to 192 genes or regions of interest in parallel by multiplexing library sequencing. This renders multiplexed 3C-seq an inexpensive, quick (total hands-on time of 2 weeks) and efficient method that is ideal for the in-depth analysis of complex genetic loci. The preparation of multiplexed 3C-seq libraries can be performed by any investigator with basic skills in molecular biology techniques. Data analysis requires basic expertise in bioinformatics and in Linux and Python environments. The protocol describes all materials, critical steps and bioinformatics tools required for successful application of 3C-seq technology.

  10. Genomic analysis of Fusarium verticillioides.

    PubMed

    Brown, D W; Butchko, R A E; Proctor, R H

    2008-09-01

    Fusarium verticillioides (teleomorph Gibberella moniliformis) can be either an endophyte of maize, causing no visible disease, or a pathogen causing disease of ears, stalks, roots and seedlings. At any stage, this fungus can synthesize fumonisins, a family of mycotoxins structurally similar to the sphingolipid sphinganine. Ingestion of fumonisin-contaminated maize has been associated with a number of animal diseases, including cancer in rodents; exposure has been correlated with human oesophageal cancer in some regions of the world, and some evidence suggests that fumonisins are a risk factor for neural tube defects. A primary goal of the authors' laboratory is to eliminate fumonisin contamination of maize and maize products. Understanding how and why these toxins are made and the F. verticillioides-maize disease process will allow one to develop novel strategies to limit tissue destruction (rot) and fumonisin production. To meet this goal, genomic sequence data, expressed sequence tags (ESTs) and microarrays are being used to identify F. verticillioides genes involved in the biosynthesis of toxins and plant pathogenesis. This paper describes the current status of F. verticillioides genomic resources and three approaches being used to mine microarray data from a wild-type strain cultured in liquid fumonisin production medium for 12, 24, 48, 72, 96 and 120 h. Taken together, these approaches demonstrate the power of microarray technology to provide information on different biological processes.

  11. Comprehensive genome characterization of solitary fibrous tumors using high-resolution array-based comparative genomic hybridization.

    PubMed

    Bertucci, François; Bouvier-Labit, Corinne; Finetti, Pascal; Adélaïde, José; Metellus, Philippe; Mokhtari, Karima; Decouvelaere, Anne-Valérie; Miquel, Catherine; Jouvet, Anne; Figarella-Branger, Dominique; Pedeutour, Florence; Chaffanet, Max; Birnbaum, Daniel

    2013-02-01

    Solitary fibrous tumors (SFTs) are rare spindle cell tumors with limited therapeutic options. Their molecular basis is poorly known. No consistent cytogenetic abnormality has been reported. We used high-resolution whole-genome array-based comparative genomic hybridization (Agilent 244K oligonucleotide chips) to profile 47 samples, meningeal in >75% of cases. Few copy number aberrations (CNAs) were observed. Sixty-eight percent of samples did not show any gene CNA after exclusion of probes located in regions with referenced copy number variation (CNV). Only low-level CNAs were observed. The genomic profiles were very homogeneous among samples. No molecular class was revealed by clustering of DNA copy numbers. All cases displayed a "simplex" profile. No recurrent CNA was identified. Imbalances occurring in >20%, such as the gain of 8p11.23-11.22 region, contained known CNVs. The 13q14.11-13q31.1 region (lost in 4% of cases) was the largest altered region and contained the lowest percentage of genes with referenced CNVs. A total of 425 genes without CNV showed copy number transition in at least one sample, but only 1 in at least 10% of samples. The genomic profiles of meningeal and extra-meningeal cases did not show any differences.

  12. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  13. Bioinformatics for analysis of poxvirus genomes.

    PubMed

    Da Silva, Melissa; Upton, Chris

    2012-01-01

    In recent years, there have been numerous unprecedented technological advances in the field of molecular biology; these include DNA sequencing, mass spectrometry of proteins, and microarray analysis of mRNA transcripts. Perhaps, however, it is the area of genomics, which has now generated the complete genome sequences of more than 100 poxviruses, that has had the greatest impact on the average virology researcher because the DNA sequence data is in constant use in many different ways by almost all molecular virologists. As this data resource grows, so does the importance of the availability of databases and software tools to enable the bench virologist to work with and make use of this (valuable/expensive) DNA sequence information. Thus, providing researchers with intuitive software to first select and reformat genomics data from large databases, second, to compare/analyze genomics data, and third, to view and interpret large and complex sets of results has become pivotal in enabling progress to be made in modern virology. This chapter is directed at the bench virologist and describes the software required for a number of common bioinformatics techniques that are useful for comparing and analyzing poxvirus genomes. In a number of examples, we also highlight the Viral Orthologous Clusters database system and integrated tools that we developed for the management and analysis of complete viral genomes.

  14. Comparative Genome Analysis in the Integrated Microbial Genomes (IMG) System

    SciTech Connect

    Kyrpides, Nikos C.; Markowitz, Victor M.

    2006-03-01

    Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries and occurrence profile tools for comparing genes, pathways and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.

  15. Genomic signal analysis of pathogen variability

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan

    2006-02-01

    The paper presents results in the study of pathogen variability by using genomic signals. The conversion of symbolic nucleotide sequences into digital signals offers the possibility to apply signal processing methods to the analysis of genomic data. The method is particularly well suited to characterize small size genomic sequences, such as those found in viruses and bacteria, being a promising tool in tracking the variability of pathogens, especially in the context of developing drug resistance. The paper is based on data downloaded from GenBank [32], and comprises results on the variability of the eight segments of the influenza type A, subtype H5N1, virus genome, and of the Hemagglutinin (HA) gene, for the H1, H2, H3, H4, H5 and H16 types. Data from human and avian virus isolates are used.
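
    The conversion of a symbolic nucleotide sequence into a digital signal can be sketched as follows. The abstract does not state which numeric mapping is used, so the complex values assigned to A, C, G and T below, and the function names, are assumptions chosen only for illustration.

```python
# Illustrative nucleotide-to-complex mapping. This particular assignment is an
# assumption made for the sketch; the abstract does not specify the mapping.
NUCLEOTIDE_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def to_signal(sequence):
    """Convert a nucleotide string into a list of complex samples."""
    return [NUCLEOTIDE_TO_COMPLEX[base] for base in sequence.upper()]


def cumulative_signal(sequence):
    """Return the running sum of the samples along the sequence.

    A drift in the running sum reflects a bias in nucleotide composition,
    which is the kind of feature signal processing methods can then examine.
    """
    total = 0j
    path = []
    for sample in to_signal(sequence):
        total += sample
        path.append(total)
    return path


if __name__ == "__main__":
    print(cumulative_signal("ATGGCA"))
```

    Once the sequence is a numeric array, standard tools such as filtering or spectral analysis apply directly; comparing the signals of two isolates position by position is one simple way to look at variability.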

  16. Comparative genomic analysis of the genus Enterococcus.

    PubMed

    Zhong, Zhi; Zhang, Wenyi; Song, Yuqin; Liu, Wenjun; Xu, Haiyan; Xi, Xiaoxia; Menghe, Bilige; Zhang, Heping; Sun, Zhihong

    2017-03-01

    As important lactic acid bacteria, Enterococcus species are widely used in the production of fermented food. However, as some strains of Enterococcus are opportunistic pathogens, their safety has not been generally accepted. In recent years, a large number of new species have been described and classified within the genus Enterococcus, so a better understanding of the genetic relationships and evolution of Enterococcus species is needed. In this study, the genomes of 29 type strains of Enterococcus species were sequenced. In combination with eight complete genome sequences from the GenBank database, the whole genomes of 37 strains of Enterococcus were comparatively analyzed. The average length of Enterococcus genomes was 3.20 Mb and the average GC content was 37.99%. The core- and pan-genomes were defined based on the genomes of the 37 strains of Enterococcus. The core-genome contained 605 genes, a large proportion of which were associated with carbohydrate metabolism, protein metabolism, and DNA and RNA metabolism. The phylogenetic tree showed that habitat is very important in the evolution of Enterococcus: genetic relationships were closer among strains from similar habitats. According to the topology of the time tree, humans and mammals may have been the original hosts of Enterococcus, with species from humans and mammals later making a host shift to plants, birds, food and other environments. However, this is only one evolutionary scenario, and more data and effort are needed to test it. The comparative genomic analysis provided a snapshot of the evolution and genetic diversity of the genus Enterococcus, which paves the way for follow-up studies on its taxonomy and functional genomics. Copyright © 2017 Elsevier GmbH. All rights reserved.
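
    As a minimal sketch of what defining core- and pan-genomes means, the core-genome can be taken as the genes shared by every strain and the pan-genome as the genes found in at least one strain. The strain names and gene identifiers below are hypothetical, and a real analysis would first cluster genes into orthologous families, which this sketch assumes has already been done.

```python
def core_and_pan_genome(gene_sets):
    """Return (core, pan) for a mapping of strain name -> set of gene families.

    core: gene families present in every strain (set intersection).
    pan:  gene families present in at least one strain (set union).
    """
    genomes = [set(genes) for genes in gene_sets.values()]
    if not genomes:
        raise ValueError("at least one genome is required")
    core = set.intersection(*genomes)
    pan = set.union(*genomes)
    return core, pan


# Hypothetical presence data, for illustration only.
strains = {
    "strain_A": {"gene1", "gene2", "gene3"},
    "strain_B": {"gene1", "gene2", "gene4"},
    "strain_C": {"gene1", "gene2", "gene5"},
}

core, pan = core_and_pan_genome(strains)

if __name__ == "__main__":
    print(sorted(core))
    print(sorted(pan))
```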

  17. JBrowse: a dynamic web platform for genome visualization and analysis.

    PubMed

    Buels, Robert; Yao, Eric; Diesh, Colin M; Hayes, Richard D; Munoz-Torres, Monica; Helt, Gregg; Goodstein, David M; Elsik, Christine G; Lewis, Suzanna E; Stein, Lincoln; Holmes, Ian H

    2016-04-12

    JBrowse is a fast and full-featured genome browser built with JavaScript and HTML5. It is easily embedded into websites or apps but can also be served as a standalone web page. Overall improvements to speed and scalability are accompanied by specific enhancements that support complex interactive queries on large track sets. Analysis functions can readily be added using the plugin framework; most visual aspects of tracks can also be customized, along with clicks, mouseovers, menus, and popup boxes. JBrowse can also be used to browse local annotation files offline and to generate high-resolution figures for publication. JBrowse is a mature web application suitable for genome visualization and analysis.

  18. Random forests for genomic data analysis.

    PubMed

    Chen, Xi; Ishwaran, Hemant

    2012-06-01

    Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and is able to account for correlation as well as interactions among features. This makes RF particularly appealing for high-dimensional genomic data analysis. In this article, we systematically review the applications and recent progress of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning. Copyright © 2012 Elsevier Inc. All rights reserved.

  19. Resolution of Disease Phenotypes Resulting from Multilocus Genomic Variation

    PubMed Central

    Posey, Jennifer E.; Harel, Tamar; Liu, Pengfei; Rosenfeld, Jill A.; James, Regis A.; Coban Akdemir, Zeynep H.; Walkiewicz, Magdalena; Bi, Weimin; Xiao, Rui; Ding, Yan; Xia, Fan; Beaudet, Arthur L.; Muzny, Donna M.; Gibbs, Richard A.; Boerwinkle, Eric; Eng, Christine M.; Sutton, V. Reid; Shaw, Chad A.; Plon, Sharon E.; Yang, Yaping; Lupski, James R.

    2017-01-01

    BACKGROUND Whole-exome sequencing can provide insight into the relationship between observed clinical phenotypes and underlying genotypes. METHODS We conducted a retrospective analysis of data from a series of 7374 consecutive unrelated patients who had been referred to a clinical diagnostic laboratory for whole-exome sequencing; our goal was to determine the frequency and clinical characteristics of patients for whom more than one molecular diagnosis was reported. The phenotypic similarity between molecularly diagnosed pairs of diseases was calculated with the use of terms from the Human Phenotype Ontology. RESULTS A molecular diagnosis was rendered for 2076 of 7374 patients (28.2%); among these patients, 101 (4.9%) had diagnoses that involved two or more disease loci. We also analyzed parental samples, when available, and found that de novo variants accounted for 67.8% (61 of 90) of pathogenic variants in autosomal dominant disease genes and 51.7% (15 of 29) of pathogenic variants in X-linked disease genes; both variants were de novo in 44.7% (17 of 38) of patients with two monoallelic variants. Causal copy-number variants were found in 12 patients (11.9%) with multiple diagnoses. Phenotypic similarity scores were significantly lower among patients in whom the phenotype resulted from two distinct mendelian disorders that affected different organ systems (50 patients) than among patients with disorders that had overlapping phenotypic features (30 patients) (median score, 0.21 vs. 0.36; P = 1.77×10^-7). CONCLUSIONS In our study, we found multiple molecular diagnoses in 4.9% of cases in which whole-exome sequencing was informative. Our results show that structured clinical ontologies can be used to determine the degree of overlap between two mendelian diseases in the same patient; the diseases can be distinct or overlapping. Distinct disease phenotypes affect different organ systems, whereas overlapping disease phenotypes are more likely to be caused by two genes

  20. Resolution of Disease Phenotypes Resulting from Multilocus Genomic Variation.

    PubMed

    Posey, Jennifer E; Harel, Tamar; Liu, Pengfei; Rosenfeld, Jill A; James, Regis A; Coban Akdemir, Zeynep H; Walkiewicz, Magdalena; Bi, Weimin; Xiao, Rui; Ding, Yan; Xia, Fan; Beaudet, Arthur L; Muzny, Donna M; Gibbs, Richard A; Boerwinkle, Eric; Eng, Christine M; Sutton, V Reid; Shaw, Chad A; Plon, Sharon E; Yang, Yaping; Lupski, James R

    2017-01-05

    Whole-exome sequencing can provide insight into the relationship between observed clinical phenotypes and underlying genotypes. We conducted a retrospective analysis of data from a series of 7374 consecutive unrelated patients who had been referred to a clinical diagnostic laboratory for whole-exome sequencing; our goal was to determine the frequency and clinical characteristics of patients for whom more than one molecular diagnosis was reported. The phenotypic similarity between molecularly diagnosed pairs of diseases was calculated with the use of terms from the Human Phenotype Ontology. A molecular diagnosis was rendered for 2076 of 7374 patients (28.2%); among these patients, 101 (4.9%) had diagnoses that involved two or more disease loci. We also analyzed parental samples, when available, and found that de novo variants accounted for 67.8% (61 of 90) of pathogenic variants in autosomal dominant disease genes and 51.7% (15 of 29) of pathogenic variants in X-linked disease genes; both variants were de novo in 44.7% (17 of 38) of patients with two monoallelic variants. Causal copy-number variants were found in 12 patients (11.9%) with multiple diagnoses. Phenotypic similarity scores were significantly lower among patients in whom the phenotype resulted from two distinct mendelian disorders that affected different organ systems (50 patients) than among patients with disorders that had overlapping phenotypic features (30 patients) (median score, 0.21 vs. 0.36; P = 1.77×10^-7). In our study, we found multiple molecular diagnoses in 4.9% of cases in which whole-exome sequencing was informative. Our results show that structured clinical ontologies can be used to determine the degree of overlap between two mendelian diseases in the same patient; the diseases can be distinct or overlapping. Distinct disease phenotypes affect different organ systems, whereas overlapping disease phenotypes are more likely to be caused by two genes encoding proteins that interact within

  1. MAGI: Methylation analysis using genome information.

    PubMed

    Baumann, Douglas D; Doerge, R W

    2014-05-01

    By incorporating annotation information into the analysis of next-generation sequencing DNA methylation data, we provide an improvement in performance over current testing procedures. Methylation analysis using genome information (MAGI) is applicable for both unreplicated and replicated data, and provides an effective analysis for studies with low sequencing depth. When compared with current tests, the annotation-informed tests provide an increase in statistical power and offer a significance-based interpretation of differential methylation.

  2. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  3. Genome size determination in Peronosporales (Oomycota) by Feulgen image analysis.

    PubMed

    Voglmayr, H; Greilhuber, J

    1998-12-01

    Genome size was determined, by nuclear Feulgen staining and image analysis, in 46 accessions of 31 species of Peronosporales (Oomycota), including important plant pathogens such as Bremia lactucae, Plasmopara viticola, Pseudoperonospora cubensis, and Pseudoperonospora humuli. The 1C DNA contents ranged from 0.046 pg (45.6 Mb) to 0.163 pg (159.9 Mb). This is 0.041- to 0.144-fold that of Glycine max (soybean, 1C = 1.134 pg), which was used as an internal standard for genome size determination. The linearity of the Feulgen absorbance photometry method over this range was demonstrated by calibration of Aspergillus species (1C = 31-38 Mb) against Glycine, which revealed differences of less than 6% compared to the published CHEF data. The low coefficients of variation (usually between 5 and 10%), repeatability of the results, and compatibility with CHEF data prove the resolution power of Feulgen image analysis. The applicability and limitations of Feulgen photometry are discussed in relation to other methods of genome size determination (CHEF gel electrophoresis, reassociation kinetics, genomic reconstruction) that have been previously applied to Oomycota. Copyright 1998 Academic Press.
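
    The fold values quoted relative to the internal standard follow from dividing each 1C value by the soybean 1C value. A short check, using only the figures given in the abstract (the function and constant names are illustrative):

```python
# 1C DNA content of the internal standard, Glycine max, as stated in the abstract.
STANDARD_1C_PG = 1.134


def fold_of_standard(c_value_pg, standard_pg=STANDARD_1C_PG):
    """Express a 1C DNA content (in pg) as a fraction of the standard's 1C content."""
    return c_value_pg / standard_pg


if __name__ == "__main__":
    # Smallest and largest 1C values reported for the Peronosporales accessions.
    for c_value in (0.046, 0.163):
        print(c_value, round(fold_of_standard(c_value), 3))
```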

  4. AcCNET (Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks.

    PubMed

    Lanza, Val F; Baquero, Fernando; de la Cruz, Fernando; Coque, Teresa M

    2017-01-15

    AcCNET (Accessory genome Constellation Network) is a Perl application that aims to compare accessory genomes of a large number of genomic units, both at qualitative and quantitative levels. Using the proteomes extracted from the analysed genomes, AcCNET creates a bipartite network compatible with standard network analysis platforms. AcCNET allows merging phylogenetic and functional information about the concerned genomes, thus improving the capability of current methods of network analysis. The AcCNET bipartite network opens a new perspective to explore the pangenome of bacterial species, focusing on the accessory genome behind the idiosyncrasy of a particular strain and/or population.

  5. A factor analysis model for functional genomics

    PubMed Central

    Kustra, Rafal; Shioda, Romy; Zhu, Mu

    2006-01-01

    Background: Expression array data are used to predict biological functions of uncharacterized genes by comparing their expression profiles to those of characterized genes. While biologically plausible, this is both statistically and computationally challenging. Typical approaches are computationally expensive and ignore correlations among expression profiles and functional categories. Results: We propose a factor analysis model (FAM) for functional genomics and give a two-step algorithm, using genome-wide expression data for yeast and a subset of Gene-Ontology Biological Process functional annotations. We show that the predictive performance of our method is comparable to the current best approach while our total computation time was faster by a factor of 4000. We discuss the unique challenges in performance evaluation of algorithms used for genome-wide functional genomics. Finally, we discuss extensions to our method that can incorporate the inherent correlation structure of the functional categories to further improve predictive performance. Conclusion: Our factor analysis model is a computationally efficient technique for functional genomics and provides a clear and unified statistical framework with potential for incorporating important gene ontology information to improve predictions. PMID:16630343

  6. HilbertCurve: an R/Bioconductor package for high-resolution visualization of genomic data.

    PubMed

    Gu, Zuguang; Eils, Roland; Schlesner, Matthias

    2016-08-01

    Hilbert curves enable high-resolution visualization of genomic data on a chromosome- or genome-wide scale. Here we present the HilbertCurve package that provides an easy-to-use interface for mapping genomic data to Hilbert curves. The package transforms the curve as a virtual axis, thereby hiding the details of the curve construction from the user. HilbertCurve supports multiple-layer overlay that makes it a powerful tool to correlate the spatial distribution of multiple feature types. The HilbertCurve package and documentation are freely available from the Bioconductor project: http://www.bioconductor.org/packages/devel/bioc/html/HilbertCurve.html. Contact: m.schlesner@dkfz.de. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
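
    To illustrate the idea of mapping a linear genomic axis onto a Hilbert curve, the sketch below uses the standard index-to-coordinate conversion for a Hilbert curve. It is written in Python and is not the HilbertCurve package's R interface; the function names and the binning of positions into curve cells are assumptions made for the example.

```python
def hilbert_d2xy(order, d):
    """Map index d (0 <= d < 4**order) along a Hilbert curve to (x, y).

    The curve fills a grid with 2**order cells per side.
    """
    side = 1 << order
    if not 0 <= d < side * side:
        raise ValueError("d is outside the curve")
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def position_to_cell(position, chrom_length, order):
    """Bin a 0-based genomic position into a cell of the curve (illustrative)."""
    n_cells = 4 ** order
    d = position * n_cells // chrom_length
    return hilbert_d2xy(order, d)


if __name__ == "__main__":
    print([hilbert_d2xy(1, d) for d in range(4)])
```

    Because consecutive indices always land in neighbouring cells, positions that are close on the chromosome stay close on the plot, which is what makes the layout useful at high resolution.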

  7. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

    PubMed

    Song, Giltae; Dickins, Benjamin J A; Demeter, Janos; Engel, Stacia; Gallagher, Jennifer; Choe, Kisurb; Dunn, Barbara; Snyder, Michael; Cherry, J Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

  8. Blot hybridisation analysis of genomic DNA.

    PubMed Central

    Vandenplas, S; Wiid, I; Grobler-Rabie, A; Brebner, K; Ricketts, M; Wallis, G; Bester, A; Boyd, C; Mathew, C

    1984-01-01

    Restriction endonuclease analysis of specific gene sequences is proving to be a valuable technique for characterisation and diagnosis of inherited disorders. This paper describes detailed protocols for isolation, restriction, and blot hybridisation of genomic DNA. Problems and alternatives in the procedure are discussed and a troubleshooting guide has been provided to help rectify faults. PMID:6086927

  9. Tools for sea urchin genomic analysis.

    PubMed

    Cameron, R Andrew

    2014-01-01

    The Sea Urchin Genome Project Web site, SpBase (http://SpBase.org), in association with a suite of publicly available sequence comparison tools, provides a platform from which to analyze genes and genomic sequences of sea urchin. This information system is specifically designed to support laboratory bench studies in cell and molecular biology. In particular, these tools and datasets have supported the description of the gene regulatory networks of the purple sea urchin S. purpuratus. This chapter details the first steps for finding genes and noncoding regulatory sequences for further analysis.

  10. The Genomic HyperBrowser: an analysis web server for genome-scale data.

    PubMed

    Sandve, Geir K; Gundersen, Sveinung; Johansen, Morten; Glad, Ingrid K; Gunathasan, Krishanthi; Holden, Lars; Holden, Marit; Liestøl, Knut; Nygård, Ståle; Nygaard, Vegard; Paulsen, Jonas; Rydbeck, Halfdan; Trengereid, Kai; Clancy, Trevor; Drabløs, Finn; Ferkingstad, Egil; Kalas, Matús; Lien, Tonje; Rye, Morten B; Frigessi, Arnoldo; Hovig, Eivind

    2013-07-01

    The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens up a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.

  11. Integrative Bayesian network analysis of genomic data.

    PubMed

    Ni, Yang; Stingo, Francesco C; Baladandayuthapani, Veerabhadran

    2014-01-01

    Rapid development of genome-wide profiling technologies has made it possible to conduct integrative analysis on genomic data from multiple platforms. In this study, we develop a novel integrative Bayesian network approach to investigate the relationships between genetic and epigenetic alterations as well as how these mutations affect a patient's clinical outcome. We take a Bayesian network approach that admits a convenient decomposition of the joint distribution into local distributions. Exploiting the prior biological knowledge about regulatory mechanisms, we model each local distribution as linear regressions. This allows us to analyze multi-platform genome-wide data in a computationally efficient manner. We illustrate the performance of our approach through simulation studies. Our methods are motivated by and applied to a multi-platform glioblastoma dataset, from which we reveal several biologically relevant relationships that have been validated in the literature as well as new genes that could potentially be novel biomarkers for cancer progression.
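
    The decomposition referred to above can be written out as a sketch. The first line is the standard factorization of a Bayesian network over its parent sets; the second shows a local distribution modelled as a linear regression. The notation (pa(j), the coefficients, and the Gaussian error term) is assumed here for illustration and is not taken from the paper.

```latex
% Joint distribution factorized into local distributions, one per node,
% each conditioned on the node's parents pa(j) in the network.
p(x_1, \dots, x_p) = \prod_{j=1}^{p} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)

% Each local distribution as a linear regression on the parent nodes
% (Gaussian noise is an illustrative assumption).
x_j = \sum_{k \in \mathrm{pa}(j)} \beta_{jk}\, x_k + \varepsilon_j,
\qquad \varepsilon_j \sim \mathcal{N}(0, \sigma_j^2)
```

    Since each factor involves only one node and its parents, the regressions can be fitted locally, which is the source of the computational efficiency the abstract mentions.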

  12. Comparative genome analysis of Basidiomycete fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48 percent of basidiomycete proteins are unique to the phylum with nearly half of those (22 percent) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  13. Image analysis in comparative genomic hybridization

    SciTech Connect

    Lundsteen, C.; Maahr, J.; Christensen, B.

    1995-01-01

    Comparative genomic hybridization (CGH) is a new technique by which genomic imbalances can be detected by combining in situ suppression hybridization of whole genomic DNA and image analysis. We have developed software for rapid, quantitative CGH image analysis by a modification and extension of the standard software used for routine karyotyping of G-banded metaphase spreads in the Magiscan chromosome analysis system. The DAPI-counterstained metaphase spread is karyotyped interactively. Corrections for image shifts between the DAPI, FITC, and TRITC images are done manually by moving the three images relative to each other. The fluorescence background is subtracted. A mean filter is applied to smooth the FITC and TRITC images before the fluorescence ratio between the individual FITC and TRITC-stained chromosomes is computed pixel by pixel inside the area of the chromosomes determined by the DAPI boundaries. Fluorescence intensity ratio profiles are generated, and peaks and valleys indicating possible gains and losses of test DNA are marked if the ratio rises above 1.25 or falls below 0.75. By combining the analysis of several metaphase spreads, consistent findings of gains and losses in all or almost all spreads indicate chromosomal imbalance. Chromosomal imbalances are detected either by visual inspection of fluorescence ratio (FR) profiles or by a statistical approach that compares FR measurements of the individual case with measurements of normal chromosomes. The complete analysis of one metaphase can be carried out in approximately 10 minutes. 8 refs., 7 figs., 1 tab.
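
    The ratio-profile step can be sketched in a simplified one-dimensional form: subtract the background, smooth each channel with a mean filter, take the ratio, and flag values outside the 0.75 to 1.25 band. The original software works pixel by pixel on images; the function names, the window size, and the use of plain lists along a chromosome axis are assumptions made for this example.

```python
def mean_filter(values, window=3):
    """Smooth a profile with a moving average, shrinking the window at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def classify_ratio_profile(test, reference, background=0.0,
                           loss_below=0.75, gain_above=1.25):
    """Label each position as 'gain', 'loss', 'balanced' or 'undefined'.

    test and reference are fluorescence intensities of the test-DNA and
    reference-DNA channels sampled along one chromosome.
    """
    test_s = mean_filter([v - background for v in test])
    ref_s = mean_filter([v - background for v in reference])
    calls = []
    for t, r in zip(test_s, ref_s):
        if r <= 0:
            calls.append("undefined")  # no usable reference signal here
            continue
        ratio = t / r
        if ratio > gain_above:
            calls.append("gain")
        elif ratio < loss_below:
            calls.append("loss")
        else:
            calls.append("balanced")
    return calls


if __name__ == "__main__":
    print(classify_ratio_profile([100, 100, 160, 160, 160], [100] * 5))
```

    Combining the calls from several metaphase spreads and keeping only the consistent ones corresponds to the final step described in the abstract.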

  14. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  15. A high-resolution whole-genome cattle-human comparative map reveals details of mammalian chromosome evolution.

    PubMed

    Everts-van der Wind, Annelie; Larkin, Denis M; Green, Cheryl A; Elliott, Janice S; Olmstead, Colleen A; Chiu, Readman; Schein, Jacqueline E; Marra, Marco A; Womack, James E; Lewin, Harris A

    2005-12-20

    Approximately 3,000 cattle bacterial artificial chromosome (BAC)-end sequences were added to the Illinois-Texas 5,000-rad RH (RH, radiation hybrid) map. The BAC-end sequences selected for mapping are approximately 1 Mbp apart on the human chromosomes as determined by blastn analysis. The map has 3,484 ordered markers, of which 3,204 are anchored in the human genome. Two hundred-and-one homologous synteny blocks (HSBs) were identified, of which 27 are previously undiscovered, 79 are extended, 26 were formed by previously unrecognized breakpoints in 18 previously defined HSBs, and 23 are the result of fusions. The comparative coverage relative to the human genome is approximately 91%, or 97% of the theoretical maximum. The positions of 64% of all cattle centromeres and telomeres were reassigned relative to their positions on the previous map, thus facilitating a more detailed comparative analysis of centromere and telomere evolution. As an example of the utility of the high-resolution map, 22 cattle BAC fingerprint contigs were directly anchored to cattle chromosome 19 [Bos taurus, (BTA) 19]. The order of markers on the cattle RH and fingerprint maps of BTA19 and the sequence-based map of human chromosome 17 [Homo sapiens, (HSA) 17] were found to be highly consistent, with only two minor ordering discrepancies between the RH map and fingerprint contigs. The high-resolution Illinois-Texas 5,000-rad RH and comparative maps will facilitate identification of candidate genes for economically important traits, the phylogenomic analysis of mammalian chromosomes, proofing of the BAC fingerprint map and, ultimately, aid the assembly of cattle whole-genome sequence.

  16. High-resolution copy number arrays in cancer and the problem of normal genome copy number variation.

    PubMed

    Gorringe, Kylie L; Campbell, Ian G

    2008-11-01

    High-resolution techniques for analysis of genome copy number (CN) enable the analysis of complex cancer somatic genetics. However, the analysis of these data is difficult, and failure to consider a number of issues in depth may result in false leads or unnecessary rejection of true positives. First, segmental duplications may falsely generate CN breakpoints in aneuploid samples. Second, even when tumor data were each normalized to matching lymphocyte DNA, we still observed copy number polymorphisms masquerading as somatic alterations due to allelic imbalance. We investigated a number of different solutions and determined that evaluating matching normal DNA, or at least using locally derived normal baseline data, were preferable to relying on current online databases because of poor cross-platform compatibility and the likelihood of excluding genuine small somatic alterations.

  17. High resolution analysis of satellite gradiometry

    NASA Technical Reports Server (NTRS)

    Colombo, O. L.

    1989-01-01

    Satellite gravity gradiometry is a technique now under development which, by the middle of the next decade, may be used for the high resolution charting from space of the gravity field of the earth and, afterwards, of other planets. Some data analysis schemes are reviewed for getting detailed gravity maps from gradiometry on both a global and a local basis. It also presents estimates of the likely accuracies of such maps, in terms of normalized spherical harmonics expansions, both using gradiometry alone and in combination with data from a Global Positioning System (GPS) receiver carried on the same spacecraft. It compares these accuracies with those of current and future maps obtained from other data (conventional tracking, satellite-satellite tracking, etc.), and also with the spectra of various signals of geophysical interest.

  18. Genomic signal analysis of Mycobacterium tuberculosis

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan; Banica, Dorina; Tuduce, Rodica

    2007-02-01

    As previously shown the conversion of nucleotide sequences into digital signals offers the possibility to apply signal processing methods for the analysis of genomic data. Genomic Signal Analysis (GSA) has been used to analyze large scale features of DNA sequences, at the scale of whole chromosomes, including both coding and non-coding regions. The striking regularities of genomic signals reveal restrictions in the way nucleotides and pairs of nucleotides are distributed along nucleotide sequences. Structurally, a chromosome appears to be less of a "plain text", corresponding to certain semantic and grammar rules, but more of a "poem", satisfying additional symmetry restrictions that evoke the "rhythm" and "rhyme". Recurrent patterns in nucleotide sequences are reflected in simple mathematical regularities observed in genomic signals. GSA has also been used to track pathogen variability, especially concerning their resistance to drugs. Previous work has been dedicated to the study of HIV-1, Clade F and Avian Flu. The present paper applies GSA methodology to study Mycobacterium tuberculosis (MT) rpoB gene variability, relevant to its resistance to antibiotics. Isolates from 50 Romanian patients have been studied both by rapid LightCycler PCR and by sequencing of a segment of 190-250 nucleotides covering the region of interest. The variability is caused by SNPs occurring at specific sites along the gene strand, as well as by inclusions. Because of the mentioned symmetry restrictions, the GS variations tend to compensate. An important result is that MT can act as a vector for HIV virus, which is able to retrotranscribe its specific genes both into human and MT genomes.

  19. Differential retention and divergent resolution of duplicate genes following whole-genome duplication

    PubMed Central

    McGrath, Casey L.; Gout, Jean-Francois; Johri, Parul; Doak, Thomas G.

    2014-01-01

    The Paramecium aurelia complex is a group of 15 species that share at least three past whole-genome duplications (WGDs). The macronuclear genome sequences of P. biaurelia and P. sexaurelia are presented and compared to the published sequence of P. tetraurelia. Levels of duplicate-gene retention from the recent WGD differ by >10% across species, with P. sexaurelia losing significantly more genes than P. biaurelia or P. tetraurelia. In addition, historically high rates of gene conversion have homogenized WGD paralogs, probably extending the paralogs’ lifetimes. The probability of duplicate retention is positively correlated with GC content and expression level; ribosomal proteins, transcription factors, and intracellular signaling proteins are overrepresented among maintained duplicates. Finally, multiple sources of evidence indicate that P. sexaurelia diverged from the two other lineages immediately following, or perhaps concurrent with, the recent WGD, with approximately half of gene losses between P. tetraurelia and P. sexaurelia representing divergent gene resolutions (i.e., silencing of alternative paralogs), as expected for random duplicate loss between these species. Additionally, though P. biaurelia and P. tetraurelia diverged from each other much later, there are still more than 100 cases of divergent resolution between these two species. Taken together, these results indicate that divergent resolution of duplicate genes between lineages acts to reinforce reproductive isolation between species in the Paramecium aurelia complex. PMID:25085612

  20. DNA methylation detection: bisulfite genomic sequencing analysis.

    PubMed

    Li, Yuanyuan; Tollefsbol, Trygve O

    2011-01-01

    DNA methylation, which most commonly occurs at the C5 position of cytosines within CpG dinucleotides, plays a pivotal role in many biological processes such as gene expression, embryonic development, cellular proliferation, differentiation, and chromosome stability. Aberrant DNA methylation is often associated with loss of DNA homeostasis and genomic instability leading to the development of human diseases such as cancer. The importance of DNA methylation creates an urgent demand for effective methods with high sensitivity and reliability to explore innovative diagnostic and therapeutic strategies. Bisulfite genomic sequencing developed by Frommer and colleagues was recognized as a revolution in DNA methylation analysis based on conversion of genomic DNA by using sodium bisulfite. Besides various merits of the bisulfite genomic sequencing method such as being highly qualitative and quantitative, it serves as a fundamental principle to many derived methods to better interpret the mystery of DNA methylation. Here, we present a protocol currently frequently used in our laboratory that has proven to yield optimal outcomes. We also discuss the potential technical problems and troubleshooting notes for a variety of applications in this field.
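
    The conversion principle behind bisulfite sequencing can be shown with a minimal in-silico sketch. It assumes the standard chemistry (unmethylated cytosines read as T after bisulfite treatment and PCR, methylated cytosines remain C) and looks at one strand only. The function names and the toy sequence are hypothetical and are not part of the protocol in the record above.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment plus PCR on a single strand.

    Unmethylated C is read as T; methylated C (0-based positions given
    in methylated_positions) is protected and stays C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted):
    """Return positions where a reference C is still C after conversion."""
    return [
        i
        for i, (ref_base, conv_base) in enumerate(zip(reference.upper(), converted))
        if ref_base == "C" and conv_base == "C"
    ]


reference = "ACGTCCGA"          # toy sequence (assumption)
converted = bisulfite_convert(reference, methylated_positions=[1])
methylated_sites = call_methylation(reference, converted)
```

    Comparing the converted read with the reference recovers the methylated position, which is the comparison that sequencing-based readouts rely on.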

  1. Differentiation of Staphylococcus spp. by high-resolution melting analysis.

    PubMed

    Slany, Michal; Vanerkova, Martina; Nemcova, Eva; Zaloudikova, Barbora; Ruzicka, Filip; Freiberger, Tomas

    2010-12-01

    High-resolution melting analysis (HRMA) is a fast (post-PCR) high-throughput method to scan for sequence variations in a target gene. The aim of this study was to test the potential of HRMA to distinguish particular bacterial species of the Staphylococcus genus even when using a broad-range PCR within the 16S rRNA gene where sequence differences are minimal. Genomic DNA samples isolated from 12 reference staphylococcal strains (Staphylococcus aureus, Staphylococcus capitis, Staphylococcus caprae, Staphylococcus epidermidis, Staphylococcus haemolyticus, Staphylococcus hominis, Staphylococcus intermedius, Staphylococcus saprophyticus, Staphylococcus sciuri, Staphylococcus simulans, Staphylococcus warneri, and Staphylococcus xylosus) were subjected to a real-time PCR amplification of the 16S rRNA gene in the presence of fluorescent dye EvaGreen™, followed by HRMA. Melting profiles were used as molecular fingerprints for bacterial species differentiation. HRMA of S. saprophyticus and S. xylosus resulted in indistinguishable profiles because of their identical sequences in the analyzed 16S rRNA region. The remaining reference strains were fully differentiated either directly or via high-resolution plots obtained by heteroduplex formation between coamplified PCR products of the tested staphylococcal strain and a phylogenetically unrelated strain.

  2. GRETINA commissioning and engineering run resolution analysis

    NASA Astrophysics Data System (ADS)

    Tarlow, Thomas; Beausang, Con; Ross, Tim; Hughes, Richard; Gell, Kristen; Good, Erin

    2012-10-01

    GRETINA, the first stage in the full Gamma Ray Energy Tracking Array (GRETA), consists of seven modules covering approximately 1π solid angle. Each module is made up of four large, highly-segmented germanium detectors capable of measuring the interaction points of individual gamma-rays. GRETINA has recently been assembled and commissioned at LBNL via a series of engineering and commissioning runs. Here we report on an analysis of data from the first engineering run (ER01) which was intended to probe the response of the data acquisition system to high multiplicity gamma-ray cascades. For this experiment the 122Sn(40Ar, 4n) reaction at a beam energy of 210 MeV was utilized to populate high spin states in 158Er. A variety of beam currents, targets and trigger conditions were utilized to test the acquisition. Here we report on the measured energy resolution, both with calibration and in-beam sources as well as a gamma-gamma coincidence analysis to confirm the known level scheme and the capability of the data acquisition system for high fold coincidence measurements. This work was partly supported by the US Department of Energy via grant numbers DE-FG52-09NA29454 and DE-FG02-05-ER41379.

  3. Analysis of the allohexaploid bread wheat genome (Triticum aestivum) using comparative whole genome shotgun sequencing

    USDA-ARS's Scientific Manuscript database

    The large 17 Gb allopolyploid genome of bread wheat is a major challenge for genome analysis because it is composed of three closely- related and independently maintained genomes, with genes dispersed as small “islands” separated by vast tracts of repetitive DNA. We used a novel comparative genomi...

  4. Impact of copy number variations burden on coding genome in humans using integrated high resolution arrays.

    PubMed

    Veerappa, Avinash M; Lingaiah, Kusuma; Vishweswaraiah, Sangeetha; Murthy, Megha N; Suresh, Raviraj V; Manjegowda, Dinesh S; Ramachandra, Nallur B

    2014-12-16

    Copy number variations (CNVs) alter the transcriptional and translational levels of genes by disrupting the coding structure and this burden of CNVs seems to be a significant contributor to phenotypic variations. Therefore it was necessary to assess the complexities of CNV burden on the coding genome. A total of 1715 individuals from 12 populations were used for CNV analysis in the present investigation. Analysis was performed using Affymetrix Genome-Wide Human SNP Array 6.0 chip and CytoScan High-Density arrays. CNVs were more frequently observed in the coding region than in the non-coding region. CNVs were found to be enriched in the regions containing functional genes (83-96%) compared with the regions containing pseudogenes (4-17%). CNVs across the genome of an individual showed multiple hits across many genes, whose proteins interact physically and function under the same pathway. We identified varying numbers of proteins and degrees of interactions within protein complexes of single individual genomes. This study represents the first draft of a population-specific CNV genes map as well as a cross-populational map. The complex relationship of CNVs on genes and their physically interacting partners unravels many complexities involved in phenotype expression. This study identifies four mechanisms contributing to the complexities caused by the presence of multiple CNVs across many genes in the coding part of the genome.

  5. CSMET: Comparative Genomic Motif Detection via Multi-Resolution Phylogenetic Shadowing

    PubMed Central

    Kolar, Mladen; Xing, Eric P.

    2008-01-01

    Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, are common events during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny conditioning on the functional specifications of each taxon. The functional specifications themselves are the output of a phylogeny which models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders. PMID:18535663

  6. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    PubMed

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  7. Motion analysis using 3D high-resolution frequency analysis.

    PubMed

    Ueda, Takaaki; Fujii, Kenta; Hirobayashi, Shigeki; Yoshizawa, Toshio; Misawa, Tadanobu

    2013-08-01

    The spatiotemporal spectra of a video that contains a moving object form a plane in the 3D frequency domain. This plane, which is described as the theoretical motion plane, reflects the velocity of the moving objects, which is calculated from the slope. However, if the resolution of the frequency analysis method is not high enough to obtain actual spectra from the object signal, the spatiotemporal spectra disperse away from the theoretical motion plane. In this paper, we propose a high-resolution frequency analysis method, described as 3D nonharmonic analysis (NHA), which is only weakly influenced by the analysis window. In addition, we estimate the motion vectors of objects in a video using the plane-clustering method, in conjunction with the least-squares method, for 3D NHA spatiotemporal spectra. We experimentally verify the accuracy of the 3D NHA and its usefulness for a sequence containing complex motions, such as cross-over motion, through comparison with 3D fast Fourier transform. The experimental results show that increasing the frequency resolution contributes to high-accuracy estimation of a motion plane.
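
    The link between the motion plane and the velocity can be illustrated with a small sketch. It is not the 3D NHA method of the record above: it uses NumPy's ordinary 3D FFT (the baseline the authors compare against) on a synthetic video whose motion falls exactly on FFT bins, and fits the plane f_t + v_x f_x + v_y f_y = 0 by least squares. The function names, the threshold parameter and the test pattern are assumptions made for illustration.

```python
import numpy as np


def make_translating_video(n=32, frames=32, vx=2.0, vy=1.0):
    """Synthetic video: two sinusoidal patterns moving at (vx, vy) pixels/frame."""
    t, y, x = np.meshgrid(
        np.arange(frames), np.arange(n), np.arange(n), indexing="ij"
    )
    return (np.cos(2 * np.pi * (x - vx * t) / n)
            + np.cos(2 * np.pi * (y - vy * t) / n))


def estimate_velocity(video, rel_threshold=0.5):
    """Fit the motion plane f_t + vx*f_x + vy*f_y = 0 to strong spectral peaks."""
    frames, height, width = video.shape
    spectrum = np.abs(np.fft.fftn(video))
    spectrum[0, 0, 0] = 0.0  # ignore the DC term

    # Frequency coordinates (cycles per sample) for every spectral bin.
    ft = np.fft.fftfreq(frames)[:, None, None]
    fy = np.fft.fftfreq(height)[None, :, None]
    fx = np.fft.fftfreq(width)[None, None, :]
    ft, fy, fx = np.broadcast_arrays(ft, fy, fx)

    # Keep only bins whose magnitude is close to the strongest peak.
    mask = spectrum >= rel_threshold * spectrum.max()
    design = np.column_stack([fx[mask], fy[mask]])
    target = -ft[mask]

    # Least-squares solution of design @ [vx, vy] = target.
    (vx, vy), *_ = np.linalg.lstsq(design, target, rcond=None)
    return vx, vy


vx_est, vy_est = estimate_velocity(make_translating_video())
```

    When the motion does not fall on FFT bins the peaks smear across neighbouring bins, which is the dispersion away from the theoretical plane that the abstract attributes to limited frequency resolution.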

  8. Genome analysis and genetic enhancement of tomato.

    PubMed

    Gupta, Vikrant; Mathur, Saloni; Solanke, Amolkumar U; Sharma, Manoj K; Kumar, Rahul; Vyas, Shailendra; Khurana, Paramjit; Khurana, Jitendra P; Tyagi, Akhilesh K; Sharma, Arun K

    2009-01-01

    The Solanaceae is an important family of vegetable crops, ornamentals and medicinal plants. Tomato has served as a model member of this family largely because of its enriched cytogenetic, genetic, as well as physical, maps. Mapping has helped in cloning several genes of importance such as Pto, responsible for resistance against bacterial speck disease, Mi-1.2 for resistance against nematodes, and fw2.2 QTL for fruit weight. A high-throughput genome-sequencing program has been initiated by an international consortium of 10 countries. Since heterochromatin has been found to be concentrated near centromeres, the consortium is focusing on sequencing only the gene-rich euchromatic region. Genomes of the members of Solanaceae show a significant degree of synteny, suggesting that the tomato genome sequence would help in the cloning of genes for important traits from other Solanaceae members as well. ESTs from a large number of cDNA libraries have been sequenced, and microarray chips, in conjunction with wide array of ripening mutants, have contributed immensely to the understanding of the fruit-ripening phenomenon. Work on the analysis of the tomato proteome has also been initiated. Transgenic tomato plants with improved abiotic stress tolerance, disease resistance and insect resistance, have been developed. Attempts have also been made to develop tomato as a bioreactor for various pharmaceutical proteins. However, control of fruit quality and ripening remains an active and challenging area of research. Such efforts should pave the way to improve not only tomato, but also other solanaceous crops.

  9. Ensemble analysis of adaptive compressed genome sequencing strategies.

    PubMed

    Taghavi, Zeinab

    2014-01-01

    Acquiring genomes at single-cell resolution has many applications such as in the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth first search strategy. We simulated the whole process, which constrained our ability to study the behavior of the algorithm for the entire ensemble due to its computational intensity. In this paper, we modify our previous breadth first search strategy and introduce the depth first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total sequenced nucleotides for a given population profile. Our results suggest that the expected total sequenced nucleotides grows in proportion to the logarithm of the number of cells and linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size over the maximum genome size in the sample.
The modified resource allocation method

  10. Application of resequencing to rice genomics, functional genomics and evolutionary analysis

    PubMed Central

    2014-01-01

    Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial for both bridging the knowledge gap between genotype and phenotype and facilitating molecular breeding via gene design in rice. Here, we also discuss the limitation, application and future prospects of rice resequencing. PMID:25006357

  11. Peptidoglycan: a post-genomic analysis

    PubMed Central

    2012-01-01

    Background: To derive post-genomic, neutral insight into the peptidoglycan (PG) distribution among organisms, we mined 1,644 genomes listed in the Carbohydrate-Active Enzymes database for the presence of a minimal 3-gene set that is necessary for PG metabolism. This gene set consists of one gene from the glycosyltransferase family GT28, one from family GT51 and at least one gene belonging to one of five glycoside hydrolase families (GH23, GH73, GH102, GH103 and GH104). Results: None of the 103 Viruses or 101 Archaea examined possessed the minimal 3-gene set, but this set was detected in 1/42 of the Eukarya members (Micromonas sp., coding for GT28, GT51 and GH103) and in 1,260/1,398 (90.1%) of Bacteria, with a 100% positive predictive value for the presence of PG. Pearson correlation test showed that GT51 family genes were significantly associated with PG with a value of 0.963 and a p value less than 10^-3. This result was confirmed by a phylogenetic comparative analysis showing that the GT51-encoding gene was significantly associated with PG with a Pagel’s score of 60 and 51 (percentage of error close to 0%). Phylogenetic analysis indicated that the GT51 gene history comprised eight loss and one gain events, and suggested a dynamic on-going process. Conclusions: Genome analysis is a neutral approach to explore prospectively the presence of PG in uncultured, sequenced organisms with high predictive values. PMID:23249425

  12. High-resolution genomic screening in mantle cell lymphoma--specific changes correlate with genomic complexity, the proliferation signature and survival.

    PubMed

    Halldórsdóttir, Anna M; Sander, Birgitta; Göransson, Hanna; Isaksson, Anders; Kimby, Eva; Mansouri, Mahmoud; Rosenquist, Richard; Ehrencrona, Hans

    2011-02-01

    Mantle cell lymphoma (MCL) is characterized by the t(11;14)(q13;q32) and numerous copy number aberrations (CNAs). Recently, gene expression profiling defined a proliferation gene expression signature in MCL where high scores predict shorter survival. We investigated 31 MCL cases using high-density single nucleotide polymorphism arrays and correlated CNA patterns with the proliferation signature and with clinical data. Many recurrent CNAs typical of MCL were detected, including losses at 1p (55%), 8p (29%), 9q (29%), 11q (55%), 13q (42%) and 17p (32%), and gains at 3q (39%), 8q (26%), 15q (23%) and 18q (23%). A novel deleted region at 20q (16%) contained only one candidate gene, ZFP64, a putative tumor suppressor. Unsupervised clustering identified subgroups with different patterns of CNAs, including a subset (19%) characterized by the presence of 11q loss in all cases and by the absence of 13q loss, and 3q and 7p gains. Losses at 1p, 8p, 13q and 17p were associated with increased genomic complexity. High proliferation signature scores correlated with increased number of large (>15 Mbp) CNAs (P = 0.03) as well as copy number gains at 7p (P = 0.02) and losses at 9q (P = 0.04). Furthermore, large/complex 13q losses were associated with inferior survival (P < 0.05) as were losses/copy number neutral LOH at 19p13 (P = 0.01). In summary, this high-resolution genomic analysis identified novel aberrations and revealed that several CNAs correlated with genomic complexity, the proliferation status and survival.

  13. Dynamics of genomic clones in breast cancer patient xenografts at single cell resolution

    PubMed Central

    Eirew, Peter; Steif, Adi; Khattra, Jaswinder; Ha, Gavin; Yap, Damian; Farahani, Hossein; Gelmon, Karen; Chia, Stephen; Mar, Colin; Wan, Adrian; Laks, Emma; Biele, Justina; Shumansky, Karey; Rosner, Jamie; McPherson, Andrew; Nielsen, Cydney; Roth, Andrew J. L.; Lefebvre, Calvin; Bashashati, Ali; de Souza, Camila; Siu, Celia; Aniba, Radhouane; Brimhall, Jazmine; Oloumi, Arusha; Osako, Tomo; Bruna, Alejandra; Sandoval, Jose; Algara, Teresa; Greenwood, Wendy; Leung, Kaston; Cheng, Hongwei; Xue, Hui; Wang, Yuzhuo; Lin, Dong; Mungall, Andrew J.; Moore, Richard; Zhao, Yongjun; Lorette, Julie; Nguyen, Long; Huntsman, David; Eaves, Connie J.; Hansen, Carl; Marra, Marco A.; Caldas, Carlos; Shah, Sohrab P.; Aparicio, Samuel

    2016-01-01

    Human cancers, including breast cancers, are comprised of clones differing in mutation content. Clones evolve dynamically in space and time following principles of Darwinian evolution, underpinning important emergent features such as drug resistance and metastasis. Human breast cancer xenoengraftment is used as a means of capturing and studying tumour biology, and breast tumour xenografts are generally assumed to be reasonable models of the originating tumours. However, the consequences and reproducibility of engraftment and propagation on the genomic clonal architecture of tumours have not been systematically examined at single cell resolution. Here we show by both deep genome and single cell sequencing methods, the clonal dynamics of initial engraftment and subsequent serial propagation of primary and metastatic human breast cancers in immunodeficient mice. In all 15 cases examined, clonal selection on engraftment was observed in both primary and metastatic breast tumours, varying in degree from extreme selective engraftment of minor (<5% of starting population) clones to moderate, polyclonal engraftment. Furthermore, ongoing clonal dynamics during serial passaging is a feature of tumours experiencing modest initial selection. Through single cell sequencing, we show that major mutation clusters estimated from tumour population sequencing relate predictably to the most abundant clonal genotypes, even in clonally complex and rapidly evolving cases. Finally, we show that similar clonal expansion patterns can emerge in independent grafts of the same starting tumour population, indicating that genomic aberrations can be reproducible determinants of evolutionary trajectories. Our results show that measurement of genomically defined clonal population dynamics will be highly informative for functional studies utilizing patient-derived breast cancer xenoengraftment. PMID:25470049

  14. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution.

    PubMed

    Eirew, Peter; Steif, Adi; Khattra, Jaswinder; Ha, Gavin; Yap, Damian; Farahani, Hossein; Gelmon, Karen; Chia, Stephen; Mar, Colin; Wan, Adrian; Laks, Emma; Biele, Justina; Shumansky, Karey; Rosner, Jamie; McPherson, Andrew; Nielsen, Cydney; Roth, Andrew J L; Lefebvre, Calvin; Bashashati, Ali; de Souza, Camila; Siu, Celia; Aniba, Radhouane; Brimhall, Jazmine; Oloumi, Arusha; Osako, Tomo; Bruna, Alejandra; Sandoval, Jose L; Algara, Teresa; Greenwood, Wendy; Leung, Kaston; Cheng, Hongwei; Xue, Hui; Wang, Yuzhuo; Lin, Dong; Mungall, Andrew J; Moore, Richard; Zhao, Yongjun; Lorette, Julie; Nguyen, Long; Huntsman, David; Eaves, Connie J; Hansen, Carl; Marra, Marco A; Caldas, Carlos; Shah, Sohrab P; Aparicio, Samuel

    2015-02-19

    Human cancers, including breast cancers, comprise clones differing in mutation content. Clones evolve dynamically in space and time following principles of Darwinian evolution, underpinning important emergent features such as drug resistance and metastasis. Human breast cancer xenoengraftment is used as a means of capturing and studying tumour biology, and breast tumour xenografts are generally assumed to be reasonable models of the originating tumours. However, the consequences and reproducibility of engraftment and propagation on the genomic clonal architecture of tumours have not been systematically examined at single-cell resolution. Here we show, using deep-genome and single-cell sequencing methods, the clonal dynamics of initial engraftment and subsequent serial propagation of primary and metastatic human breast cancers in immunodeficient mice. In all 15 cases examined, clonal selection on engraftment was observed in both primary and metastatic breast tumours, varying in degree from extreme selective engraftment of minor (<5% of starting population) clones to moderate, polyclonal engraftment. Furthermore, ongoing clonal dynamics during serial passaging is a feature of tumours experiencing modest initial selection. Through single-cell sequencing, we show that major mutation clusters estimated from tumour population sequencing relate predictably to the most abundant clonal genotypes, even in clonally complex and rapidly evolving cases. Finally, we show that similar clonal expansion patterns can emerge in independent grafts of the same starting tumour population, indicating that genomic aberrations can be reproducible determinants of evolutionary trajectories. Our results show that measurement of genomically defined clonal population dynamics will be highly informative for functional studies using patient-derived breast cancer xenoengraftment.

  15. High range resolution micro-Doppler analysis

    NASA Astrophysics Data System (ADS)

    Cammenga, Zachary A.; Smith, Graeme E.; Baker, Christopher J.

    2015-05-01

    This paper addresses use of the micro-Doppler effect and the use of high range-resolution profiles to observe complex targets in complex target scenes. The combination of micro-Doppler and high range-resolution provides the ability to separate the motion of complex targets from one another. This ability leads to the differentiation of targets based on their micro-Doppler signatures. Without the high-range resolution, this would not be possible because the individual signatures would not be separable. This paper also addresses the use of the micro-Doppler information and high range-resolution profiles to generate an approximation of the scattering properties of a complex target. This approximation gives insight into the structure of the complex target and, critically, is created without using a pre-determined target model.

  16. Comparative Genomic Analysis of Meningitis- and Bacteremia-Causing Pneumococci Identifies a Common Core Genome

    PubMed Central

    Cornick, Jennifer E.; Chaguza, Chrispin; Yalcin, Feyruz; Harris, Simon R.; Gray, Katherine J.; Kiran, Anmol M.; Molyneux, Elizabeth; French, Neil; Faragher, Brian E.; Everett, Dean B.; Bentley, Stephen D.

    2015-01-01

    Streptococcus pneumoniae is a nasopharyngeal commensal that occasionally invades normally sterile sites to cause bloodstream infection and meningitis. Although the pneumococcal population structure and evolutionary genetics are well defined, it is not clear whether pneumococci that cause meningitis are genetically distinct from those that do not. Here, we used whole-genome sequencing of 140 isolates of S. pneumoniae recovered from bloodstream infection (n = 70) and meningitis (n = 70) to compare their genetic contents. By fitting a double-exponential decaying-function model, we show that these isolates share a core of 1,427 genes (95% confidence interval [CI], 1,425 to 1,435 genes) and that there is no difference in the core genome or accessory gene content from these disease manifestations. Gene presence/absence alone therefore does not explain the virulence behavior of pneumococci that reach the meninges. Our analysis, however, supports the requirement of a range of previously described virulence factors and vaccine candidates for both meningitis- and bacteremia-causing pneumococci. This high-resolution view suggests that, despite considerable competency for genetic exchange, all pneumococci are under considerable pressure to retain key components advantageous for colonization and transmission and that these components are essential for access to and survival in sterile sites. PMID:26259813

  17. Comparative Genomic Analysis of Meningitis- and Bacteremia-Causing Pneumococci Identifies a Common Core Genome.

    PubMed

    Kulohoma, Benard W; Cornick, Jennifer E; Chaguza, Chrispin; Yalcin, Feyruz; Harris, Simon R; Gray, Katherine J; Kiran, Anmol M; Molyneux, Elizabeth; French, Neil; Parkhill, Julian; Faragher, Brian E; Everett, Dean B; Bentley, Stephen D; Heyderman, Robert S

    2015-10-01

    Streptococcus pneumoniae is a nasopharyngeal commensal that occasionally invades normally sterile sites to cause bloodstream infection and meningitis. Although the pneumococcal population structure and evolutionary genetics are well defined, it is not clear whether pneumococci that cause meningitis are genetically distinct from those that do not. Here, we used whole-genome sequencing of 140 isolates of S. pneumoniae recovered from bloodstream infection (n = 70) and meningitis (n = 70) to compare their genetic contents. By fitting a double-exponential decaying-function model, we show that these isolates share a core of 1,427 genes (95% confidence interval [CI], 1,425 to 1,435 genes) and that there is no difference in the core genome or accessory gene content from these disease manifestations. Gene presence/absence alone therefore does not explain the virulence behavior of pneumococci that reach the meninges. Our analysis, however, supports the requirement of a range of previously described virulence factors and vaccine candidates for both meningitis- and bacteremia-causing pneumococci. This high-resolution view suggests that, despite considerable competency for genetic exchange, all pneumococci are under considerable pressure to retain key components advantageous for colonization and transmission and that these components are essential for access to and survival in sterile sites.

  18. Comparative genomic analysis of prion genes

    PubMed Central

    Premzl, Marko; Gamulin, Vera

    2007-01-01

    Background: The homologues of human disease genes are expected to contribute to better understanding of physiological and pathogenic processes. We made use of the present availability of vertebrate genomic sequences, and we have conducted the most comprehensive comparative genomic analysis of the prion protein gene PRNP and its homologues, shadow of prion protein gene SPRN and doppel gene PRND, and prion testis-specific gene PRNT so far. Results: While the SPRN and PRNP homologues are present in all vertebrates, PRND is known in tetrapods, and PRNT is present in primates. PRNT could be viewed as a TE-associated gene. Using human as the base sequence for genomic sequence comparisons (VISTA), we annotated numerous potential cis-elements. The conserved regions in SPRNs harbour the potential Sp1 sites in promoters (mammals, birds), C-rich intron splicing enhancers and PTB intron splicing silencers in introns (mammals, birds), and hsa-miR-34a sites in 3'-UTRs (eutherians). We showed the conserved PRNP upstream regions, which may be potential enhancers or silencers (primates, dog). In the PRNP 3'-UTRs, there are conserved cytoplasmic polyadenylation element sites (mammals, birds). The PRND core promoters include highly conserved CCAAT, CArG and TATA boxes (mammals). We deduced 42 new protein primary structures, and performed the first phylogenetic analysis of all vertebrate prion genes. Using the protein alignment which included 122 sequences, we constructed the neighbour-joining tree which showed four major clusters, including shadoos, shadoo2s and prion protein-likes (cluster 1), fish prion proteins (cluster 2), tetrapod prion proteins (cluster 3) and doppels (cluster 4). We showed that the entire prion protein conformationally plastic region is well conserved between eutherian prion proteins and shadoos (18–25% identity and 28–34% similarity), and there could be a potential structural compatibility between shadoos and the left-handed parallel beta-helical fold

  19. Genome-wide analysis of HPV integration in human cancers reveals recurrent, focal genomic instability

    PubMed Central

    Akagi, Keiko; Li, Jingfeng; Broutian, Tatevik R.; Padilla-Nash, Hesed; Xiao, Weihong; Jiang, Bo; Rocco, James W.; Teknos, Theodoros N.; Kumar, Bhavna; Wangsa, Danny; He, Dandan; Ried, Thomas; Symer, David E.; Gillison, Maura L.

    2014-01-01

    Genomic instability is a hallmark of human cancers, including the 5% caused by human papillomavirus (HPV). Here we report a striking association between HPV integration and adjacent host genomic structural variation in human cancer cell lines and primary tumors. Whole-genome sequencing revealed HPV integrants flanking and bridging extensive host genomic amplifications and rearrangements, including deletions, inversions, and chromosomal translocations. We present a model of “looping” by which HPV integrant-mediated DNA replication and recombination may result in viral–host DNA concatemers, frequently disrupting genes involved in oncogenesis and amplifying HPV oncogenes E6 and E7. Our high-resolution results shed new light on a catastrophic process, distinct from chromothripsis and other mutational processes, by which HPV directly promotes genomic instability. PMID:24201445

  20. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  1. Initial sequencing and comparative analysis of the mouse genome.

    PubMed

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  2. Genome-wide analysis correlates Ayurveda Prakriti

    PubMed Central

    Govindaraj, Periyasamy; Nizamuddin, Sheikh; Sharath, Anugula; Jyothi, Vuskamalla; Rotti, Harish; Raval, Ritu; Nayak, Jayakrishna; Bhat, Balakrishna K.; Prasanna, B. V.; Shintre, Pooja; Sule, Mayura; Joshi, Kalpana S.; Dedge, Amrish P.; Bharadwaj, Ramachandra; Gangadharan, G. G.; Nair, Sreekumaran; Gopinath, Puthiya M.; Patwardhan, Bhushan; Kondaiah, Paturu; Satyamoorthy, Kapaettu; Valiathan, Marthanda Varma Sankaran; Thangaraj, Kumarasamy

    2015-01-01

    The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as “Prakriti”. To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found 52 SNPs (p ≤ 1 × 10⁻⁵) were significantly different between Prakritis, without any confounding effect of stratification, after 10⁶ permutations. Principal component analysis (PCA) of these SNPs classified 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which demonstrates their power in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India’s traditional medicine has a genetic basis; and its Prakriti-based practice in vogue for many centuries resonates with personalized medicine. PMID:26511157
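
    The permutation step mentioned above can be illustrated with a small sketch. It assumes a simple statistic (difference in mean allele dosage between two groups) and toy genotypes; the function name `permutation_p_value` and the data are hypothetical, and the study's actual test statistic is not stated in the abstract.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in mean allele dosage."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any link between genotype and group label
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            hits += 1
    # Add-one correction: a permutation p-value can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy allele dosages (0, 1 or 2 copies of the minor allele); hypothetical data.
dosage_group_a = [2, 2, 1, 2, 2, 1, 2, 2]
dosage_group_b = [0, 1, 0, 0, 1, 0, 0, 1]
p_value = permutation_p_value(dosage_group_a, dosage_group_b)
```

    The smallest p-value this procedure can report is about 1 / n_perm, so resolving a threshold near 10⁻⁵ requires on the order of 10⁶ permutations.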

  3. High resolution measurement of DUF1220 domain copy number from whole genome sequence data.

    PubMed

    Astling, David P; Heft, Ilea E; Jones, Kenneth L; Sikela, James M

    2017-08-14

    DUF1220 protein domains found primarily in Neuroblastoma BreakPoint Family (NBPF) genes show the greatest human lineage-specific increase in copy number of any coding region in the genome. There are 302 haploid copies of DUF1220 in hg38 (~160 of which are human-specific) and the majority of these can be divided into 6 different subtypes (referred to as clades). Copy number changes of specific DUF1220 clades have been associated in a dose-dependent manner with brain size variation (both evolutionarily and within the human population), cognitive aptitude, autism severity, and schizophrenia severity. However, no published methods can directly measure copies of DUF1220 with high accuracy and no method can distinguish between domains within a clade. Here we describe a novel method for measuring copies of DUF1220 domains and the NBPF genes in which they are found from whole genome sequence data. We have characterized the effect that various sequencing and alignment parameters and strategies have on the accuracy and precision of the method and defined the parameters that lead to optimal DUF1220 copy number measurement and resolution. We show that copy number estimates obtained using our read depth approach are highly correlated with those generated by ddPCR for three representative DUF1220 clades. By simulation, we demonstrate that our method provides sufficient resolution to analyze DUF1220 copy number variation at three levels: (1) DUF1220 clade copy number within individual genes and groups of genes (gene-specific clade groups) (2) genome wide DUF1220 clade copies and (3) gene copy number for DUF1220-encoding genes. To our knowledge, this is the first method to accurately measure copies of all six DUF1220 clades and the first method to provide gene specific resolution of these clades. This allows one to discriminate among the ~300 haploid human DUF1220 copies to an extent not possible with any other method. The result is a greatly enhanced capability to analyze the
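
    The general idea behind a read-depth copy number estimate can be sketched as follows. This is a simplified illustration under stated assumptions (uniform coverage, a reference region of known diploid copy number, no correction for GC bias or multi-mapping reads); the function name and the depth values are hypothetical and do not reproduce the authors' method.

```python
def copy_number_from_depth(region_depths, reference_depths, reference_copies=2):
    """Estimate copy number as the depth ratio scaled by the reference copy number."""
    mean_region = sum(region_depths) / len(region_depths)
    mean_reference = sum(reference_depths) / len(reference_depths)
    return reference_copies * mean_region / mean_reference


# Toy per-base depths (hypothetical): the target region has three times the
# coverage of a diploid reference region, suggesting about six copies.
target_depths = [88, 92, 90]
diploid_depths = [29, 31, 30]
estimated_copies = copy_number_from_depth(target_depths, diploid_depths)
```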

  4. Enhancing genomics information retrieval through dimensional analysis.

    PubMed

    Hu, Qinmin; Huang, Jimmy Xiangji

    2013-06-01

    We propose a novel dimensional analysis approach to employing meta information in order to find the relationships within the unstructured or semi-structured document/passages for improving genomics information retrieval performance. First, we make use of the auxiliary information as three basic dimensions, namely "temporal", "journal", and "author". The reference section is treated as a commensurable quantity of the three basic dimensions. Then, the sample space and subspaces are built up and a set of events are defined to meet the basic requirement of dimensional homogeneity to be commensurable quantities. After that, the classic graph analysis algorithm in the Web environments is applied on each dimension respectively to calculate the importance of each dimension. Finally, we integrate all the dimension networks and re-rank the outputs for evaluation. Our experimental results show the proposed approach is superior and promising.

  5. Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor; Samanta, Manoj Pratim; Tongprasit, Waraporn; Sethi, Himanshu; Liang, Shoudan; Nelson, David C.; Hegeman, Adrian; Nelson, Clark; Rancour, David; Bednarek, Sebastian; Ulrich, Eldon L.; Zhao, Qin; Wrobel, Russell L.; Newman, Craig S.; Fox, Brian G.; Phillips, George N Jr; Markley, John L.; Sussman, Michael R.

    2005-01-01

    Using a maskless photolithography method, we produced DNA oligonucleotide microarrays with probe sequences tiled throughout the genome of the plant Arabidopsis thaliana. RNA expression was determined for the complete nuclear, mitochondrial, and chloroplast genomes by tiling 5 million 36-mer probes. These probes were hybridized to labeled mRNA isolated from liquid grown T87 cells, an undifferentiated Arabidopsis cell culture line. Transcripts were detected from at least 60% of the nearly 26,330 annotated genes, which included 151 predicted genes that were not identified previously by a similar genome-wide hybridization study on four different cell lines. In comparison with previously published results with 25-mer tiling arrays produced by chromium masking-based photolithography technique, 36-mer oligonucleotide probes were found to be more useful in identifying intron-exon boundaries. Using two-dimensional HPLC tandem mass spectrometry, a small-scale proteomic analysis was performed with the same cells. A large amount of strongly hybridizing RNA was found in regions "antisense" to known genes. Similarity of antisense activities between the 25-mer and 36-mer data sets suggests that it is a reproducible and inherent property of the experiments. Transcription activities were also detected for many of the intergenic regions and the small RNAs, including tRNA, small nuclear RNA, small nucleolar RNA, and microRNA. Expression of tRNAs correlates with genome-wide amino acid usage.

  6. Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor; Samanta, Manoj Pratim; Tongprasit, Waraporn; Sethi, Himanshu; Liang, Shoudan; Nelson, David C.; Hegeman, Adrian; Nelson, Clark; Rancour, David; Bednarek, Sebastian; et al.

    2005-01-01

    Using a maskless photolithography method, we produced DNA oligonucleotide microarrays with probe sequences tiled throughout the genome of the plant Arabidopsis thaliana. RNA expression was determined for the complete nuclear, mitochondrial, and chloroplast genomes by tiling 5 million 36-mer probes. These probes were hybridized to labeled mRNA isolated from liquid grown T87 cells, an undifferentiated Arabidopsis cell culture line. Transcripts were detected from at least 60% of the nearly 26,330 annotated genes, which included 151 predicted genes that were not identified previously by a similar genome-wide hybridization study on four different cell lines. In comparison with previously published results with 25-mer tiling arrays produced by chromium masking-based photolithography technique, 36-mer oligonucleotide probes were found to be more useful in identifying intron-exon boundaries. Using two-dimensional HPLC tandem mass spectrometry, a small-scale proteomic analysis was performed with the same cells. A large amount of strongly hybridizing RNA was found in regions "antisense" to known genes. Similarity of antisense activities between the 25-mer and 36-mer data sets suggests that it is a reproducible and inherent property of the experiments. Transcription activities were also detected for many of the intergenic regions and the small RNAs, including tRNA, small nuclear RNA, small nucleolar RNA, and microRNA. Expression of tRNAs correlates with genome-wide amino acid usage.

  7. A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry.

    PubMed

    Chaerkady, Raghothama; Kelkar, Dhanashree S; Muthusamy, Babylakshmi; Kandasamy, Kumaran; Dwivedi, Sutopa B; Sahasrabuddhe, Nandini A; Kim, Min-Sik; Renuse, Santosh; Pinto, Sneha M; Sharma, Rakesh; Pawar, Harsh; Sekhar, Nirujogi Raja; Mohanty, Ajeet Kumar; Getnet, Derese; Yang, Yi; Zhong, Jun; Dash, Aditya P; MacCallum, Robert M; Delanghe, Bernard; Mlambo, Godfree; Kumar, Ashwani; Keshava Prasad, T S; Okulate, Mobolaji; Kumar, Nirbhay; Pandey, Akhilesh

    2011-11-01

    Anopheles gambiae is a major mosquito vector responsible for malaria transmission, whose genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have still not been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations including 80 novel gene structures and about 500 translational start sites. An additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search-specific peptides. Numerous cases of encoded proteins were documented in regions annotated as intergenic, introns, or untranslated regions. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is a first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.

  8. Genome wide association study of spontaneous resolution of hepatitis C virus infection

    PubMed Central

    Duggal, Priya; Thio, Chloe L.; Wojcik, Genevieve L.; Goedert, James J.; Mangia, Alessandra; Latanich, Rachel; Kim, Arthur Y.; Lauer, Georg M.; Chung, Raymond T.; Peters, Marion G.; Kirk, Greg D.; Mehta, Shruti H.; Cox, Andrea L.; Khakoo, Salim I.; Alric, Laurent; Cramp, Matthew E.; Donfield, Sharyne M.; Edlin, Brian R.; Tobler, Leslie H; Busch, Michael P.; Alexander, Graeme; Rosen, Hugo R.; Gao, Xiaojiang; Abdel-Hamid, Mohamed; Apps, Richard; Carrington, Mary; Thomas, David L.

    2013-01-01

    Background: Hepatitis C virus (HCV) infections occur worldwide and either spontaneously resolve or persist and markedly increase the person’s lifetime risk of cirrhosis and hepatocellular carcinoma. Although HCV persistence occurs more often in persons of African ancestry and in persons with a genetic variant near IL28B, the genetic basis is not well understood. Objective: To evaluate the host genetic basis for spontaneous resolution of HCV infection. Design: Two-stage genome wide association study (GWAS). Setting: 13 international multicenter study sites. Patients: 919 individuals with serum HCV antibodies but no HCV RNA (spontaneous resolution) and 1482 individuals with serum HCV antibodies and RNA (persistence). Measurements: Frequencies of 792,721 SNPs. Results: Differences in allele frequencies between persons with spontaneous resolution and persistence were identified on chromosomes 19q13.13 and 6p21.32. On chromosome 19, allele frequency differences localized near IL28B and included rs12979860 (overall per-allele OR = 0.45, P = 2.17 × 10⁻³⁰) and 10 additional SNPs spanning 55,000 bases. On chromosome 6, allele frequency differences localized near genes for class II human leukocyte antigens (HLA) and included rs4273729 (overall per-allele OR = 0.59, P = 1.71 × 10⁻¹⁶) near DQB1*03:01 and an additional 116 SNPs spanning 1,090,000 base pairs. The associations in chromosomes 19 and 6 were independent, additive, and explain an estimated 14.9% (95% CI: 8.5–22.6%) of the variation in HCV resolution in those of European ancestry, and 15.8% (95% CI: 4.4–31.0%) in individuals of African ancestry. Replication of the chromosome 6 SNP, rs4273729, in an additional 746 individuals confirmed the findings (p = 0.015). Limitations: Epigenetic effects were not studied. Conclusions: IL28B and HLA class II are independently associated with spontaneous resolution of HCV infection and SNPs marking IL28B and DQB1*03:01 may explain ~15% of spontaneous resolution of HCV infection. PMID
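
    For readers unfamiliar with per-allele odds ratios such as those reported above, the following sketch computes an unadjusted allelic odds ratio with a Wald 95% confidence interval from a 2 × 2 table of allele counts. The counts and the function name are hypothetical; a GWAS would typically estimate per-allele effects with regression and covariate adjustment, so this is only the simplest form of the calculation.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Unadjusted allelic odds ratio with a Wald confidence interval."""
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: the alternate allele is rarer in the first group.
odds_ratio, ci_lower, ci_upper = allelic_odds_ratio(
    case_alt=300, case_ref=700, ctrl_alt=500, ctrl_ref=500
)
```

    An odds ratio below 1 with an interval that excludes 1 indicates that the allele is less common in the first group than in the second.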

  9. High Resolution Genomic Scans Reveal Genetic Architecture Controlling Alcohol Preference in Bidirectionally Selected Rat Model.

    PubMed

    Lo, Chiao-Ling; Lossie, Amy C; Liang, Tiebing; Liu, Yunlong; Xuei, Xiaoling; Lumeng, Lawrence; Zhou, Feng C; Muir, William M

    2016-08-01

    Investigations on the influence of nature vs. nurture on Alcoholism (Alcohol Use Disorder) in humans have yet to provide a clear view on potential genomic etiologies. To address this issue, we sequenced a replicated animal model system bidirectionally-selected for alcohol preference (AP). This model is uniquely suited to map genetic effects with high reproducibility and resolution. The origin of the rat lines (an 8-way cross) resulted in small haplotype blocks (HB) with a corresponding high level of resolution. We sequenced DNAs from 40 samples (10 per line of each replicate) to determine allele frequencies and HB. We achieved ~46X coverage per line and replicate. Excessive differentiation in the genomic architecture between lines, across replicates, termed signatures of selection (SS), were classified according to gene and region. We identified SS in 930 genes associated with AP. The majority (50%) of the SS were confined to single gene regions, the greatest numbers of which were in promoters (284) and intronic regions (169) with the least in exons (4), suggesting that differences in AP were primarily due to alterations in regulatory regions. We confirmed previously identified genes and found many new genes associated with AP. Of those newly identified genes, several demonstrated neuronal function involved in synaptic memory and reward behavior, e.g. ion channels (Kcnf1, Kcnn3, Scn5a), excitatory receptors (Grin2a, Gria3, Grip1), neurotransmitters (Pomc), and synapses (Snap29). This study not only reveals the polygenic architecture of AP, but also emphasizes the importance of regulatory elements, consistent with other complex traits.

  10. High Resolution Genomic Scans Reveal Genetic Architecture Controlling Alcohol Preference in Bidirectionally Selected Rat Model

    PubMed Central

    Lo, Chiao-Ling; Liang, Tiebing; Liu, Yunlong; Lumeng, Lawrence; Zhou, Feng C.; Muir, William M.

    2016-01-01

    Investigations on the influence of nature vs. nurture on Alcoholism (Alcohol Use Disorder) in humans have yet to provide a clear view on potential genomic etiologies. To address this issue, we sequenced a replicated animal model system bidirectionally-selected for alcohol preference (AP). This model is uniquely suited to map genetic effects with high reproducibility and resolution. The origin of the rat lines (an 8-way cross) resulted in small haplotype blocks (HB) with a corresponding high level of resolution. We sequenced DNAs from 40 samples (10 per line of each replicate) to determine allele frequencies and HB. We achieved ~46X coverage per line and replicate. Excessive differentiation in the genomic architecture between lines, across replicates, termed signatures of selection (SS), were classified according to gene and region. We identified SS in 930 genes associated with AP. The majority (50%) of the SS were confined to single gene regions, the greatest numbers of which were in promoters (284) and intronic regions (169) with the least in exons (4), suggesting that differences in AP were primarily due to alterations in regulatory regions. We confirmed previously identified genes and found many new genes associated with AP. Of those newly identified genes, several demonstrated neuronal function involved in synaptic memory and reward behavior, e.g. ion channels (Kcnf1, Kcnn3, Scn5a), excitatory receptors (Grin2a, Gria3, Grip1), neurotransmitters (Pomc), and synapses (Snap29). This study not only reveals the polygenic architecture of AP, but also emphasizes the importance of regulatory elements, consistent with other complex traits. PMID:27490364

  11. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Summary Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048
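
    The effect of sequencing depth on detecting low-frequency variants can be illustrated with a simple binomial model. The sketch below assumes error-free reads, a fixed variant allele fraction (VAF) and a caller that needs at least three supporting reads; these are illustrative assumptions and the function name is hypothetical, not taken from the study.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads=3):
    """P(at least min_alt_reads variant reads) under Binomial(depth, vaf)."""
    p_too_few = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


# A subclonal variant at 2% VAF, compared at a typical and a deep coverage.
p_at_30x = detection_probability(depth=30, vaf=0.02)
p_at_300x = detection_probability(depth=300, vaf=0.02)
```

    Under these assumptions the variant is rarely supported by enough reads at 30× but usually is at 300×, which is in line with the abstract's point that standard depths can miss variants in impure or clonally heterogeneous tumors.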

  12. DINGO: differential network analysis in genomics

    PubMed Central

    Ha, Min Jin; Baladandayuthapani, Veerabhadran; Do, Kim-Anh

    2015-01-01

    Motivation: Cancer progression and development are initiated by aberrations in various molecular networks through coordinated changes across multiple genes and pathways. It is important to understand how these networks change under different stress conditions and/or patient-specific groups to infer differential patterns of activation and inhibition. Existing methods are limited to correlation networks that are independently estimated from separate group-specific data and without due consideration of relationships that are conserved across multiple groups. Method: We propose a pathway-based differential network analysis in genomics (DINGO) model for estimating group-specific networks and making inference on the differential networks. DINGO jointly estimates the group-specific conditional dependencies by decomposing them into global and group-specific components. The delineation of these components allows for a more refined picture of the major driver and passenger events in the elucidation of cancer progression and development. Results: Simulation studies demonstrate that DINGO provides more accurate group-specific conditional dependencies than achieved by using separate estimation approaches. We apply DINGO to key signaling pathways in glioblastoma to build differential networks for long-term survivors and short-term survivors in The Cancer Genome Atlas. The hub genes found by mRNA expression, DNA copy number, methylation and microRNA expression reveal several important roles in glioblastoma progression. Availability and implementation: R Package at: odin.mdacc.tmc.edu/∼vbaladan. Contact: veera@mdanderson.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26148744
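
    For contrast, the separate-estimation baseline that the abstract compares against can be sketched as follows. This is not the DINGO model (which decomposes dependencies into global and group-specific components); it only shows a first-order partial correlation computed per group and a Fisher z comparison between groups. Function names are hypothetical.

```python
from math import atanh, erfc, sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after conditioning on a single variable z."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))


def differential_z(r1, n1, r2, n2, n_conditioned=1):
    """Fisher z-test for a difference between two (partial) correlations.

    The variance of atanh(r) for a partial correlation with k conditioning
    variables estimated from n samples is approximately 1 / (n - k - 3).
    """
    se = sqrt(1.0 / (n1 - n_conditioned - 3) + 1.0 / (n2 - n_conditioned - 3))
    z = (atanh(r1) - atanh(r2)) / se
    p_two_sided = erfc(abs(z) / sqrt(2.0))
    return z, p_two_sided
```

    An edge would be called differential when the two group-specific estimates differ significantly; because each group is estimated on its own, this baseline ignores structure that is conserved across groups.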

  13. Genome Data Exploration Using Correspondence Analysis

    PubMed Central

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results
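
    The first step of CA on a two-way count table can be sketched as follows: standardized residuals from the independence model, whose sum of squares is the total inertia (the chi-square statistic divided by the table total). The factors mentioned in the abstract come from a singular value decomposition of this residual matrix, which is omitted here. The function name is hypothetical and the sketch assumes no empty rows or columns.

```python
def ca_standardized_residuals(table):
    """Return (residual matrix, total inertia) for a table of counts."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(col) / n for col in zip(*table)]
    residuals = [
        [(x / n - r * c) / (r * c) ** 0.5 for x, c in zip(row, col_mass)]
        for row, r in zip(table, row_mass)
    ]
    inertia = sum(s * s for row in residuals for s in row)
    return residuals, inertia
```

    A table whose rows are proportional to each other has zero inertia, so there is nothing for the factors to explain; the more the rows (for example, species) differ in their column profiles (for example, amino acid usage), the larger the inertia.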

  14. Genome Data Exploration Using Correspondence Analysis.

    PubMed

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results

  15. High Resolution Typing by Whole Genome Mapping Enables Discrimination of LA-MRSA (CC398) Strains and Identification of Transmission Events.

    PubMed

    Bosch, Thijs; Verkade, Erwin; van Luit, Martijn; Pot, Bruno; Vauterin, Paul; Burggrave, Ronald; Savelkoul, Paul; Kluytmans, Jan; Schouls, Leo

    2013-01-01

    After its emergence in 2003, a livestock-associated (LA-)MRSA clade (CC398) has caused an impressive increase in the number of isolates submitted for the Dutch national MRSA surveillance and now comprises 40% of all isolates. The currently used molecular typing techniques have limited discriminatory power for this MRSA clade, which hampers studies on the origin and transmission routes. Recently, a new molecular analysis technique named whole genome mapping was introduced. This method creates high-resolution, ordered whole genome restriction maps that may have potential for strain typing. In this study, we assessed and validated the capability of whole genome mapping to differentiate LA-MRSA isolates. Multiple validation experiments showed that whole genome mapping produced highly reproducible results. Assessment of the technique on two well-documented MRSA outbreaks showed that whole genome mapping was able to confirm one outbreak, but revealed major differences between the maps of a second, indicating that not all isolates belonged to this outbreak. Whole genome mapping of LA-MRSA isolates that were epidemiologically unlinked provided a much higher discriminatory power than spa-typing or MLVA. In contrast, maps created from LA-MRSA isolates obtained during a proven LA-MRSA outbreak were nearly indistinguishable, showing that transmission of LA-MRSA can be detected by whole genome mapping. Finally, whole genome maps of LA-MRSA isolates originating from two unrelated veterinarians and their household members showed that veterinarians may carry and transmit different LA-MRSA strains at the same time. No such conclusions could be drawn based on spa-typing and MLVA. Although PFGE seems to be suitable for molecular typing of LA-MRSA, WGM provides a much higher discriminatory power. Furthermore, whole genome mapping can provide a comparison with other maps within 2 days after the bacterial culture is received, making it suitable to investigate transmission events and

  16. Whole-Genome Mapping as a Novel High-Resolution Typing Tool for Legionella pneumophila.

    PubMed

    Bosch, Thijs; Euser, Sjoerd M; Landman, Fabian; Bruin, Jacob P; IJzerman, Ed P; den Boer, Jeroen W; Schouls, Leo M

    2015-10-01

    Legionella is the causative agent for Legionnaires' disease (LD) and is responsible for several large outbreaks in the world. More than 90% of LD cases are caused by Legionella pneumophila, and studies on the origin and transmission routes of this pathogen rely on adequate molecular characterization of isolates. Current typing of L. pneumophila mainly depends on sequence-based typing (SBT). However, studies have shown that in some outbreak situations, SBT does not have sufficient discriminatory power to distinguish between related and nonrelated L. pneumophila isolates. In this study, we used a novel high-resolution typing technique, called whole-genome mapping (WGM), to differentiate between epidemiologically related and nonrelated L. pneumophila isolates. Assessment of the method by various validation experiments showed highly reproducible results, and WGM was able to confirm two well-documented Dutch L. pneumophila outbreaks. Comparison of whole-genome maps of the two outbreaks together with WGMs of epidemiologically nonrelated L. pneumophila isolates showed major differences between the maps, and WGM yielded a higher discriminatory power than SBT. In conclusion, WGM can be a valuable alternative to perform outbreak investigations of L. pneumophila in real time since the turnaround time from culture to comparison of the L. pneumophila maps is less than 24 h.

  17. Genome-Wide Organization of GATA1 and TAL1 Determined at High Resolution

    PubMed Central

    Han, G. Celine; Vinayachandran, Vinesh; Bataille, Alain R.; Park, Bongsoo; Chan-Salis, Ka Yim; Keller, Cheryl A.; Long, Maria; Mahony, Shaun; Hardison, Ross C.

    2015-01-01

    Erythroid development and differentiation from multiprogenitor cells into red blood cells requires precise transcriptional regulation. Key erythroid transcription factors, GATA1 and TAL1, cooperate, along with other proteins, to regulate many aspects of this process. How GATA1 and TAL1 are juxtaposed along the DNA and their cognate DNA binding site across the mouse genome remains unclear. We applied high-resolution ChIP-exo (chromatin immunoprecipitation followed by 5′-to-3′ exonuclease treatment and then massively parallel DNA sequencing) to GATA1 and TAL1 to study their positional organization across the mouse genome during GATA1-dependent maturation. Two complementary methods, MultiGPS and peak pairing, were used to determine high-confidence binding locations by ChIP-exo. We identified ∼10,000 GATA1 and ∼15,000 TAL1 locations, which were essentially confirmed by ChIP-seq (chromatin immunoprecipitation followed by massively parallel DNA sequencing). Of these, ∼4,000 locations were bound by both GATA1 and TAL1. About three-quarters of them were tightly linked to a partial E-box located 7 or 8 bp upstream of a WGATAA motif. Both TAL1 and GATA1 generated distinct characteristic ChIP-exo peaks around WGATAA motifs that reflect their positional arrangement within a complex. We show that TAL1 and GATA1 form a precisely organized complex at a compound motif consisting of a TG 7 or 8 bp upstream of a WGATAA motif across thousands of genomic locations. PMID:26503782
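
    The compound motif described above can be searched for with a regular expression. This is a minimal sketch, not the authors' pipeline, and it assumes that "7 or 8 bp upstream" means 7 or 8 intervening bases between the TG and the WGATAA (W = A or T); the spacing convention in the paper may be counted differently. Names are hypothetical.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
# Spacing of 7-8 intervening bases is an assumption (see lead-in).
COMPOUND_MOTIF = re.compile(r"(?=(TG[ACGT]{7,8}[AT]GATAA))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_compound_motifs(seq):
    """Return (strand, start, matched sequence) for every hit.

    Minus-strand starts are offsets in the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in COMPOUND_MOTIF.finditer(s):
            hits.append((strand, match.start(), match.group(1)))
    return hits
```

    For example, `find_compound_motifs("CCTGACGTACGAGATAACC")` reports one plus-strand hit starting at offset 2.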

  18. A high-resolution radiation hybrid map of rhesus macaque chromosome 5 identifies rearrangements in the genome assembly

    PubMed Central

    Karere, Genesio M.; Froenicke, Lutz; Millon, Lee; Womack, James E.; Lyons, Leslie A.

    2008-01-01

    A 10,000-rad radiation hybrid cell panel of the rhesus macaque was generated to construct a comprehensive RH map of chromosome 5. The map represents 218 markers typed in 185 RH clones. The 4,846 cR length map has an average marker spacing of 798 kb. Alignments of the RH map to macaque and human genome sequences confirm a large inversion and reveal a previously unreported telomeric inversion. The macaque genome sequence indicates small translocations from the ancestral homolog of macaque chromosome 5 to macaque chromosomes 1 and 6. The RH map suggests that these are likely assembly artifacts. Unlike the genome sequence, the RH mapping data indicate the conservation of synteny between macaque chromosome 5 and human chromosome 4. This study shows that the 10,000-rad panel is appropriate for the generation of a high-resolution whole genome RH map suitable for the verification of the rhesus genome assembly. PMID:18601997

  19. Asymmetric Genome Organization in an RNA Virus Revealed via Graph-Theoretical Analysis of Tomographic Data

    PubMed Central

    Geraets, James A.; Dykeman, Eric C.; Stockley, Peter G.; Ranson, Neil A.; Twarock, Reidun

    2015-01-01

    Cryo-electron microscopy permits 3-D structures of viral pathogens to be determined in remarkable detail. In particular, the protein containers encapsulating viral genomes have been determined to high resolution using symmetry averaging techniques that exploit the icosahedral architecture seen in many viruses. By contrast, structure determination of asymmetric components remains a challenge, and novel analysis methods are required to reveal such features and characterize their functional roles during infection. Motivated by the important, cooperative roles of viral genomes in the assembly of single-stranded RNA viruses, we have developed a new analysis method that reveals the asymmetric structural organization of viral genomes in proximity to the capsid in such viruses. The method uses geometric constraints on genome organization, formulated based on knowledge of icosahedrally-averaged reconstructions and the roles of the RNA-capsid protein contacts, to analyse cryo-electron tomographic data. We apply this method to the low-resolution tomographic data of a model virus and infer the unique asymmetric organization of its genome in contact with the protein shell of the capsid. This opens unprecedented opportunities to analyse viral genomes, revealing conserved structural features and mechanisms that can be targeted in antiviral drug design. PMID:25793998

  20. A High-Resolution Map of Synteny Disruptions in Gibbon and Human Genomes

    PubMed Central

    Hallers, Boudewijn F.H. ten; Zhu, Baoli; Osoegawa, Kazutoyo; Mootnick, Alan; Kofler, Andrea; Wienberg, Johannes; Rogers, Jane; Humphray, Sean; Scott, Carol; Harris, R. Alan; Milosavljevic, Aleksandar; de Jong, Pieter J

    2006-01-01

    Gibbons are part of the same superfamily (Hominoidea) as humans and great apes, but their karyotype has diverged faster from the common hominoid ancestor. At least 24 major chromosome rearrangements are required to convert the presumed ancestral karyotype of gibbons into that of the hominoid ancestor. Up to 28 additional rearrangements distinguish the various living species from the common gibbon ancestor. Using the northern white-cheeked gibbon (2n = 52) (Nomascus leucogenys leucogenys) as a model, we created a high-resolution map of the homologous regions between the gibbon and human. The positions of 100 synteny breakpoints relative to the assembled human genome were determined at a resolution of about 200 kb. Interestingly, 46% of the gibbon–human synteny breakpoints occur in regions that correspond to segmental duplications in the human lineage, indicating a common source of plasticity leading to a different outcome in the two species. Additionally, the full sequences of 11 gibbon BACs spanning evolutionary breakpoints reveal either segmental duplications or interspersed repeats at the exact breakpoint locations. No specific sequence element appears to be common among independent rearrangements. We speculate that the extraordinarily high level of rearrangements seen in gibbons may be due to factors that increase the incidence of chromosome breakage or fixation of the derivative chromosomes in a homozygous state. PMID:17196042

  1. Genomic resolution of an aggressive, widespread, diverse and expanding meningococcal serogroup B, C and W lineage

    PubMed Central

    Lucidarme, Jay; Hill, Dorothea M.C.; Bratcher, Holly B.; Gray, Steve J.; du Plessis, Mignon; Tsang, Raymond S.W.; Vazquez, Julio A.; Taha, Muhamed-Kheir; Ceyhan, Mehmet; Efron, Adriana M.; Gorla, Maria C.; Findlow, Jamie; Jolley, Keith A.; Maiden, Martin C.J.; Borrow, Ray

    2015-01-01

    Summary Objectives Neisseria meningitidis is a leading cause of meningitis and septicaemia. The hyperinvasive ST-11 clonal complex (cc11) caused serogroup C (MenC) outbreaks in the US military in the 1960s and UK universities in the 1990s, a global Hajj-associated serogroup W (MenW) outbreak in 2000–2001, and subsequent MenW epidemics in sub-Saharan Africa. More recently, endemic MenW disease has expanded in South Africa, South America and the UK, and MenC cases have been reported among European and North American men who have sex with men (MSM). Routine typing schemes poorly resolve cc11 so we established the population structure at genomic resolution. Methods Representatives of these episodes and other geo-temporally diverse cc11 meningococci (n = 750) were compared across 1546 core genes and visualised on phylogenetic networks. Results MenW isolates were confined to a distal portion of one of two main lineages with MenB and MenC isolates interspersed elsewhere. An expanding South American/UK MenW strain was distinct from the ‘Hajj outbreak’ strain and a closely related endemic South African strain. Recent MenC isolates from MSM in France and the UK were closely related but distinct. Conclusions High resolution ‘genomic’ multilocus sequence typing is necessary to resolve and monitor the spread of diverse cc11 lineages globally. PMID:26226598
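
    Comparison across core genes is commonly reduced to a pairwise count of loci at which two isolates carry different alleles. The sketch below shows that distance under the assumption that each isolate is a mapping from locus name to allele number, with `None` for a locus that could not be called; the locus names are hypothetical and the abstract does not specify how missing data were handled.

```python
def allelic_distance(profile_a, profile_b):
    """Return (differing loci, loci compared) for two allele profiles."""
    shared = [
        locus
        for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    differing = sum(1 for locus in shared if profile_a[locus] != profile_b[locus])
    return differing, len(shared)


# Toy profiles over four hypothetical core loci.
isolate_1 = {"locus1": 5, "locus2": 12, "locus3": 7, "locus4": None}
isolate_2 = {"locus1": 5, "locus2": 14, "locus3": 7, "locus4": 3}
print(allelic_distance(isolate_1, isolate_2))
```

    A matrix of such distances over all isolate pairs is the kind of input from which phylogenetic networks are typically drawn.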

  2. A high-resolution map of synteny disruptions in gibbon and human genomes.

    PubMed

    Carbone, Lucia; Vessere, Gery M; ten Hallers, Boudewijn F H; Zhu, Baoli; Osoegawa, Kazutoyo; Mootnick, Alan; Kofler, Andrea; Wienberg, Johannes; Rogers, Jane; Humphray, Sean; Scott, Carol; Harris, R Alan; Milosavljevic, Aleksandar; de Jong, Pieter J

    2006-12-29

    Gibbons are part of the same superfamily (Hominoidea) as humans and great apes, but their karyotype has diverged faster from the common hominoid ancestor. At least 24 major chromosome rearrangements are required to convert the presumed ancestral karyotype of gibbons into that of the hominoid ancestor. Up to 28 additional rearrangements distinguish the various living species from the common gibbon ancestor. Using the northern white-cheeked gibbon (2n = 52) (Nomascus leucogenys leucogenys) as a model, we created a high-resolution map of the homologous regions between the gibbon and human. The positions of 100 synteny breakpoints relative to the assembled human genome were determined at a resolution of about 200 kb. Interestingly, 46% of the gibbon-human synteny breakpoints occur in regions that correspond to segmental duplications in the human lineage, indicating a common source of plasticity leading to a different outcome in the two species. Additionally, the full sequences of 11 gibbon BACs spanning evolutionary breakpoints reveal either segmental duplications or interspersed repeats at the exact breakpoint locations. No specific sequence element appears to be common among independent rearrangements. We speculate that the extraordinarily high level of rearrangements seen in gibbons may be due to factors that increase the incidence of chromosome breakage or fixation of the derivative chromosomes in a homozygous state.

  3. Enhancing cancer clonality analysis with integrative genomics

    PubMed Central

    2015-01-01

    Introduction It is understood that cancer is a clonal disease initiated by a single cell, and that metastasis, which is the spread of cancer from the primary site, is also initiated by a single cell. The seemingly natural capability of cancer to adapt dynamically in a Darwinian manner is a primary reason for therapeutic failures. Survival advantages may be induced by cancer therapies and also occur as a result of inherent cell and microenvironmental factors. The selected "more fit" clones outmatch their competition and then become dominant in the tumor via propagation of progeny. This clonal expansion leads to relapse, therapeutic resistance and eventually death. The goal of this study is to develop and demonstrate a more detailed clonality approach by utilizing integrative genomics. Methods Patient tumor samples were profiled by Whole Exome Sequencing (WES) and RNA-seq on an Illumina HiSeq 2500 and methylation profiling was performed on the Illumina Infinium 450K array. STAR and the Haplotype Caller were used for RNA-seq processing. Custom approaches were used for the integration of the multi-omic datasets. Results Reported are major enhancements to CloneViz, which now provides capabilities enabling a formal tumor multi-dimensional clonality analysis by integrating: i) DNA mutations, ii) RNA expressed mutations, and iii) DNA methylation data. RNA and DNA methylation integration were not previously possible, by CloneViz (previous version) or any other clonality method to date. This new approach, named iCloneViz (integrated CloneViz), employs visualization and quantitative methods, revealing an integrative genomic mutational dissection and traceability (DNA, RNA, epigenetics) through the different layers of molecular structures. Conclusion The iCloneViz approach can be used for analysis of clonal evolution and mutational dynamics of multi-omic data sets. Revealing tumor clonal complexity in an integrative and quantitative manner facilitates improved mutational

  4. Comparative genomic analysis of teleost fish bmal genes.

    PubMed

    Wang, Han

    2009-05-01

    The Bmal1 (Brain and muscle ARNT like 1) gene is a key circadian clock gene. Tetrapods also have a second Bmal gene, Bmal2. The fruit fly has only one bmal1/cycle gene. Interrogation of the five teleost fish genome sequences coupled with phylogenetic and splice site analyses found that zebrafish have two bmal1 genes, bmal1a and bmal1b, and bmal2a; Japanese pufferfish (fugu), green spotted pufferfish (tetraodon) and Japanese medaka fish each have two bmal2 genes, bmal2a and bmal2b, and bmal1a; and three-spine stickleback have bmal1a and bmal2b. Syntenic analysis further indicated that zebrafish bmal1a/bmal1b, and fugu, tetraodon and medaka bmal2a/bmal2b are ancient duplicates. Although the dN/dS ratios of these four fish bmal duplicates are all <1, implying that they have been under purifying selection, the Tajima relative rate test showed that fugu, tetraodon and medaka bmal2a/bmal2b have asymmetric evolutionary rates, suggesting that one of these duplicates has been subject to positive selection or relaxed functional constraint. These results support the notion that teleost fish bmal genes were derived from the fish-specific genome duplication (FSGD), and that divergent resolution following the duplication led to retaining different ancient bmal duplicates in different fishes, which could have shaped the evolution of the complex teleost fish timekeeping mechanisms.
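
    The Tajima relative rate test mentioned above compares, against an outgroup, the number of sites unique to each of two sequences. A minimal sketch follows, assuming three pre-aligned sequences of equal length and skipping gapped columns; the function name and the gap handling are choices made for this illustration.

```python
from math import erfc, sqrt


def tajima_relative_rate(seq1, seq2, outgroup):
    """Return (m1, m2, chi2, p) for Tajima's relative rate test.

    m1 counts sites where seq1 differs while seq2 matches the outgroup;
    m2 is the reverse. chi2 = (m1 - m2)^2 / (m1 + m2), 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # skip alignment gaps
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = erfc(sqrt(chi2 / 2.0))
    return m1, m2, chi2, p
```

    A small p-value indicates that the two duplicates have accumulated substitutions at unequal rates since they diverged, which is the asymmetry reported for bmal2a/bmal2b.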

  5. The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

    PubMed Central

    Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

    2013-01-01

    Background Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. Methodology/Principal Findings We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on Cycas taitungensis and 24 angiosperm mt genomes. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520

  6. Analysis of DOA estimation spatial resolution using MUSIC algorithm

    NASA Astrophysics Data System (ADS)

    Guo, Yue; Wang, Hongyuan; Luo, Bin

    2005-11-01

    This paper presents a performance analysis of the spatial resolution of the direction of arrival (DOA) estimates attained by the multiple signal classification (MUSIC) algorithm for uncorrelated sources. The confidence interval of the estimated angle, which is more intuitive, is adopted as a new evaluation criterion for spatial resolution. Then, based on a statistical method, a qualitative analysis reveals the factors influencing the performance of the MUSIC algorithm. Finally, quantitative simulations confirm the theoretical analysis.
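
    For readers unfamiliar with MUSIC, the following is a minimal textbook sketch using NumPy, not the paper's simulation. It assumes a uniform linear array with half-wavelength spacing, uncorrelated narrowband sources and white noise; all function names and parameter values are choices made for this illustration.

```python
import numpy as np


def steering_vector(theta_rad, n_sensors, spacing=0.5):
    """Array response of a uniform linear array (spacing in wavelengths)."""
    m = np.arange(n_sensors)
    return np.exp(-2j * np.pi * spacing * m * np.sin(theta_rad))


def music_spectrum(snapshots, n_sources, grid_deg, spacing=0.5):
    """MUSIC pseudo-spectrum from a (sensors x snapshots) data matrix."""
    n_sensors, n_snapshots = snapshots.shape
    covariance = snapshots @ snapshots.conj().T / n_snapshots
    _, eigvecs = np.linalg.eigh(covariance)  # eigenvalues in ascending order
    noise_subspace = eigvecs[:, : n_sensors - n_sources]
    spectrum = np.empty(len(grid_deg))
    for i, theta in enumerate(np.deg2rad(grid_deg)):
        a = steering_vector(theta, n_sensors, spacing)
        spectrum[i] = 1.0 / np.linalg.norm(noise_subspace.conj().T @ a) ** 2
    return spectrum


def estimate_doas(spectrum, grid_deg, n_sources):
    """Pick the n_sources highest local maxima of the pseudo-spectrum."""
    peaks = [
        i
        for i in range(1, len(spectrum) - 1)
        if spectrum[i - 1] < spectrum[i] >= spectrum[i + 1]
    ]
    peaks.sort(key=lambda i: spectrum[i], reverse=True)
    return sorted(float(grid_deg[i]) for i in peaks[:n_sources])


def simulate(true_deg, n_sensors=8, n_snapshots=200, noise_std=0.1, seed=0):
    """Synthetic snapshots: uncorrelated complex Gaussian sources plus noise."""
    rng = np.random.default_rng(seed)
    steering = np.stack(
        [steering_vector(np.deg2rad(t), n_sensors) for t in true_deg], axis=1
    )
    shape_s = (len(true_deg), n_snapshots)
    shape_n = (n_sensors, n_snapshots)
    signals = rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)
    noise = rng.standard_normal(shape_n) + 1j * rng.standard_normal(shape_n)
    return steering @ signals + noise_std * noise


grid = np.arange(-90.0, 90.5, 0.5)
```

    Spatial resolution can then be probed by moving the two simulated angles closer together, or by lowering the number of snapshots or the signal-to-noise ratio, until the two peaks merge.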

  7. Computational Methods for the Analysis of Array Comparative Genomic Hybridization

    PubMed Central

    Chari, Raj; Lockwood, William W.; Lam, Wan L.

    2006-01-01

    Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has led to a rapid accumulation of high throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development. PMID:17992253

  8. Millstone: software for multiplex microbial genome analysis and engineering.

    PubMed

    Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M

    2017-05-25

    Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.

  9. Millstone: software for multiplex microbial genome analysis and engineering

    DOE PAGES

    Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.; ...

    2017-05-25

    Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.

  10. Genome-wide analysis of ruminant Staphylococcus aureus reveals diversification of the core genome.

    PubMed

    Ben Zakour, Nouri L; Sturdevant, Daniel E; Even, Sergine; Guinane, Caitriona M; Barbey, Corinne; Alves, Priscila D; Cochet, Marie-Françoise; Gautier, Michel; Otto, Michael; Fitzgerald, J Ross; Le Loir, Yves

    2008-10-01

    Staphylococcus aureus causes disease in humans and a wide array of animals. Of note, S. aureus mastitis of ruminants, including cows, sheep, and goats, results in major economic losses worldwide. Extensive variation in genome content exists among S. aureus pathogenic clones. However, the genomic variation among S. aureus strains infecting different animal species has not been well examined. To investigate variation in the genome content of human and ruminant S. aureus, we carried out whole-genome PCR scanning (WGPS), comparative genomic hybridizations (CGH), and the directed DNA sequence analysis of strains of human, bovine, ovine, and caprine origin. Extensive variation in genome content was discovered, including host- and ruminant-specific genetic loci. Ovine and caprine strains were genetically allied, whereas bovine strains were heterogeneous in gene content. As expected, mobile genetic elements such as pathogenicity islands and bacteriophages contributed to the variation in genome content between strains. However, differences specific for ruminant strains were restricted to regions of the conserved core genome, which contained allelic variation in genes encoding proteins of known and unknown function. Many of these proteins are predicted to be exported and could play a role in host-pathogen interactions. The genomic regions of difference identified by the whole-genome approaches adopted in the current study represent excellent targets for studies of the molecular basis of S. aureus host adaptation.

  11. High-resolution genome-wide array-based comparative genome hybridization reveals cryptic chromosome changes in AML and MDS cases with trisomy 8 as the sole cytogenetic aberration.

    PubMed

    Paulsson, K; Heidenblad, M; Strömbeck, B; Staaf, J; Jönsson, G; Borg, A; Fioretos, T; Johansson, B

    2006-05-01

    Although trisomy 8 as the sole chromosome aberration is the most common numerical abnormality in acute myeloid leukemia (AML) and myelodysplastic syndromes (MDS), little is known about its pathogenetic effects. Considering that +8 is a frequent secondary change in AML/MDS, cryptic--possibly primary--genetic aberrations may occur in cases with trisomy 8 as the apparently single anomaly. However, no such hidden anomalies have been reported. We performed a high-resolution genome-wide array-based comparative genome hybridization (array CGH) analysis of 10 AML/MDS cases with isolated +8, utilizing a 32K bacterial artificial chromosome array set, providing >98% coverage of the genome with a resolution of 100 kb. Array CGH revealed intrachromosomal imbalances, not corresponding to known genomic copy number polymorphisms, in 4/10 cases, comprising nine duplications and hemizygous deletions ranging in size from 0.5 to 2.2 Mb. A 1.8 Mb deletion at 7p14.1, which had occurred prior to the +8, was identified in MDS transforming to AML. Furthermore, a deletion including ETV6 was present in one case. The remaining seven imbalances involved more than 40 genes. The present results show that cryptic genetic abnormalities are frequent in trisomy 8-positive AML/MDS cases and that +8 as the sole cytogenetic aberration is not always the primary genetic event.
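
    Array CGH reports copy number as a log2 ratio of test to reference signal. As a worked illustration (an idealized model, not the authors' analysis), the expected ratio for a given copy number can be computed while allowing for a fraction of normal diploid cells in the sample; the function and parameter names are hypothetical.

```python
from math import log2


def expected_log2_ratio(copies, tumour_fraction=1.0, normal_copies=2):
    """Idealized log2(test/reference) for a region with `copies` copies."""
    mean_copies = tumour_fraction * copies + (1.0 - tumour_fraction) * normal_copies
    if mean_copies == 0:
        return float("-inf")  # homozygous deletion in a pure sample
    return log2(mean_copies / normal_copies)


# Trisomy, a hemizygous deletion, and a trisomy diluted by 50% normal cells.
for copies, fraction in ((3, 1.0), (1, 1.0), (3, 0.5)):
    print(copies, fraction, round(expected_log2_ratio(copies, fraction), 3))
```

    In this model a trisomy in a pure sample gives log2(3/2) and a hemizygous deletion gives log2(1/2) = -1; admixture with normal cells pulls both toward zero, which is one reason small imbalances are harder to call.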

  12. Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation

    PubMed Central

    Rasko, David A.; Worsham, Patricia L.; Abshire, Terry G.; Stanley, Scott T.; Bannan, Jason D.; Wilson, Mark R.; Langham, Richard J.; Decker, R. Scott; Jiang, Lingxia; Read, Timothy D.; Phillippy, Adam M.; Salzberg, Steven L.; Pop, Mihai; Van Ert, Matthew N.; Kenefic, Leo J.; Keim, Paul S.; Fraser-Liggett, Claire M.; Ravel, Jacques

    2011-01-01

    Before the anthrax letter attacks of 2001, the developing field of microbial forensics relied on microbial genotyping schemes based on a small portion of a genome sequence. Amerithrax, the investigation into the anthrax letter attacks, applied high-resolution whole-genome sequencing and comparative genomics to identify key genetic features of the letters’ Bacillus anthracis Ames strain. During systematic microbiological analysis of the spore material from the letters, we identified a number of morphological variants based on phenotypic characteristics and the ability to sporulate. The genomes of these morphological variants were sequenced and compared with that of the B. anthracis Ames ancestor, the progenitor of all B. anthracis Ames strains. Through comparative genomics, we identified four distinct loci with verifiable genetic mutations. Three of the four mutations could be directly linked to sporulation pathways in B. anthracis and more specifically to the regulation of the phosphorylation state of Spo0F, a key regulatory protein in the initiation of the sporulation cascade, thus linking phenotype to genotype. None of these variant genotypes were identified in single-colony environmental B. anthracis Ames isolates associated with the investigation. These genotypes were identified only in B. anthracis morphotypes isolated from the letters, indicating that the variants were not prevalent in the environment, not even the environments associated with the investigation. This study demonstrates the forensic value of systematic microbiological analysis combined with whole-genome sequencing and comparative genomics. PMID:21383169

  13. Topological Data Analysis of High-Resolution Temporal Rainfall

    NASA Astrophysics Data System (ADS)

    Carsteanu, Alin Andrei; Fernández Méndez, Félix; Vásquez Aguilar, Raciel

    2017-04-01

    This study applies topological data analysis (TDA) to the state space representations of high-resolution temporal rainfall intensity data from Iowa City (IIHR, U of Iowa). Using a sufficient embedding dimension, topological properties of the underlying manifold are depicted.
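
    A state-space representation of a scalar time series is commonly built by time-delay embedding. The abstract does not say which construction the authors used, so the sketch below is only an illustration under that assumption; the function name `delay_embed`, the lag value, and the sample intensities are hypothetical.

```python
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int, lag: int = 1) -> List[Tuple[float, ...]]:
    """Return delay vectors (x[t], x[t+lag], ..., x[t+(dim-1)*lag]).

    `dim` plays the role of the embedding dimension mentioned in the abstract;
    `lag` is an assumed parameter that the abstract does not specify.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    span = (dim - 1) * lag  # distance between first and last coordinate
    return [
        tuple(series[t + k * lag] for k in range(dim))
        for t in range(len(series) - span)
    ]


# Hypothetical rainfall intensities, for illustration only.
rain = [0.0, 0.2, 1.5, 3.1, 0.8, 0.0, 0.0, 0.4]
points = delay_embed(rain, dim=3, lag=2)
```

    The resulting point cloud is the kind of input on which a topological summary (for example, persistent homology) could then be computed.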

  14. Comparative optical genome analysis of two pangolin species: Manis pentadactyla and Manis javanica.

    PubMed

    Zhihai, Huang; Jiang, Xu; Shuiming, Xiao; Baosheng, Liao; Yuan, Gao; Chaochao, Zhai; Xiaohui, Qiu; Wen, Xu; Shilin, Chen

    2016-12-01

    The pangolin is a Pholidota mammal with large keratin scales protecting its skin. Two pangolin species (Manis pentadactyla and Manis javanica) have been recorded as critically endangered on the International Union for Conservation of Nature Red List of Threatened Species. Optical mapping constructs high-resolution restriction maps from single DNA molecules for genome analysis at the megabase scale and to assist genome assembly. Here, we constructed restriction maps of M. pentadactyla and M. javanica using optical mapping to assist with genome assembly and analysis of these species. Genomic DNA was nicked with Nt.BspQI and then labeled using fluorescently labeled bases that were detected by the Irys optical mapping system. In total, 3,313,734 DNA molecules (517.847 Gb) for M. pentadactyla and 3,439,885 DNA molecules (504.743 Gb) for M. javanica were obtained, which corresponded to approximately 178X and 177X genome coverage, respectively. Qualified molecules (≥150 kb with a label density of >6 sites per 100 kb) were analyzed using the de novo assembly program embedded in the IrysView pipeline. We obtained two maps that were 2.91 Gb and 2.85 Gb in size with N50s of 1.88 Mb and 1.97 Mb, respectively. Optical mapping reveals large-scale structural information that is especially important for non-model genomes that lack a good reference. The approach has the potential to guide de novo assembly of genomes sequenced using next-generation sequencing. Our data provide a resource for Manidae genome analysis and references for de novo assembly. This note also provides new insights into Manidae evolutionary analysis at the genome structure level.
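
    The coverage and N50 figures quoted above follow from simple arithmetic, sketched here. Dividing the raw data volume by the assembled map size reproduces the reported 178X and 177X, although the abstract does not state which genome size the authors actually used, so that pairing is an assumption. The `n50` helper uses the usual definition (the length at which the running total of lengths, sorted longest first, reaches half of the total); the function names are hypothetical.

```python
from typing import Iterable


def fold_coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Fold coverage = sequenced (or mapped) bases / genome size."""
    return total_bases_gb / genome_size_gb


def n50(lengths: Iterable[float]) -> float:
    """Length L such that pieces of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0.0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Data volumes and map sizes taken from the abstract (Gb).
cov_pentadactyla = fold_coverage(517.847, 2.91)
cov_javanica = fold_coverage(504.743, 2.85)
```

    Rounded to whole numbers, the two ratios match the coverage values in the abstract; N50 cannot be recomputed here because the individual map lengths are not given.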

  15. A high-resolution map of the Nile tilapia genome: a resource for studying cichlids and other percomorphs

    PubMed Central

    2012-01-01

    Background: The Nile tilapia (Oreochromis niloticus) is the second most farmed fish species worldwide. It is also an important model for studies of fish physiology, particularly because of its broad tolerance to an array of environments. It is a good model to study evolutionary mechanisms in vertebrates, because of its close relationship to haplochromine cichlids, which have undergone rapid speciation in East Africa. The existing genomic resources for Nile tilapia include a genetic map, BAC end sequences and ESTs, but comparative genome analysis and maps of quantitative trait loci (QTL) are still limited. Results: We have constructed a high-resolution radiation hybrid (RH) panel for the Nile tilapia and genotyped 1358 markers consisting of 850 genes, 82 markers corresponding to BAC end sequences, 154 microsatellites and 272 single nucleotide polymorphisms (SNPs). From these, 1296 markers could be associated in 81 RH groups, while 62 were not linked. The total size of the RH map is 34,084 cR3500 and 937,310 kb. It covers 88% of the entire genome with an estimated inter-marker distance of 742 kb. Mapping of microsatellites enabled integration to the genetic map. We have merged LG8 and LG24 into a single linkage group, and confirmed that LG16-LG21 are also merged. The orientation and association of RH groups to each chromosome and LG was confirmed by chromosomal in situ hybridizations (FISH) of 55 BACs. Fifty RH groups were localized on the 22 chromosomes while 31 remained small orphan groups. Synteny relationships were determined between Nile tilapia, stickleback, medaka and pufferfish. Conclusion: The RH map and associated FISH map provide a valuable gene-ordered resource for gene mapping and QTL studies. All genetic linkage groups with their corresponding RH groups now have a corresponding chromosome which can be identified in the karyotype. Placement of conserved segments indicated that multiple inter-chromosomal rearrangements have occurred between Nile tilapia

  16. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    PubMed Central

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To help non-programming users leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’: short workflows involving a few tools and steps that guide investigators through high-utility analysis tasks. PMID:26780094

  17. The human genome: a multifractal analysis

    PubMed Central

    2011-01-01

    Background: Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a similar behavior to that observed in the nematode. Results: We report here multifractality in the human genome sequence. This behavior correlates strongly with the presence of Alu elements and, to a lesser extent, with CpG islands and (G+C) content. In contrast, little or no relationship was found for LINE, MIR, MER, and LTR elements or for DNA regions poor in genetic information. Gene function, cluster of orthologous genes, metabolic pathways, and exons tended to increase their frequencies with ranges of multifractality, and large gene families were located in genomic regions with varied multifractality. Additionally, a multifractal map and classification for human chromosomes are proposed. Conclusions: Based on these findings, we propose a descriptive non-linear model for the structure of the human genome, with some biological implications. This model reveals 1) a multifractal regionalization where many regions coexist that are far from equilibrium and 2) a non-linear organization that has significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and structure of the human genome. Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation and phylogenetic analyses, these quantifications are especially useful. PMID:21999602

  18. GENOME ANALYSIS OF BURKHOLDERIA CEPACIA AC1100

    EPA Science Inventory

    Burkholderia cepacia is an important organism in bioremediation of environmental pollutants and it is also of increasing interest as a human pathogen. The genomic organization of B. cepacia is being studied in order to better understand its unusual adaptive capacity and genome pl...

  19. The human genome: a multifractal analysis.

    PubMed

    Moreno, Pedro A; Vélez, Patricia E; Martínez, Ember; Garreta, Luis E; Díaz, Néstor; Amador, Siler; Tischer, Irene; Gutiérrez, José M; Naik, Ashwinikumar K; Tobar, Fabián; García, Felipe

    2011-10-14

    Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a similar behavior to that observed in the nematode. We report here multifractality in the human genome sequence. This behavior correlates strongly with the presence of Alu elements and, to a lesser extent, with CpG islands and (G+C) content. In contrast, little or no relationship was found for LINE, MIR, MER, and LTR elements or for DNA regions poor in genetic information. Gene function, cluster of orthologous genes, metabolic pathways, and exons tended to increase their frequencies with ranges of multifractality, and large gene families were located in genomic regions with varied multifractality. Additionally, a multifractal map and classification for human chromosomes are proposed. Based on these findings, we propose a descriptive non-linear model for the structure of the human genome, with some biological implications. This model reveals 1) a multifractal regionalization where many regions coexist that are far from equilibrium and 2) a non-linear organization that has significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and structure of the human genome. Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation and phylogenetic analyses, these quantifications are especially useful.

  1. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  2. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute]

    2016-07-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  3. Web-based visual analysis for high-throughput genomics.

    PubMed

    Goecks, Jeremy; Eberhard, Carl; Too, Tomithy; Nekrutenko, Anton; Taylor, James

    2013-06-13

    Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high

  4. Detection of Indel Mutations in Drosophila by High-Resolution Melt Analysis (HRMA).

    PubMed

    Housden, Benjamin E; Perrimon, Norbert

    2016-09-01

    Although CRISPR technology allows specific genome alterations to be created with relative ease, detection of these events can be problematic. For example, CRISPR-induced double-strand breaks are often repaired imprecisely to generate unpredictable short indel mutations. Detection of these events requires the use of molecular screening techniques such as endonuclease assays, restriction profiling, or high-resolution melt analysis (HRMA). Here, we provide detailed protocols for HRMA-based mutation screening in Drosophila and analysis of the resulting data using the online tool HRMAnalyzer.

  5. Dissecting direct reprogramming through integrative genomic analysis

    PubMed Central

    Mikkelsen, Tarjei S.; Hanna, Jacob; Zhang, Xiaolan; Ku, Manching; Wernig, Marius; Schorderet, Patrick; Bernstein, Bradley E.; Jaenisch, Rudolf; Lander, Eric S.; Meissner, Alexander

    2009-01-01

    Somatic cells can be reprogrammed to a pluripotent state through the ectopic expression of defined transcription factors. Understanding the mechanism and kinetics of this transformation may shed light on the nature of developmental potency and suggest strategies with improved efficiency or safety. Here we report an integrative genomic analysis of reprogramming of mouse fibroblasts and B lymphocytes. Lineage-committed cells show a complex response to the ectopic expression involving induction of genes downstream of individual reprogramming factors. Fully reprogrammed cells show gene expression and epigenetic states that are highly similar to embryonic stem cells. In contrast, stable partially reprogrammed cell lines show reactivation of a distinctive subset of stem-cell-related genes, incomplete repression of lineage-specifying transcription factors, and DNA hypermethylation at pluripotency-related loci. These observations suggest that some cells may become trapped in partially reprogrammed states owing to incomplete repression of transcription factors, and that DNA de-methylation is an inefficient step in the transition to pluripotency. We demonstrate that RNA inhibition of transcription factors can facilitate reprogramming, and that treatment with DNA methyltransferase inhibitors can improve the overall efficiency of the reprogramming process. PMID:18509334

  6. Genome-wide interrogation advances resolution of recalcitrant groups in the tree of life.

    PubMed

    Arcila, Dahiana; Ortí, Guillermo; Vari, Richard; Armbruster, Jonathan W; Stiassny, Melanie L J; Ko, Kyung D; Sabaj, Mark H; Lundberg, John; Revell, Liam J; Betancur-R, Ricardo

    2017-01-13

    Much progress has been achieved in disentangling evolutionary relationships among species in the tree of life, but some taxonomic groups remain difficult to resolve despite increasing availability of genome-scale data sets. Here we present a practical approach to studying ancient divergences in the face of high levels of conflict, based on explicit gene genealogy interrogation (GGI). We show its efficacy in resolving the controversial relationships within the largest freshwater fish radiation (Otophysi) based on newly generated DNA sequences for 1,051 loci from 225 species. Initial results using a suite of standard methodologies revealed conflicting phylogenetic signal, which supports ten alternative evolutionary histories among early otophysan lineages. By contrast, GGI revealed that the vast majority of gene genealogies supports a single tree topology grounded on morphology that was not obtained by previous molecular studies. We also reanalysed published data sets for exemplary groups with recalcitrant resolution to assess the power of this approach. GGI supports the notion that ctenophores are the earliest-branching animal lineage, and adds insight into relationships within clades of yeasts, birds and mammals. GGI opens up a promising avenue to account for incompatible signals in large data sets and to discern between estimation error and actual biological conflict explaining gene tree discordance.

  7. Phylogenetic Analysis of Genome Rearrangements among Five Mammalian Orders

    PubMed Central

    Luo, Haiwei; Arndt, William; Zhang, Yiwei; Shi, Guanqun; Alekseyev, Max; Tang, Jijun; Hughes, Austin L.; Friedman, Robert

    2015-01-01

    Evolutionary relationships among placental mammalian orders have been controversial. Whole genome sequencing and new computational methods offer opportunities to resolve the relationships among 10 genomes belonging to the mammalian orders Primates, Rodentia, Carnivora, Perissodactyla and Artiodactyla. By application of the double cut and join distance metric, where gene order is the phylogenetic character, we computed genomic distances among the sampled mammalian genomes. With a marsupial outgroup, the gene order tree supported a topology in which Rodentia fell outside the cluster of Primates, Carnivora, Perissodactyla, and Artiodactyla. Results of breakpoint reuse rate and synteny block length analyses were consistent with the prediction of the random breakage model, which provided a diagnostic test to support use of gene order as an appropriate phylogenetic character in this study. We discuss the influence of rate differences among lineages and other factors that may contribute to different resolutions of mammalian ordinal relationships by different methods of phylogenetic reconstruction. PMID:22929217
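
    To make the double cut and join (DCJ) distance concrete, here is a minimal sketch for the simplest case only: two genomes that each consist of one circular chromosome with the same signed genes, where the distance is the number of genes minus the number of cycles in the adjacency graph. This is a simplification and not the authors' pipeline; mammalian genomes are multichromosomal and linear, which requires the general formula that also counts paths. The function names and the toy gene orders are hypothetical.

```python
from typing import Dict, List, Tuple

Extremity = Tuple[int, str]  # (gene id, "h" for head or "t" for tail)


def adjacencies(genome: List[int]) -> Dict[Extremity, Extremity]:
    """Map each gene extremity to its neighbour on one circular chromosome.

    Genes are non-zero integers; a negative sign means reversed orientation.
    """
    partner: Dict[Extremity, Extremity] = {}
    n = len(genome)
    for i, gene in enumerate(genome):
        nxt = genome[(i + 1) % n]
        right = (abs(gene), "h" if gene > 0 else "t")  # end where we leave `gene`
        left = (abs(nxt), "t" if nxt > 0 else "h")     # end where we enter `nxt`
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance(a: List[int], b: List[int]) -> int:
    """DCJ distance n - c for two single-circular-chromosome genomes."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("genomes must contain the same genes")
    adj_a, adj_b = adjacencies(a), adjacencies(b)
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        node = start
        # Walk the cycle, alternating an adjacency of `a` with one of `b`.
        while node not in seen:
            seen.add(node)
            mate = adj_a[node]
            seen.add(mate)
            node = adj_b[mate]
    return len(a) - cycles


identical = dcj_distance([1, 2, 3], [1, 2, 3])
one_inversion = dcj_distance([1, 2, 3], [1, -2, 3])
transposition = dcj_distance([1, 2, 3, 4], [1, 3, 2, 4])
```

    A matrix of such pairwise distances is the kind of input from which a gene order tree can be built with a distance-based method.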

  8. Analysis of inserts in prokaryote genomes

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan; Tuduce, Rodica Aurora

    2008-02-01

    Nucleotide genomic signals satisfy regularities that reveal restrictions in the distribution of nucleotides and pairs of nucleotides along DNA sequences. Structurally, a chromosome appears to be more than a plain text, by satisfying symmetry constraints that evoke the rhythm and rhyme in poems. These regularities make it easy to identify exogenous inserts in the genomes of prokaryotes, because such inserts obey different regularities than the background sequence. The paper presents instances of inserts found in the genomes of Bacillus subtilis, Mycobacterium tuberculosis and other prokaryotes. Inserts of exogenous material are frequently accompanied by complementary inserts tending to restore the original constraints.

  9. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    DOE PAGES

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  10. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    SciTech Connect

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  11. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions

    PubMed Central

    Ernst, Jason; Melnikov, Alexandre; Zhang, Xiaolan; Wang, Li; Rogov, Peter; Mikkelsen, Tarjei S.; Kellis, Manolis

    2016-01-01

    Massively parallel reporter assays (MPRA) enable nucleotide-resolution dissection of transcriptional regulatory regions, such as enhancers, but only few regions at a time. Here, we present a combined experimental and computational approach, Sharpr-MPRA, that allows high-resolution analysis of thousands of regions simultaneously. Sharpr-MPRA combines dense tiling of overlapping MPRA constructs with a probabilistic graphical model to recognize functional regulatory nucleotides, and to distinguish activating and repressive nucleotides, using their inferred contribution to reporter gene expression. We use Sharpr-MPRA to test 4.6 million nucleotides spanning 15,000 putative regulatory regions tiled at 5-nucleotide resolution in two human cell types. Our results recover known cell type-specific regulatory motifs and evolutionarily-conserved nucleotides, and distinguish known activating and repressive motifs. Our results also show that endogenous chromatin state and DNA accessibility are both predictive of regulatory function in reporter assays, identify retroviral elements with activating roles, and uncover ‘attenuator’ motifs with repressive roles in active chromatin. PMID:27701403
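
    The dense tiling idea can be sketched as follows: overlapping constructs are laid across a region at a fixed offset, so that every nucleotide is covered by many constructs whose reporter activities can later be combined. The abstract gives only the 5-nucleotide tiling resolution; the region length, construct length, and function names below are illustrative assumptions, not values from the study, and the probabilistic model itself is not reproduced.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # half-open interval [start, end) within a region


def tile_region(region_len: int, tile_len: int, step: int) -> List[Tile]:
    """Lay overlapping tiles of `tile_len` across a region, offset by `step`."""
    if not 0 < tile_len <= region_len or step < 1:
        raise ValueError("need 0 < tile_len <= region_len and step >= 1")
    return [(s, s + tile_len) for s in range(0, region_len - tile_len + 1, step)]


def tiles_covering(position: int, tiles: List[Tile]) -> int:
    """Count how many tiles contain a given nucleotide position."""
    return sum(1 for start, end in tiles if start <= position < end)


# Assumed sizes: a 300-nt region and 150-nt constructs at a 5-nt offset.
tiles = tile_region(region_len=300, tile_len=150, step=5)
depth_mid = tiles_covering(150, tiles)
```

    Positions near the middle of a region are covered by far more constructs than positions at the edges, which is why per-nucleotide inference needs a model that accounts for uneven coverage.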

  12. Creation and genomic analysis of irradiation hybrids in Populus

    Treesearch

    Matthew S. Zinkgraf; K. Haiby; M.C. Lieberman; L. Comai; I.M. Henry; Andrew Groover

    2016-01-01

    Establishing efficient functional genomic systems for creating and characterizing genetic variation in forest trees is challenging. Here we describe protocols for creating novel gene-dosage variation in Populus through gamma-irradiation of pollen, followed by genomic analysis to identify chromosomal regions that have been deleted or inserted in...

  13. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  14. Analysis of recent segmental duplications in the bovine genome

    USDA-ARS's Scientific Manuscript database

    Duplicated sequences are an important source of gene innovation and structural variation within mammalian genomes. We describe the first systematic and genome-wide analysis of segmental duplications in the modern domesticated cattle (Bos taurus). Using two distinct computational analyses, we estimat...

  15. Analysis of Primate Genomic Variation Reveals a Repeat-Driven Expansion of the Human Genome

    PubMed Central

    Liu, Ge; Program, NISC Comparative Sequencing; Zhao, Shaying; Bailey, Jeffrey A.; Sahinalp, S. Cenk; Alkan, Can; Tuzun, Eray; Green, Eric D.; Eichler, Evan E.

    2003-01-01

    We performed a detailed analysis of both single-nucleotide and large insertion/deletion events based on large-scale comparison of 10.6 Mb of genomic sequence from lemur, baboon, and chimpanzee to human. Using a human genomic reference, optimal global alignments were constructed from large (>50-kb) genomic sequence clones. These alignments were examined for the pattern, frequency, and nature of mutational events. Whereas rates of single-nucleotide substitution remain relatively constant (1–2 × 10^−9 substitutions/site/year), rates of retrotransposition vary radically among different primate lineages. These differences have led to a 15%–20% expansion of human genome size over the last 50 million years of primate evolution, 90% of it due to new retroposon insertions. Orthologous comparisons with the chimpanzee suggest that the human genome continues to significantly expand due to shifts in retrotransposition activity. Assuming that the primate genome sequence we have sampled is representative, we estimate that human euchromatin has expanded 30 Mb and 550 Mb compared to the primate genomes of chimpanzee and lemur, respectively. [Supplemental material is available online at www.genome.org.] PMID:12618366

  16. High-resolution genomic copy number profiling of glioblastoma multiforme by single nucleotide polymorphism DNA microarray.

    PubMed

    Yin, Dong; Ogawa, Seishi; Kawamata, Norihiko; Tunici, Patrizia; Finocchiaro, Gaetano; Eoli, Marica; Ruckert, Christian; Huynh, Thien; Liu, Gentao; Kato, Motohiro; Sanada, Masashi; Jauch, Anna; Dugas, Martin; Black, Keith L; Koeffler, H Phillip

    2009-05-01

    Glioblastoma multiforme (GBM) is an extremely malignant brain tumor. To identify new genomic alterations in GBM, genomic DNA of tumor tissue/explants from 55 individuals and 6 GBM cell lines was examined using single nucleotide polymorphism DNA microarray (SNP-Chip). Further gene expression analysis relied on an additional 56 GBM samples. SNP-Chip results were validated using several techniques, including quantitative PCR (Q-PCR), nucleotide sequencing, and a combination of Q-PCR and detection of microsatellite markers for loss of heterozygosity with normal copy number [acquired uniparental disomy (AUPD)]. Whole genomic DNA copy number in each GBM sample was profiled by SNP-Chip. Several signaling pathways were frequently abnormal. Either the p16(INK4A)/p15(INK4B)-CDK4/6-pRb or p14(ARF)-MDM2/4-p53 pathways were abnormal in 89% (49 of 55) of cases. Simultaneous abnormalities of both pathways occurred in 84% (46 of 55) of samples. The phosphoinositide 3-kinase pathway was altered in 71% (39 of 55) of GBMs either by deletion of PTEN or amplification of epidermal growth factor receptor and/or vascular endothelial growth factor receptor/platelet-derived growth factor receptor alpha. Deletion of chromosome 6q26-27 often occurred (16 of 55 samples). The minimum common deleted region included PARK2, PACRG, QKI, and PDE10A genes. Further reverse transcription Q-PCR studies showed that PARK2 expression was decreased in another collection of GBMs at a frequency of 61% (34 of 56) of samples. The 1p36.23 region was deleted in 35% (19 of 55) of samples. Notably, three samples had homozygous deletion encompassing this site. Also, a novel internal deletion of a putative tumor suppressor gene, LRP1B, was discovered causing an aberrant protein. AUPDs occurred in 58% (32 of 55) of the GBM samples and five of six GBM cell lines. A common AUPD was found at chromosome 17p13.3-12 (including the p53 gene) in 13 of 61 samples and cell lines. Single-strand conformational polymorphism and nucleotide
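    The frequencies quoted in this abstract can be re-derived from the stated counts. The sketch below is only an arithmetic check, not part of the study's analysis; the row labels are shorthand introduced here, and rounding to the nearest integer is an assumption about how the percentages were reported.

```python
# Counts copied from the abstract: (shorthand label, affected, total, stated percent).
# The labels are informal shorthand, not terminology from the paper.
REPORTED = [
    ("either RB or p53 pathway abnormal", 49, 55, 89),
    ("both pathways abnormal", 46, 55, 84),
    ("PI3K pathway altered", 39, 55, 71),
    ("PARK2 expression decreased", 34, 56, 61),
    ("1p36.23 deleted", 19, 55, 35),
    ("AUPD in tumor samples", 32, 55, 58),
]


def percent(affected, total):
    """Return affected/total as a percentage rounded to the nearest integer."""
    return round(100 * affected / total)


def check(rows):
    """Return (label, recomputed percent, stated percent) for every row."""
    return [(label, percent(a, n), stated) for label, a, n, stated in rows]


for label, computed, stated in check(REPORTED):
    status = "ok" if computed == stated else "MISMATCH"
    print(f"{label}: {computed}% (stated {stated}%) {status}")
```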

  17. Comparative and demographic analysis of orangutan genomes

    PubMed Central

    Locke, Devin P.; Hillier, LaDeana W.; Warren, Wesley C.; Worley, Kim C.; Nazareth, Lynne V.; Muzny, Donna M.; Yang, Shiaw-Pyng; Wang, Zhengyuan; Chinwalla, Asif T.; Minx, Pat; Mitreva, Makedonka; Cook, Lisa; Delehaunty, Kim D.; Fronick, Catrina; Schmidt, Heather; Fulton, Lucinda A.; Fulton, Robert S.; Nelson, Joanne O.; Magrini, Vincent; Pohl, Craig; Graves, Tina A.; Markovic, Chris; Cree, Andy; Dinh, Huyen H.; Hume, Jennifer; Kovar, Christie L.; Fowler, Gerald R.; Lunter, Gerton; Meader, Stephen; Heger, Andreas; Ponting, Chris P.; Marques-Bonet, Tomas; Alkan, Can; Chen, Lin; Cheng, Ze; Kidd, Jeffrey M.; Eichler, Evan E.; White, Simon; Searle, Stephen; Vilella, Albert J.; Chen, Yuan; Flicek, Paul; Ma, Jian; Raney, Brian; Suh, Bernard; Burhans, Richard; Herrero, Javier; Haussler, David; Faria, Rui; Fernando, Olga; Darré, Fleur; Farré, Domènec; Gazave, Elodie; Oliva, Meritxell; Navarro, Arcadi; Roberto, Roberta; Capozzi, Oronzo; Archidiacono, Nicoletta; Valle, Giuliano Della; Purgato, Stefania; Rocchi, Mariano; Konkel, Miriam K.; Walker, Jerilyn A.; Ullmer, Brygg; Batzer, Mark A.; Smit, Arian F. A.; Hubley, Robert; Casola, Claudio; Schrider, Daniel R.; Hahn, Matthew W.; Quesada, Victor; Puente, Xose S.; Ordoñez, Gonzalo R.; López-Otín, Carlos; Vinar, Tomas; Brejova, Brona; Ratan, Aakrosh; Harris, Robert S.; Miller, Webb; Kosiol, Carolin; Lawson, Heather A.; Taliwal, Vikas; Martins, André L.; Siepel, Adam; RoyChoudhury, Arindam; Ma, Xin; Degenhardt, Jeremiah; Bustamante, Carlos D.; Gutenkunst, Ryan N.; Mailund, Thomas; Dutheil, Julien Y.; Hobolth, Asger; Schierup, Mikkel H.; Chemnick, Leona; Ryder, Oliver A.; Yoshinaga, Yuko; de Jong, Pieter J.; Weinstock, George M.; Rogers, Jeffrey; Mardis, Elaine R.; Gibbs, Richard A.; Wilson, Richard K.

    2011-01-01

    “Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period. Overall, the resources and analyses presented here offer new opportunities

  18. High-resolution DNA melting analysis in plant research

    USDA-ARS's Scientific Manuscript database

    Genetic and genomic studies provide valuable insight into the inheritance, structure, organization, and function of genes. The knowledge gained from the analysis of plant genes is beneficial to all aspects of plant research, including crop improvement. New methods and tools are continually developed...

  19. Genome-Scale Analysis of Cell-Specific Regulatory Codes Using Nuclear Enzymes.

    PubMed

    Baek, Songjoon; Sung, Myong-Hee

    2016-01-01

    High-throughput sequencing technologies have made it possible for biologists to generate genome-wide profiles of chromatin features at nucleotide resolution. Enzymes such as nucleases or transposases have been instrumental as chromatin-probing agents due to their ability to target accessible chromatin for cleavage or insertion. On the scale of a few hundred base pairs, preferential action of the nuclear enzymes on accessible chromatin allows mapping of cell state-specific accessibility in vivo. Such accessible regions contain functionally important regulatory sites, including promoters and enhancers, which undergo active remodeling as cells adapt to a dynamic environment. DNase-seq and the more recent ATAC-seq are two assays that are gaining popularity. Deep sequencing of DNA libraries from these assays, termed genomic footprinting, has been proposed to enable the comprehensive construction of protein occupancy profiles over the genome at the nucleotide level. Recent studies have discovered limitations of genomic footprinting which reduce the scope of detectable proteins. In addition, the identification of putative factors that bind to the observed footprints remains challenging. Despite these caveats, the methodology still presents significant advantages over alternative techniques such as ChIP-seq or FAIRE-seq. Here we describe computational approaches and tools for analysis of chromatin accessibility and genomic footprinting. Proper experimental design and assay-specific data analysis ensure detection sensitivity and maximize retrievable information. The enzyme-based chromatin profiling approaches represent a powerful and evolving methodology which facilitates our understanding of how the genome is regulated.

  20. Supercomputing for the parallelization of whole genome analysis

    PubMed Central

    Puckelwartz, Megan J.; Pesce, Lorenzo L.; Nelakuditi, Viswateja; Dellefave-Castillo, Lisa; Golbus, Jessica R.; Day, Sharlene M.; Cappola, Thomas P.; Dorn, Gerald W.; Foster, Ian T.; McNally, Elizabeth M.

    2014-01-01

    Motivation: The declining cost of generating DNA sequence is promoting an increase in whole genome sequencing, especially as applied to the human genome. Whole genome analysis requires the alignment and comparison of raw sequence data, and results in a computational bottleneck because of limited ability to analyze multiple genomes simultaneously. Results: We have now adapted a Cray XE6 supercomputer to achieve the parallelization required for concurrent multiple genome analysis. This approach not only markedly speeds computational time but also results in increased usable sequence per genome. Relying on publicly available software, the Cray XE6 has the capacity to align and call variants on 240 whole genomes in ∼50 h. Multisample variant calling is also accelerated. Availability and implementation: The MegaSeq workflow is designed to harness the size and memory of the Cray XE6, housed at Argonne National Laboratory, for whole genome analysis in a platform designed to better match current and emerging sequencing volume. Contact: emcnally@uchicago.edu PMID:24526712
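    The parallelization described in this abstract rests on the fact that each genome can be aligned and variant-called independently of the others. The following is a minimal sketch of that pattern, not the MegaSeq workflow itself: `align_and_call` is a hypothetical placeholder, and a local thread pool stands in for the supercomputer's nodes. The last helper converts the stated figures (240 genomes in about 50 h) into effective wall-clock time per genome.

```python
from concurrent.futures import ThreadPoolExecutor


def align_and_call(sample_id):
    """Hypothetical placeholder for one genome's alignment and variant calling.

    A real pipeline would invoke an aligner and a variant caller here.
    """
    return f"{sample_id}.vcf"


def run_batch(sample_ids, max_workers=8):
    """Process independent samples concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(align_and_call, sample_ids))


def effective_minutes_per_genome(n_genomes, wall_hours):
    """Wall-clock minutes per genome when n_genomes finish in wall_hours."""
    return wall_hours * 60 / n_genomes


samples = [f"genome_{i:03d}" for i in range(240)]  # illustrative sample names
outputs = run_batch(samples)
print(len(outputs), effective_minutes_per_genome(240, 50))
```

    Because `pool.map` preserves input order, each output can be matched back to its sample without extra bookkeeping.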

  1. Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering.

    PubMed

    Garst, Andrew D; Bassalo, Marcelo C; Pines, Gur; Lynch, Sean A; Halweg-Edwards, Andrea L; Liu, Rongming; Liang, Liya; Wang, Zhiwen; Zeitoun, Ramsey; Alexander, William G; Gill, Ryan T

    2017-01-01

    Improvements in DNA synthesis and sequencing have underpinned comprehensive assessment of gene function in bacteria and eukaryotes. Genome-wide analyses require high-throughput methods to generate mutations and analyze their phenotypes, but approaches to date have been unable to efficiently link the effects of mutations in coding regions or promoter elements in a highly parallel fashion. We report that CRISPR-Cas9 gene editing in combination with massively parallel oligomer synthesis can enable trackable editing on a genome-wide scale. Our method, CRISPR-enabled trackable genome engineering (CREATE), links each guide RNA to homologous repair cassettes that both edit loci and function as barcodes to track genotype-phenotype relationships. We apply CREATE to site saturation mutagenesis for protein engineering, reconstruction of adaptive laboratory evolution experiments, and identification of stress tolerance and antibiotic resistance genes in bacteria. We provide preliminary evidence that CREATE will work in yeast. We also provide a webtool to design multiplex CREATE libraries.

  2. Whole Genome Amplification in Genomic Analysis of Single Circulating Tumor Cells.

    PubMed

    Gasch, Christin; Pantel, Klaus; Riethdorf, Sabine

    2015-01-01

    Investigation of the genome of organisms is one of the foundations of molecular biology for understanding the complex organization of cells. While genomic DNA can easily be isolated from tissues or cell cultures of plant, animal or human origin, DNA extraction from single cells is still challenging. Here, we describe three techniques for the amplification of genomic DNA of fixed single circulating tumor cells (CTC) isolated from blood of cancer patients. This amplification aims to increase DNA amounts from those of one cell to yields sufficient for different DNA analyses such as mutational analysis including next-generation sequencing, array-comparative genome hybridization (CGH), and quantitative measurement of gene amplifications. Molecular analysis of CTC as liquid biopsy can be used to identify therapeutic targets in personalized medicine directed, e.g., against human epidermal growth factor receptor 2 (HER2) or epidermal growth factor receptor (EGFR) and to stratify the patients to those therapies.

  3. Genomic Analysis of wig-1 Pathways

    PubMed Central

    Sedaghat, Yalda; Mazur, Curt; Sabripour, Mahyar; Hung, Gene; Monia, Brett P.

    2012-01-01

    Background: Wig-1 is a transcription factor regulated by p53 that can interact with hnRNP A2/B1, RNA Helicase A, and dsRNAs, which plays an important role in RNA and protein stabilization. In vitro studies have shown that wig-1 binds p53 mRNA and stabilizes it by protecting it from deadenylation. Furthermore, p53 has been implicated as a causal factor in neurodegenerative diseases based in part on its selective regulatory function on gene expression, including genes which, in turn, also possess regulatory functions on gene expression. In this study we focused on the wig-1 transcription factor as a downstream p53 regulated gene and characterized the effects of wig-1 down regulation on gene expression in mouse liver and brain. Methods and Results: Antisense oligonucleotides (ASOs) were identified that specifically target mouse wig-1 mRNA and produce a dose-dependent reduction in wig-1 mRNA levels in cell culture. These wig-1 ASOs produced marked reductions in wig-1 levels in liver following intraperitoneal administration and in brain tissue following ASO administration through a single striatal bolus injection in FVB and BACHD mice. Wig-1 suppression was well tolerated and resulted in the reduction of mutant Htt protein levels in BACHD mouse brain but had no effect on normal Htt protein levels or on p53 mRNA or protein levels. Expression microarray analysis was employed to determine the effects of wig-1 suppression on genome-wide expression in mouse liver and brain. Reduction of wig-1 caused both down regulation and up regulation of several genes, and a number of wig-1 regulated genes were identified that potentially link wig-1 to various signaling pathways and diseases. Conclusion: Antisense oligonucleotides can effectively reduce wig-1 levels in mouse liver and brain, which results in specific changes in gene expression for pathways relevant to both the nervous system and cancer. PMID:22347364

  4. Analysis of Heritability Using Genome-Wide Data.

    PubMed

    Hall, Jacob B; Bush, William S

    2016-10-11

    Most analyses of genome-wide association data consider each variant independently without considering or adjusting for the genetic background present in the rest of the genome. New approaches to genome analysis use representations of genomic sharing to better account for confounding factors like population stratification or to directly approximate heritability through the estimated sharing of individuals in a dataset. These approaches use mixed linear models, which relate genotypic sharing to phenotypic sharing, and rely on the efficient computation of genetic sharing among individuals in a dataset. This unit describes the principles and practical application of mixed models for the analysis of genome-wide association study data. © 2016 by John Wiley & Sons, Inc.
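    The "genetic sharing among individuals" that mixed linear models rely on is often summarized as a genetic relationship matrix (GRM). The sketch below shows one widely used estimator, the average over SNPs of products of standardized genotypes; it is offered as an illustration under stated assumptions, not as the specific procedure of this unit. Genotypes are assumed to be coded 0/1/2 copies of a reference allele with no missing values, and the toy data are invented.

```python
import math


def genetic_relationship_matrix(genotypes):
    """Estimate pairwise genetic sharing from 0/1/2 genotype codes.

    genotypes: list of individuals, each a list of allele counts per SNP.
    Each SNP is centered by 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; monomorphic SNPs are skipped.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[] for _ in range(n)]
    for j in range(m):
        column = [row[j] for row in genotypes]
        p = sum(column) / (2 * n)
        if p == 0.0 or p == 1.0:
            continue  # no variation at this SNP, so it carries no information
        scale = math.sqrt(2 * p * (1 - p))
        for i in range(n):
            standardized[i].append((column[i] - 2 * p) / scale)
    used = len(standardized[0])
    if used == 0:
        raise ValueError("no polymorphic SNPs to estimate relatedness from")
    return [
        [sum(a * b for a, b in zip(standardized[i], standardized[k])) / used
         for k in range(n)]
        for i in range(n)
    ]


# Invented toy data: individuals 0 and 1 carry identical genotypes.
toy = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 1, 0, 1],
    [1, 0, 1, 2],
]
grm = genetic_relationship_matrix(toy)
for row in grm:
    print([round(value, 3) for value in row])
```

    In a mixed model this matrix would serve as the covariance structure of the random genetic effect; real analyses use dedicated software and far more markers than this toy example.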

  5. In Situ Super-Resolution Imaging of Genomic DNA with OligoSTORM and OligoDNA-PAINT.

    PubMed

    Beliveau, Brian J; Boettiger, Alistair N; Nir, Guy; Bintu, Bogdan; Yin, Peng; Zhuang, Xiaowei; Wu, C-Ting

    2017-01-01

    OligoSTORM and OligoDNA-PAINT meld the Oligopaint technology for fluorescent in situ hybridization (FISH) with, respectively, Stochastic Optical Reconstruction Microscopy (STORM) and DNA-based Point Accumulation for Imaging in Nanoscale Topography (DNA-PAINT) to enable in situ single-molecule super-resolution imaging of nucleic acids. Both strategies enable ≤20 nm resolution and are appropriate for imaging nanoscale features of the genomes of a wide range of species, including human, mouse, and fruit fly (Drosophila).

  6. First report of two complete Clostridium chauvoei genome sequences and detailed in silico genome analysis.

    PubMed

    Thomas, Prasad; Semmler, Torsten; Eichhorn, Inga; Lübke-Becker, Antina; Werckenthin, Christiane; Abdel-Glil, Mostafa Y; Wieler, Lothar H; Neubauer, Heinrich; Seyboldt, Christian

    2017-10-01

    Clostridium (C.) chauvoei is a Gram-positive, spore forming, anaerobic bacterium. It causes black leg in ruminants, a typically fatal histotoxic myonecrosis. High quality circular genome sequences were generated for the C. chauvoei type strain DSM 7528(T) (ATCC 10092(T)) and a field strain 12S0467 isolated in Germany. The origin of replication (oriC) was comparable to that of Bacillus subtilis in structure with two regions containing DnaA boxes. Similar prophages were identified in the genomes of both C. chauvoei strains which also harbored hemolysin and bacterial spore formation genes. A CRISPR type I-B system with limited variations in the repeat number was identified. Sporulation and germination process related genes were homologous to those of the Clostridia cluster I group but novel variations for regulatory genes were identified, indicative of strain specific control of regulatory events. Phylogenomics showed a higher relatedness to C. septicum than to other so far sequenced genomes of species belonging to the genus Clostridium. Comparative genome analysis of three C. chauvoei circular genome sequences revealed the presence of few inversions and translocations in locally collinear blocks (LCBs). The species genome also shows a large number of genes involved in proteolysis, genes for glycosyl hydrolases and metal iron transportation genes which are presumably involved in virulence and survival in the host. Three conserved flagellar genes (fliC) were identified in each of the circular genomes. In conclusion, this is the first comparative analysis of circular genomes for the species C. chauvoei, enabling insights into genome composition and virulence factor variation. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  7. Alfresco—A Workbench for Comparative Genomic Sequence Analysis

    PubMed Central

    Jareborg, Niclas; Durbin, Richard

    2000-01-01

    Comparative analysis of genomic sequences provides a powerful tool for identifying regions of potential biologic function; by comparing corresponding regions of genomes from suitable species, protein coding or regulatory regions can be identified by their homology. This requires the use of several specific types of computational analysis tools. Many programs exist for these types of analysis; not many exist for overall view/control of the results, which is necessary for large-scale genomic sequence analysis. Using Java, we have developed a new visualization tool that allows effective comparative genome sequence analysis. The program handles a pair of sequences from putatively homologous regions in different species. Results from various different existing external analysis programs, such as database searching, gene prediction, repeat masking, and alignment programs, are visualized and used to find corresponding functional sequence domains in the two sequences. The user interacts with the program through a graphic display of the genome regions, in which an independently scrollable and zoomable symbolic representation of the sequences is shown. As an example, the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus is presented. PMID:10958633

  8. Genome-Wide Mapping of Nucleosome Positions in Yeast Using High-Resolution MNase ChIP-Seq

    PubMed Central

    Wal, Megha

    2016-01-01

    Eukaryotic DNA is packaged into chromatin where nucleosomes form the basic building unit. Knowing the precise positions of nucleosomes is important because they determine the accessibility of underlying regulatory DNA sequences. Here we describe a detailed method to map on a genomic scale the locations of nucleosomes with very high resolution. Micrococcal nuclease (MNase) digestion followed by chromatin immunoprecipitation and facilitated library construction for deep sequencing provides a simple and accurate map of nucleosome positions. PMID:22929772

  9. Mechanisms of assembly and genome packaging in an RNA virus revealed by high-resolution cryo-EM

    PubMed Central

    Hesketh, Emma L.; Meshcheriakova, Yulia; Dent, Kyle C.; Saxena, Pooja; Thompson, Rebecca F.; Cockburn, Joseph J.; Lomonossoff, George P.; Ranson, Neil A.

    2015-01-01

    Cowpea mosaic virus is a plant-infecting member of the Picornavirales and is of major interest in the development of biotechnology applications. Despite the availability of >100 crystal structures of Picornavirales capsids, relatively little is known about the mechanisms of capsid assembly and genome encapsidation. Here we have determined cryo-electron microscopy reconstructions for the wild-type virus and an empty virus-like particle, to 3.4 Å and 3.0 Å resolution, respectively, and built de novo atomic models of their capsids. These new structures reveal the C-terminal region of the small coat protein subunit, which is essential for virus assembly and which was missing from previously determined crystal structures, as well as residues that bind to the viral genome. These observations allow us to develop a new model for genome encapsidation and capsid assembly. PMID:26657148

  10. Comparative genomic analysis of eutherian interferon-γ-inducible GTPases.

    PubMed

    Premzl, Marko

    2012-11-01

    The interferon-γ-inducible GTPases, IFGGs, are intracellular proteins involved in immune response against pathogens. A comprehensive comparative genomic review and analysis of eutherian IFGGs was carried out using public genomic sequences. The 64 eutherian IFGG genes were examined in detail and annotated. The eutherian IFGG promoter types were first catalogued followed by a phylogenetic analysis of eutherian IFGGs, which described five major IFGG clusters. The patterns of differential gene expansions and protein regions that may regulate IFGG catalytic features suggested a new classification of eutherian IFGGs. This mini-review has also provided new tests of reliability of public genomic sequences as well as tests of protein molecular evolution.

  11. High-resolution physical mapping in Pennisetum squamulatum reveals extensive chromosomal heteromorphism of the genomic region associated with apomixis.

    PubMed

    Akiyama, Yukio; Conner, Joann A; Goel, Shailendra; Morishige, Daryl T; Mullet, John E; Hanna, Wayne W; Ozias-Akins, Peggy

    2004-04-01

    Gametophytic apomixis is asexual reproduction as a consequence of parthenogenetic development of a chromosomally unreduced egg. The trait leads to the production of embryos with a maternal genotype, i.e. progeny are clones of the maternal plant. The application of the trait in agriculture could be a tremendous tool for crop improvement through conventional and nonconventional breeding methods. Unfortunately, there are no major crops that reproduce by apomixis, and interspecific hybridization with wild relatives has not yet resulted in commercially viable germplasm. Pennisetum squamulatum is an aposporous apomict from which the gene(s) for apomixis has been transferred to sexual pearl millet by backcrossing. Twelve molecular markers that are linked with apomixis coexist in a tight linkage block called the apospory-specific genomic region (ASGR), and several of these markers have been shown to be hemizygous in the polyploid genome of P. squamulatum. High resolution genetic mapping of these markers has not been possible because of low recombination in this region of the genome. We now show the physical arrangement of bacterial artificial chromosomes containing apomixis-linked molecular markers by high resolution fluorescence in situ hybridization on pachytene chromosomes. The size of the ASGR, currently defined as the entire hemizygous region that hybridizes with apomixis-linked bacterial artificial chromosomes, was estimated on pachytene and mitotic chromosomes to be approximately 50 Mbp (a quarter of the chromosome). The ASGR includes highly repetitive sequences from an Opie-2-like retrotransposon family that are particularly abundant in this region of the genome.

  12. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background: Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results: We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions: A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  13. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
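    Two quantities in this abstract follow from short calculations: the N50 statistic and sequencing coverage. The sketch below uses the standard N50 definition (the contig length at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly) and the usual relation coverage = sequenced bases / genome size. The contig lengths are invented for illustration; only the 181 Gbp and 60× figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the total.
    """
    total = sum(lengths)
    if total == 0:
        raise ValueError("need at least one contig with nonzero length")
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def implied_genome_size(total_bases, coverage):
    """Genome size implied by a yield of total_bases at the stated coverage."""
    return total_bases / coverage


toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # invented lengths, arbitrary units
print("toy N50:", n50(toy_contigs))
print("implied genome size (Gbp):", implied_genome_size(181e9, 60) / 1e9)
```

    The implied size of roughly 3 Gbp is simply what the two stated figures require of each other; it is not a value reported in the abstract.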

  14. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  15. The Arabidopsis TAC Position Viewer: a high-resolution map of transformation-competent artificial chromosome (TAC) clones aligned with the Arabidopsis thaliana Columbia-0 genome.

    PubMed

    Hirose, Yoshitsugu; Suda, Kunihiro; Liu, Yao-Guang; Sato, Shusei; Nakamura, Yukino; Yokoyama, Koji; Yamamoto, Naoki; Hanano, Shigeru; Takita, Eiji; Sakurai, Nozomu; Suzuki, Hideyuki; Nakamura, Yasukazu; Kaneko, Takakazu; Yano, Kentaro; Tabata, Satoshi; Shibata, Daisuke

    2015-09-01

    We present a high-resolution map of genomic transformation-competent artificial chromosome (TAC) clones extending over all Arabidopsis thaliana (Arabidopsis) chromosomes. The Arabidopsis genomic TAC clones have been valuable genetic tools. Previously, we constructed an Arabidopsis genomic TAC library consisting of more than 10,000 TAC clones harboring large genomic DNA fragments extending over the whole Arabidopsis genome. Here, we determined 13,577 end sequences from 6987 Arabidopsis TAC clones and mapped 5937 TAC clones to precise locations, covering approximately 90% of the Arabidopsis chromosomes. We present the large-scale data set of TAC clones with high-resolution mapping information as a Java application tool, the Arabidopsis TAC Position Viewer, which provides ready-to-go transformable genomic DNA clones corresponding to certain loci on Arabidopsis chromosomes. The TAC clone resources will accelerate genomic DNA cloning, positional walking, complementation of mutants and DNA transformation for heterologous gene expression.

  16. Resolution of Ultramicroscopy and Field of View Analysis

    PubMed Central

    Leischner, Ulrich; Zieglgänsberger, Walter; Dodt, Hans-Ulrich

    2009-01-01

    In a recent publication we described a microscopical technique called Ultramicroscopy, combined with a histological procedure that makes biological samples transparent. With this combination we can gather three-dimensional image data of large biological samples. Here we present the theoretical analysis of the z-resolution. By analyzing the cross-section of the illuminating sheet of light we derive resolution values according to the Rayleigh criterion. Next we investigate the resolution adjacent to the focal point of the illumination beam, analyze over what extent the illumination beam is of acceptable sharpness and investigate the resolution improvements caused by the objective lens. Finally we conclude with a useful rule for the sampling rates. These findings are of practical importance for researchers working with Ultramicroscopy to decide on adequate sampling rates. They are also necessary to modify deconvolution techniques to gain further image improvements. PMID:19492052

  17. Experimental analysis of the resolution in shallow GPR survey

    NASA Astrophysics Data System (ADS)

    Pérez-Gracia, V.; González-Drigo, R.; Di Capua, D.; Pujades, L. G.

    2007-10-01

    Ground-penetrating radar (GPR) is a high resolution surveying method applied to civil engineering, surface geology, archaeology and other disciplines. Today, GPR is an effective technique for investigating the integrity of concrete structures. As a nondestructive technique, it is particularly suited for the assessment of large structures such as prestressed concrete bridges, highways, railway tracks and tunnels. A significant parameter in GPR high frequency surveys is the horizontal resolution. This parameter indicates the capability of the method to detect anomalies and to discriminate between adjacent elements. In the analysis of concrete structures, the horizontal resolution makes it possible to determine the exact position of reinforcing elements. This paper presents the basics of GPR, its limits, and the experimental measurements and signal post-processing performed in order to determine the horizontal resolution of a 1.6 GHz antenna in concrete structure assessments.
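    For orientation, the length scale that bounds resolution at 1.6 GHz can be estimated from the wavelength in the medium, λ = c / (f·√εr). The sketch below is a textbook back-of-the-envelope estimate, not the experimental procedure of the paper. The relative permittivity of 8 for concrete and the 10 cm target depth are assumed values chosen only for illustration, and the first-Fresnel-zone expression is one commonly quoted approximation for the footprint that limits horizontal resolution.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum


def wavelength_in_medium(frequency_hz, relative_permittivity):
    """Wavelength (m) of a radar wave in a low-loss, non-magnetic medium."""
    velocity = SPEED_OF_LIGHT / math.sqrt(relative_permittivity)
    return velocity / frequency_hz


def fresnel_zone_radius(wavelength_m, depth_m):
    """Approximate first-Fresnel-zone radius (m) at the given depth.

    Uses r = sqrt(lambda * d / 2 + lambda**2 / 16), a commonly quoted
    approximation; treat the result as an order-of-magnitude guide.
    """
    return math.sqrt(wavelength_m * depth_m / 2 + wavelength_m ** 2 / 16)


ASSUMED_PERMITTIVITY = 8.0  # assumption: plausible value for concrete
ASSUMED_DEPTH_M = 0.10      # assumption: illustrative reinforcement depth

wavelength = wavelength_in_medium(1.6e9, ASSUMED_PERMITTIVITY)
footprint_radius = fresnel_zone_radius(wavelength, ASSUMED_DEPTH_M)
print(f"wavelength: {wavelength * 100:.1f} cm")
print(f"Fresnel radius at {ASSUMED_DEPTH_M * 100:.0f} cm: {footprint_radius * 100:.1f} cm")
```

    Moisture raises the permittivity of concrete considerably, which shortens the wavelength, so a measured value should replace the assumed one for any real estimate.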

  18. Mycobacterial species as case-study of comparative genome analysis.

    PubMed

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems and illnesses. Further, with more than 100 genome sequences available from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and for improving drugs, vaccines, and diagnostic tools for controlling Mycobacterial diseases. In the present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose a comparison has been made based on genome length, GC content, and number of genes in different databases (GenBank, RefSeq, and Prodigal). The BLAST matrix of these genomes has been computed to summarize the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genome have been defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree has been constructed from the 16S rRNA gene for tuberculosis and non-tuberculosis Mycobacteria to understand the evolutionary events of these species.
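    Genome length and GC content, two of the comparison criteria named in this abstract, are simple to compute from a sequence. The following is a minimal sketch, not the authors' pipeline: it counts only unambiguous bases, so characters such as N do not affect the ratio, and the example sequence is invented rather than taken from any Mycobacterial genome.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def summarize(name, sequence):
    """Return (name, length in bases, GC content) for one sequence."""
    return name, len(sequence), gc_content(sequence)


example = "GGCCGGCCATNNGCGC"  # invented sequence, for illustration only
name, length, gc = summarize("example", example)
print(f"{name}: {length} bases, GC = {gc:.1%}")
```

    For whole genomes the same calculation would be applied to each record read from a FASTA file and the per-genome values tabulated side by side.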

  19. Nucleotide resolution analysis of TMPRSS2 and ERG rearrangements in prostate cancer

    PubMed Central

    Weier, Christopher; Haffner, Michael C.; Mosbruger, Timothy; Esopi, David M.; Hicks, Jessica; Zheng, Qizhi; Fedor, Helen; Isaacs, William B.; De Marzo, Angelo M.; Nelson, William G.; Yegnasubramanian, Srinivasan

    2013-01-01

    TMPRSS2-ERG rearrangements occur in approximately 50% of prostate cancers and therefore represent one of the most frequently observed structural rearrangements in all cancers. However, little is known about the genomic architecture of such rearrangements. We therefore designed and optimized a pipeline involving target-capture of TMPRSS2 and ERG genomic sequences coupled with paired-end next generation sequencing to resolve genomic rearrangement breakpoints in TMPRSS2 and ERG at nucleotide resolution in a large series of primary prostate cancer specimens (n = 83). This strategy showed >90% sensitivity and specificity in identifying TMPRSS2-ERG rearrangements, and allowed identification of intra- and inter-chromosomal rearrangements involving TMPRSS2 and ERG with known and novel fusion partners. Our results indicate that rearrangement breakpoints show strong clustering in specific intronic regions of TMPRSS2 and ERG. The observed TMPRSS2-ERG rearrangements often exhibited complex chromosomal architecture associated with several intra- and inter-chromosomal rearrangements. Nucleotide resolution analysis of breakpoint junctions revealed that the majority of TMPRSS2 and ERG rearrangements (~88%) occurred at or near regions of microhomology or involved insertions of one or more base pairs. This architecture implicates nonhomologous end joining (NHEJ) and microhomology mediated end joining (MMEJ) pathways in the generation of such rearrangements. These analyses have provided important insights into the molecular mechanisms involved in generating prostate cancer-specific recurrent rearrangements. PMID:23447416

  20. Power, resolution and bias: recent advances in insect phylogeny driven by the genomic revolution.

    PubMed

    Yeates, David K; Meusemann, Karen; Trautwein, Michelle; Wiegmann, Brian; Zwick, Andreas

    2016-02-01

    Our understanding of the phylogenetic relationships of insects has been revolutionised in the last decade by the proliferation of next generation sequencing technologies (NGS). NGS has allowed insect systematists to assemble very large molecular datasets that include both model and non-model organisms. Such datasets often include a large proportion of the total number of protein coding sequences available for phylogenetic comparison. We review some early entomological phylogenomic studies that employ a range of different data sampling protocols and analysis strategies, illustrating a fundamental renaissance in our understanding of insect evolution, all driven by the genomic revolution. The analysis of phylogenomic datasets is challenging because of their size and complexity, and it is clear that increasing size alone does not ensure that phylogenetic signal overcomes systematic biases in the data. Biases can be due to various factors such as the method of data generation and assembly, or intrinsic biological features of the data per se, such as similarities due to saturation or compositional heterogeneity. Such biases often cause violations in the underlying assumptions of phylogenetic models. We review some of the bioinformatics tools available and being developed to detect and minimise systematic biases in phylogenomic datasets. Phylogenomic-scale data coupled with sophisticated analyses will revolutionise our understanding of insect functional genomics. This will illuminate the relationship between the vast range of insect phenotypic diversity and underlying genetic diversity. In combination with rapidly developing methods to estimate divergence times, these analyses will also provide a compelling view of the rates and patterns of lineagenesis (birth of lineages) over the half billion years of insect evolution.

  1. Personalized genomic results: analysis of informational needs.

    PubMed

    Schmidlen, Tara J; Wawak, Lisa; Kasper, Rachel; García-España, J Felipe; Christman, Michael F; Gordon, Erynn S

    2014-08-01

    Use of genomic information in healthcare is increasing; however, data on the needs of consumers of genomic information are limited. The Coriell Personalized Medicine Collaborative (CPMC) is a longitudinal study investigating the utility of personalized medicine. Participants receive results reflecting risk of common complex conditions and drug-gene pairs deemed actionable by an external review board. To explore the needs of individuals receiving genomic information we reviewed all genetic counseling sessions with CPMC participants. A retrospective qualitative review of notes from 157 genetic counseling inquiries was conducted. Notes were coded for salient themes. Five primary themes were identified: "understanding risk", "basic genetics", "complex disease genetics", "what do I do now?" and "other". Further review revealed that participants had difficulty with basic genetic concepts, confused relative and absolute risks, and attributed too high a risk burden to individual single nucleotide polymorphisms (SNPs). Despite these hurdles, counseled participants recognized that behavior changes could potentially mitigate risk and there were few comments alluding to an overly deterministic or fatalistic interpretation of results. Participants appeared to recognize the multifactorial nature of the diseases for which results were provided; however, education to understand the complexities of genomic risk information was often needed.

  2. Genome sequencing and analysis of the model grass Brachypodium distachyon.

    PubMed

    2010-02-11

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  3. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  4. A Comparative Analysis of the Lyve-SET Phylogenomics Pipeline for Genomic Epidemiology of Foodborne Pathogens.

    PubMed

    Katz, Lee S; Griswold, Taylor; Williams-Newkirk, Amanda J; Wagner, Darlene; Petkau, Aaron; Sieffert, Cameron; Van Domselaar, Gary; Deng, Xiangyu; Carleton, Heather A

    2017-01-01

    genomic profiles, its central database, and its ability to be run in a graphical user interface. However, creating a functional wgMLST scheme requires extended up-front development and subject-matter expertise. When a scheme does not exist or when the highest resolution is needed, SNP analysis is used. Using three Listeria outbreak data sets, we demonstrated the concordance between Lyve-SET SNP typing and wgMLST. Availability: Lyve-SET can be found at https://github.com/lskatz/Lyve-SET.

  5. A Comparative Analysis of the Lyve-SET Phylogenomics Pipeline for Genomic Epidemiology of Foodborne Pathogens

    PubMed Central

    Katz, Lee S.; Griswold, Taylor; Williams-Newkirk, Amanda J.; Wagner, Darlene; Petkau, Aaron; Sieffert, Cameron; Van Domselaar, Gary; Deng, Xiangyu; Carleton, Heather A.

    2017-01-01

    genomic profiles, its central database, and its ability to be run in a graphical user interface. However, creating a functional wgMLST scheme requires extended up-front development and subject-matter expertise. When a scheme does not exist or when the highest resolution is needed, SNP analysis is used. Using three Listeria outbreak data sets, we demonstrated the concordance between Lyve-SET SNP typing and wgMLST. Availability: Lyve-SET can be found at https://github.com/lskatz/Lyve-SET. PMID:28348549

  6. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    PubMed Central

    2010-01-01

    Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence. PMID:21092105

  7. A structural genomics analysis of histidine kinase sensor domains

    NASA Astrophysics Data System (ADS)

    Cheung, Jonah

    2005-11-01

    Histidine kinase sensors are part of a two-component system of protein signaling in prokaryotes and lower eukaryotes that relays an external environmental signal to an adaptive internal cellular response. Signal transduction occurs via phosphotransfer between a sensor protein and a response regulator, which interact in tandem. The sensor is usually a transmembrane protein that contains a conserved cytoplasmic histidine kinase transmitter domain and a modular periplasmic sensor domain. The response regulator is a cytoplasmic protein that contains a receiver domain that interacts with the histidine kinase, and an output domain that interacts with regulators of transcription or chemotaxis. My work focuses on the X-ray structure determination of a variety of bacterial sensor domains, based on a structural genomics analysis of the entire sensor domain family. Structures of the NarX, DcuS, LisK, and DctB sensor domains have been solved to atomic resolution, some in both ligand-bound and ligand-free states. Two distinct structural folds have been revealed: all-alpha helical and mixed alpha-beta. An analysis of the structures reveals a possible mechanism of transmembrane signaling in histidine kinase sensors as a sliding-piston motion between transmembrane helices. Although there is great diversity in ligand binding, there appears to be a small number of distinct sensor domain folds, for which structural representatives of two have been solved. A final synthesis of the structural information with a comprehensive bioinformatics analysis of all histidine kinase sensor domain sequences allows fold prediction for over 400 sensor domains, in a step towards mapping the entire structural landscape of this protein family.

  8. Copy Number Variation Analysis by Array Analysis of Single Cells Following Whole Genome Amplification.

    PubMed

    Dimitriadou, Eftychia; Zamani Esteki, Masoud; Vermeesch, Joris Robert

    2015-01-01

    Whole genome amplification is required to ensure the availability of sufficient material for copy number variation analysis of a genome deriving from an individual cell. Here, we describe the protocols we use for copy number variation analysis of non-fixed single cells by array-based approaches following single-cell isolation and whole genome amplification. We focus on two alternative protocols, an isothermal and a PCR-based whole genome amplification method, followed by either array comparative genomic hybridization (aCGH) or SNP array analysis, respectively.

  9. A high resolution genetic map anchoring scaffolds of the sequenced watermelon genome

    USDA-ARS's Scientific Manuscript database

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high-density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of ...

  10. Microarray-based ultra-high resolution discovery of genomic deletion mutations

    PubMed Central

    2014-01-01

    Background Oligonucleotide microarray-based comparative genomic hybridization (CGH) offers an attractive possible route for the rapid and cost-effective genome-wide discovery of deletion mutations. CGH typically involves comparison of the hybridization intensities of genomic DNA samples with microarray chip representations of entire genomes, and has widespread potential application in experimental research and medical diagnostics. However, the power to detect small deletions is low. Results Here we use a graduated series of Arabidopsis thaliana genomic deletion mutations (of sizes ranging from 4 bp to ~5 kb) to optimize CGH-based genomic deletion detection. We show that the power to detect smaller deletions (4, 28 and 104 bp) depends upon oligonucleotide density (essentially the number of genome-representative oligonucleotides on the microarray chip), and determine the oligonucleotide spacings necessary to guarantee detection of deletions of specified size. Conclusions Our findings will enhance a wide range of research and clinical applications, and in particular will aid in the discovery of genomic deletions in the absence of a priori knowledge of their existence. PMID:24655320

  11. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

    Treesearch

    Matthew Parks; Richard Cronn; Aaron Liston

    2009-01-01

    We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. We found that 30/33 ingroup nodes resolved with > 95-percent bootstrap support; this is a substantial improvement relative...

  12. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  13. MIPS: analysis and annotation of proteins from whole genomes

    PubMed Central

    Mewes, H. W.; Amid, C.; Arnold, R.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Münsterkötter, M.; Pagel, P.; Strack, N.; Stümpflen, V.; Warfsmann, J.; Ruepp, A.

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein–protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  14. Evacuee Compliance Behavior Analysis using High Resolution Demographic Information

    SciTech Connect

    Lu, Wei; Han, Lee; Liu, Cheng; Tuttle, Mark A; Bhaduri, Budhendra L

    2014-01-01

    The purpose of this study is to examine whether evacuee compliance with route assignments derived from demographic data of different resolutions affects evacuation performance. Most existing evacuation strategies assume that travelers will follow evacuation instructions, while in reality a certain percentage of evacuees do not comply with prescribed instructions. In this paper, a comparison of evacuation assignments based on Traffic Analysis Zones (TAZ) and high resolution LandScan USA Population Cells (LPC) was conducted for the detailed road network representing Alexandria, Virginia. A revised platform for evacuation modeling built on high resolution demographic data and activity-based microscopic traffic simulation is proposed. The results indicate that evacuee compliance behavior affects evacuation efficiency with traditional TAZ assignment, but does not significantly compromise efficiency with high resolution LPC assignment. The TAZ assignment also underestimates the real travel time during evacuation, especially for high compliance simulations. This suggests that conventional evacuation studies based on TAZ assignment might not be effective at providing efficient guidance to evacuees. From the high resolution data perspective, traveler compliance behavior is an important factor but it does not impact the system performance significantly. Analysis of evacuee compliance behavior should therefore emphasize route and shelter assignments at the level of the individual evacuee rather than whole-system performance.

  15. A process for analysis of microarray comparative genomics hybridisation studies for bacterial genomes

    PubMed Central

    Carter, Ben; Wu, Guanghui; Woodward, Martin J; Anjum, Muna F

    2008-01-01

    Background Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently, a bottleneck still persists in the accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established methods and a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes. PMID:18230148
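
    The idea of a density-based cut-off between gene presence and absence or divergence can be sketched as follows. The abstract does not specify the authors' procedure, so the rule used here (take the density minimum between the two tallest modes of a Gaussian kernel density estimate) is an assumption, as are the function names, the bandwidth and the toy log-ratio values.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density of `values` with a Gaussian kernel."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def kde_cutoff(values, bandwidth, grid_points=401):
    """Grid position of the density minimum between the two tallest modes.

    Hypothetical rule for illustration; raises ValueError if fewer than two
    interior modes are found at the chosen bandwidth.
    """
    density = gaussian_kde(values, bandwidth)
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_points - 1)
    xs = [lo + i * step for i in range(grid_points)]
    ys = [density(x) for x in xs]
    # Interior local maxima of the gridded density.
    peaks = [i for i in range(1, grid_points - 1) if ys[i - 1] < ys[i] >= ys[i + 1]]
    if len(peaks) < 2:
        raise ValueError("density is not bimodal at this bandwidth")
    left, right = sorted(sorted(peaks, key=lambda i: ys[i], reverse=True)[:2])
    valley = min(range(left, right + 1), key=lambda i: ys[i])
    return xs[valley]


# Made-up log ratios: one cluster near -2 (absent/divergent), one near 0 (present).
log_ratios = [-2.2, -2.0, -1.9, -2.1, -1.8, -0.1, 0.0, 0.1, 0.2, -0.2, 0.05, -0.05]
cutoff = kde_cutoff(log_ratios, bandwidth=0.3)
print(f"cut-off log ratio: {cutoff:.2f}")
```

    Because the cut-off is derived from each data set's own distribution rather than fixed in advance, it adapts to differences between hybridisations, which is one plausible reading of "dynamic" in the abstract.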

  16. SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets.

    PubMed

    Sarovich, Derek S; Price, Erin P

    2014-09-08

    Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial due to the fragmented nature of NGS software and the lack of experienced bioinformaticists in many research teams. We describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a new tool for high-throughput comparative analysis of haploid WGS datasets comprising one through thousands of genomes. SPANDx consolidates several well-validated, open-source packages into a single tool, mitigating the need to learn and manipulate individual NGS programs. SPANDx incorporates BWA for alignment of raw NGS reads against a reference genome or pan-genome, followed by data filtering, variant calling and annotation using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included for genetic locus presence/absence (P/A) determination to easily visualise the core and accessory genomes. Additional SPANDx features include construction of error-corrected single-nucleotide polymorphism (SNP) and insertion-deletion matrices, and P/A matrices, to enable user-friendly visualisation of genetic variants. The SNP matrices generated using VCFtools and GATK are directly importable into PAUP*, PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been developed to handle NGS data from Illumina, Ion Personal Genome Machine (PGM) and 454 platforms, and we demonstrate that it has comparable performance across Illumina MiSeq/HiSeq2000 and Ion PGM data. SPANDx is an all-in-one tool for comprehensive haploid WGS analysis. SPANDx is open source and is freely available at: http://sourceforge.net/projects/spandx/.

  17. Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis

    PubMed Central

    Pesonen, Maiju; Musser, James M.; Bentley, Stephen D.; Aurell, Erik; Corander, Jukka

    2017-01-01

    Recent advances in the scale and diversity of population genomic datasets for bacteria now provide the potential for genome-wide patterns of co-evolution to be studied at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to that encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The group A Streptococcus (GAS) data set population represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, which were each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies statistically significantly co-evolving locus pairs, potentially arising from fitness selection interdependence reflecting underlying protein-protein interactions, or genes whose product activities contribute to the same phenotype. This discovery approach greatly

  18. Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis.

    PubMed

    Skwark, Marcin J; Croucher, Nicholas J; Puranen, Santeri; Chewapreecha, Claire; Pesonen, Maiju; Xu, Ying Ying; Turner, Paul; Harris, Simon R; Beres, Stephen B; Musser, James M; Parkhill, Julian; Bentley, Stephen D; Aurell, Erik; Corander, Jukka

    2017-02-01

    Recent advances in the scale and diversity of population genomic datasets for bacteria now provide the potential for genome-wide patterns of co-evolution to be studied at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to that encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The group A Streptococcus (GAS) data set population represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, which were each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies statistically significantly co-evolving locus pairs, potentially arising from fitness selection interdependence reflecting underlying protein-protein interactions, or genes whose product activities contribute to the same phenotype. This discovery approach greatly

  19. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    PubMed

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  20. The dog genome: survey sequencing and comparative analysis.

    PubMed

    Kirkness, Ewen F; Bafna, Vineet; Halpern, Aaron L; Levy, Samuel; Remington, Karin; Rusch, Douglas B; Delcher, Arthur L; Pop, Mihai; Wang, Wei; Fraser, Claire M; Venter, J Craig

    2003-09-26

    A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.
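
    To give a sense of what 1.5x coverage means, the sketch below applies the standard Poisson (Lander-Waterman style) approximation for the expected fraction of a genome covered by at least one read. It assumes reads are placed uniformly and independently and ignores repeats and read-end effects; it is not a calculation taken from the paper.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases covered by at least one read (Poisson model)."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


for c in (0.5, 1.0, 1.5, 3.0):
    print(f"{c:.1f}x coverage -> {expected_fraction_covered(c):.3f} of bases covered")
```

    Under this idealised model, 1.5x coverage leaves a little under a quarter of bases unsampled, which is why survey sequencing recovers fragments of many genes rather than complete gene models.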

  1. Private genome analysis through homomorphic encryption

    PubMed Central

    2015-01-01

    Background The rapid development of genome sequencing technology allows researchers to access large genome datasets. However, outsourcing the data processing to the cloud poses high risks for personal privacy. The aim of this paper is to give a practical solution for this problem using homomorphic encryption. In our approach, all the computations can be performed in an untrusted cloud without requiring the decryption key or any interaction with the data owner, which preserves the privacy of genome data. Methods We present evaluation algorithms for secure computation of the minor allele frequencies and χ2 statistic in a genome-wide association studies setting. We also describe how to privately compute the Hamming distance and approximate Edit distance between encrypted DNA sequences. Finally, we compare performance details of using two practical homomorphic encryption schemes - the BGV scheme by Gentry, Halevi and Smart and the YASHE scheme by Bos, Lauter, Loftus and Naehrig. Results The approach with the YASHE scheme analyzes data from 400 people within about 2 seconds and picks a variant associated with disease from 311 spots. For another task, using the BGV scheme, it took about 65 seconds to securely compute the approximate Edit distance for DNA sequences of size 5K and figure out the differences between them. Conclusions The performance numbers for BGV are better than YASHE when homomorphically evaluating deep circuits (like the Hamming distance algorithm or approximate Edit distance algorithm). On the other hand, it is more efficient to use the YASHE scheme for a low-degree computation, such as minor allele frequencies or χ2 test statistic in a case-control study. PMID:26733152
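
    The quantities named in the Methods can be written out in plaintext to show the arithmetic that an encrypted evaluation would have to reproduce. The sketch below performs no encryption and is not the paper's implementation; it assumes genotypes coded as 0, 1 or 2 copies of the alternate allele, uses the standard Pearson χ2 on a 2x2 table of allele counts, and all function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one variant.

    genotypes: list of 0/1/2 alternate-allele counts, one per person
    (an assumed encoding, not taken from the paper).
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square (no continuity correction) on allele counts."""
    a = sum(case_genotypes)                 # alternate alleles in cases
    b = 2 * len(case_genotypes) - a         # reference alleles in cases
    c = sum(control_genotypes)              # alternate alleles in controls
    d = 2 * len(control_genotypes) - c      # reference alleles in controls
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    return n * (a * d - b * c) ** 2 / denominator


def hamming_distance(seq1, seq2):
    """Count positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(seq1, seq2))
```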

  2. Comparative Genomics via Wavelet Analysis for Closely Related Bacteria

    NASA Astrophysics Data System (ADS)

    Song, Jiuzhou; Ware, Tony; Liu, Shu-Lin; Surette, M.

    2004-12-01

    Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The efficiency of traditional methods is strongly influenced by the software used. To overcome this problem, we propose using wavelet analysis to perform comparative genomics. First, global comparison using wavelet analysis gives the difference at a quantitative level. Then local comparison using keto-excess or purine-excess plots shows the precise positions of inversions, translocations, and horizontally transferred DNA fragments. We found that the level of energy spectra difference is related to the similarity of bacterial strains; it could serve as a quantitative index to describe the similarity of genomes. The strategy is described in detail by comparisons of closely related strains: S. typhi CT18, S. typhi Ty2, S. typhimurium LT2, H. pylori 26695, and H. pylori J99.
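
    The excess plots mentioned for the local comparison can be illustrated with a cumulative walk along the sequence. This is a minimal sketch, assuming the usual definitions (purine excess: +1 for A or G, -1 for C or T; keto excess: +1 for G or T, -1 for A or C); it does not reproduce the authors' wavelet step, and the function name is hypothetical.

```python
from itertools import accumulate

PURINES = frozenset("AG")   # the remaining bases, C and T, are pyrimidines
KETO = frozenset("GT")      # the remaining bases, A and C, are amino bases


def cumulative_excess(sequence, plus_bases):
    """Running sum of +1 for bases in plus_bases and -1 for other A/C/G/T.

    Characters outside A, C, G, T (for example N) contribute 0.
    """
    steps = []
    for base in sequence.upper():
        if base in plus_bases:
            steps.append(1)
        elif base in "ACGT":
            steps.append(-1)
        else:
            steps.append(0)
    return list(accumulate(steps))
```

    Plotting the returned list against position gives the excess curve; calling the function with PURINES or KETO selects the purine-excess or keto-excess variant.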

  3. Genomic analysis of stayability in Nellore cattle.

    PubMed

    Barreto Amaral Teixeira, Daniela; Alves Fernandes Júnior, Gerardo; Beraldo Dos Santos Silva, Danielly; Bermal Costa, Raphael; Takada, Luciana; Gustavo Mansan Gordo, Daniel; Bresolin, Tiago; Carvalheiro, Roberto; Baldi, Fernando; Galvão de Albuquerque, Lucia

    2017-01-01

    Stayability, which can be defined as the probability of a cow calving at a certain age when given the opportunity, is an important reproductive trait in beef cattle because it is directly related to herd profitability. The objective of this study was to estimate genetic parameters and to identify possible genomic regions associated with the phenotypic expression of stayability in Nellore cows. The variance components were estimated by Bayesian inference using a threshold animal model that included the systematic effects of contemporary group and sexual precocity and the random effects of animal and residual. The SNP effects were estimated by the single-step genomic BLUP method using information of 2,838 animals (2,020 females and 930 sires) genotyped with the Illumina High-Density BeadChip Array (San Diego, CA, USA). The variance explained by windows formed by 200 consecutive SNPs was used to identify genomic regions of largest effect on the expression of stayability. The heritability was 0.11 ± 0.01 when A matrix (pedigree) was used and 0.14 ± 0.01 when H matrix (relationship matrix that combines pedigree information and SNP data) was used. A total of 147 candidate genes for stayability were identified on chromosomes 1, 2, 5, 6, 9 and 20 and on the X chromosome. New candidate regions for stayability were detected, most of them related to reproductive, immunological and central nervous system functions.
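
    The window-based summary described above can be sketched as follows. This is a simplified illustration, not the authors' single-step genomic BLUP pipeline: it assumes per-SNP effects and allele frequencies are already available, approximates each SNP's variance as 2p(1-p)a^2, ignores covariance between SNPs, and uses hypothetical names throughout.

```python
def window_variance_shares(snp_effects, allele_freqs, window_size=200):
    """Share of total SNP variance in each window of consecutive SNPs.

    snp_effects: estimated additive effect a of each SNP, in map order.
    allele_freqs: frequency p of the counted allele for each SNP.
    Per-SNP variance is approximated as 2 * p * (1 - p) * a**2.
    """
    if len(snp_effects) != len(allele_freqs):
        raise ValueError("effects and frequencies must have equal length")
    per_snp = [2 * p * (1 - p) * a * a
               for a, p in zip(snp_effects, allele_freqs)]
    total = sum(per_snp)
    if total == 0:
        raise ValueError("total variance is zero")
    # Non-overlapping windows; the last window may hold fewer SNPs.
    return [sum(per_snp[start:start + window_size]) / total
            for start in range(0, len(per_snp), window_size)]
```

    Windows with the largest shares would then be the candidates to inspect for nearby genes.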

  4. Analysis of the Choristoneura fumiferana nucleopolyhedrovirus genome.

    PubMed

    de Jong, Jondavid G; Lauzon, Hilary A M; Dominy, Cliff; Poloumienko, Arkadi; Carstens, Eric B; Arif, Basil M; Krell, Peter J

    2005-04-01

    The double-stranded DNA genome of Choristoneura fumiferana nucleopolyhedrovirus (CfMNPV) was sequenced and analysed in the context of other group I nucleopolyhedroviruses (NPVs). The genome consists of 129,593 bp with a G + C content of 50.1 mol%. A total of 146 open reading frames (ORFs) of greater than 150 bp, and with no or minimal overlap were identified. In addition, five homologous regions were identified containing 7-10 repeats of a 36 bp imperfect palindromic core. Comparison with other completely sequenced baculovirus genomes revealed that 139 of the CfMNPV ORFs have homologues in at least one other baculovirus and seven ORFs are unique to CfMNPV. Of the 117 CfMNPV ORFs common to all group I NPVs, 12 are exclusive to group I NPVs. Overall, CfMNPV is most similar to Orgyia pseudotsugata MNPV based on gene content, arrangement and overall amino acid identity. Unlike other group I baculoviruses, however, CfMNPV encodes a viral enhancing factor (vef) and has two copies of p26.

  5. Genomic analysis of stayability in Nellore cattle

    PubMed Central

    Barreto Amaral Teixeira, Daniela; Beraldo dos Santos Silva, Danielly; Bermal Costa, Raphael; Takada, Luciana; Gustavo Mansan Gordo, Daniel; Bresolin, Tiago; Carvalheiro, Roberto; Baldi, Fernando; Galvão de Albuquerque, Lucia

    2017-01-01

    Stayability, which can be defined as the probability of a cow calving at a certain age when given the opportunity, is an important reproductive trait in beef cattle because it is directly related to herd profitability. The objective of this study was to estimate genetic parameters and to identify possible genomic regions associated with the phenotypic expression of stayability in Nellore cows. The variance components were estimated by Bayesian inference using a threshold animal model that included the systematic effects of contemporary group and sexual precocity and the random effects of animal and residual. The SNP effects were estimated by the single-step genomic BLUP method using information of 2,838 animals (2,020 females and 930 sires) genotyped with the Illumina High-Density BeadChip Array (San Diego, CA, USA). The variance explained by windows formed by 200 consecutive SNPs was used to identify genomic regions of largest effect on the expression of stayability. The heritability was 0.11 ± 0.01 when A matrix (pedigree) was used and 0.14 ± 0.01 when H matrix (relationship matrix that combines pedigree information and SNP data) was used. A total of 147 candidate genes for stayability were identified on chromosomes 1, 2, 5, 6, 9 and 20 and on the X chromosome. New candidate regions for stayability were detected, most of them related to reproductive, immunological and central nervous system functions. PMID:28591167

  6. Analysis of three leafminers' complete mitochondrial genomes.

    PubMed

    Yang, Fei; Du, Yuzhou; Cao, Jingman; Huang, Fangneng

    2013-10-15

    Liriomyza trifolii (Burgess), Liriomyza huidobrensis (Blanchard), and Liriomyza bryoniae (Kaltenbach) are three closely related and economically important leafminer pests worldwide. This study examined the complete mitochondrial genomes of L. trifolii, L. huidobrensis and L. bryoniae, which were 16,141 bp, 16,236 bp and 16,183 bp in length, respectively. All of them displayed 37 typical animal mitochondrial genes and an A+T-rich region. The genomes were highly compact with only 60-68 bp of non-coding intergenic spacer. However, considerable differences in the A+T-rich region were detected among the three species. Results of this study also showed that the two ribosomal RNA genes of the three species had very few variable sites and thus are unlikely to provide much information for studying the population genetics of these species. Data generated from the three leafminers' complete mitochondrial genomes should provide valuable information for studying the phylogeny of Diptera and for developing genetic markers for species identification in leafminers.

  7. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  8. Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans

    PubMed Central

    Begun, David J; Holloway, Alisha K; Stevens, Kristian; Hillier, LaDeana W; Poh, Yu-Ping; Hahn, Matthew W; Nista, Phillip M; Jones, Corbin D; Kern, Andrew D; Dewey, Colin N; Pachter, Lior; Myers, Eugene; Langley, Charles H

    2007-01-01

    The population genetic perspective is that the processes shaping genomic variation can be revealed only through simultaneous investigation of sequence polymorphism and divergence within and between closely related species. Here we present a population genetic analysis of Drosophila simulans based on whole-genome shotgun sequencing of multiple inbred lines and comparison of the resulting data to genome assemblies of the closely related species, D. melanogaster and D. yakuba. We discovered previously unknown, large-scale fluctuations of polymorphism and divergence along chromosome arms, and significantly less polymorphism and faster divergence on the X chromosome. We generated a comprehensive list of functional elements in the D. simulans genome influenced by adaptive evolution. Finally, we characterized genomic patterns of base composition for coding and noncoding sequence. These results suggest several new hypotheses regarding the genetic and biological mechanisms controlling polymorphism and divergence across the Drosophila genome, and provide a rich resource for the investigation of adaptive evolution and functional variation in D. simulans. PMID:17988176

  9. A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes

    PubMed Central

    Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M.

    2016-01-01

    The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea. PMID:27756915

  10. Differentiating between monozygotic twins through DNA methylation-specific high-resolution melt curve analysis.

    PubMed

    Stewart, Leander; Evans, Neil; Bexon, Kimberley J; van der Meer, Dieudonne J; Williams, Graham A

    2015-05-01

    Although short tandem repeat profiling is extremely powerful in identifying individuals from crime scene stains, it is unable to differentiate between monozygotic (MZ) twins. Efforts to address this include mutation analysis through whole genome sequencing and through DNA methylation studies. Methylation of DNA is affected by environmental factors; thus, as MZ twins age, their DNA methylation patterns change. This can be characterized by bisulfite treatment followed by pyrosequencing. However, this can be time-consuming and expensive; thus, it is unlikely to be widely used by investigators. If the sequences are different, then in theory the melting temperature should be different. Thus, the aim of this study was to assess whether high-resolution melt curve analysis can be used to differentiate between MZ twins. Five sets of MZ twins provided buccal swabs that underwent extraction, quantification, bisulfite treatment, polymerase chain reaction amplification and high-resolution melting curve analysis targeting two markers, Alu-E2F3 and Alu-SP. Significant differences were observed between all MZ twins targeting Alu-E2F3 and in four of five MZ twins targeting Alu-SP (P<0.05). Thus, it has been demonstrated that bisulfite treatment followed by high-resolution melting curve analysis could be used to differentiate between MZ twins.

  11. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    PubMed Central

    Gil, Rosario; Silva, Francisco J.; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C. H. J.; Gross, Roy; Moya, Andrés

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  12. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment.

    PubMed

    Kim, Jonghwan; Bhinge, Akshay A; Morgan, Xochitl C; Iyer, Vishwanath R

    2005-01-01

    Identifying the chromosomal targets of transcription factors is important for reconstructing the transcriptional regulatory networks underlying global gene expression programs. We have developed an unbiased genomic method called sequence tag analysis of genomic enrichment (STAGE) to identify the direct binding targets of transcription factors in vivo. STAGE is based on high-throughput sequencing of concatemerized tags derived from target DNA enriched by chromatin immunoprecipitation. We first used STAGE in yeast to confirm that RNA polymerase III genes are the most prominent targets of the TATA-box binding protein. We optimized the STAGE protocol and developed analysis methods to allow the identification of transcription factor targets in human cells. We used STAGE to identify several previously unknown binding targets of human transcription factor E2F4 that we independently validated by promoter-specific PCR and microarray hybridization. STAGE provides a means of identifying the chromosomal targets of DNA-associated proteins in any sequenced genome.

  13. Microevolutionary analysis of Clostridium difficile genomes to investigate transmission

    PubMed Central

    2012-01-01

    Background The control of Clostridium difficile infection is a major international healthcare priority, hindered by a limited understanding of transmission epidemiology for these bacteria. However, transmission studies of bacterial pathogens are rapidly being transformed by the advent of next generation sequencing. Results Here we sequence whole C. difficile genomes from 486 cases arising over four years in Oxfordshire. We show that we can estimate the times back to common ancestors of bacterial lineages with sufficient resolution to distinguish whether direct transmission is plausible or not. Time depths were inferred using a within-host evolutionary rate that we estimated at 1.4 mutations per genome per year based on serially isolated genomes. The subset of plausible transmissions was found to be highly associated with pairs of patients sharing time and space in hospital. Conversely, the large majority of pairs of genomes matched by conventional typing and isolated from patients within a month of each other were too distantly related to be direct transmissions. Conclusions Our results confirm that nosocomial transmission between symptomatic C. difficile cases contributes far less to current rates of infection than has been widely assumed, which clarifies the importance of future research into other transmission routes, such as from asymptomatic carriers. With the costs of DNA sequencing rapidly falling and its use becoming more and more widespread, genomics will revolutionize our understanding of the transmission of bacterial pathogens. PMID:23259504
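
    The reasoning from an evolutionary rate to a time depth can be illustrated with a simple point estimate. The sketch below is an assumption-laden simplification, not the authors' inference method: it assumes a strict clock at the stated 1.4 mutations per genome per year, no within-host diversity, and that every difference between two genomes arose on the two branches leading back to their common ancestor. Function and parameter names are hypothetical.

```python
def common_ancestor_date(sample_date_1, sample_date_2, differences, rate=1.4):
    """Point estimate of the date of the most recent common ancestor.

    sample_date_1, sample_date_2: sampling dates in decimal years.
    differences: number of mutations separating the two genomes.
    rate: mutations per genome per year (1.4 is the value in the abstract).

    With ancestor date T, the expected number of differences is
    rate * ((sample_date_1 - T) + (sample_date_2 - T)); solve for T.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if differences < 0:
        raise ValueError("differences cannot be negative")
    total_branch_years = differences / rate
    return (sample_date_1 + sample_date_2 - total_branch_years) / 2
```

    Under these assumptions, two genomes sampled a year apart that differ by 7 mutations would share an ancestor several years before either sample, which is the kind of result that argues against direct transmission.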

  14. Toward a Comprehensive Genomic Analysis of Cancer - TCGA

    Cancer.gov

    The National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI) convened a "Toward a Comprehensive Genomic Analysis of Cancer" workshop in Washington, D.C. This workshop brought together physicians, basic scientists and other members of the U.S. and international cancer communities to assist in outlining the most effective strategies for the development of a successful project. Information about this workshop is reported in the Executive Summary.

  15. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGES

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but

  16. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    SciTech Connect

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; Hauser, Loren John; Wanchai, Visanu; Land, Miriam L.; Timm, Collin M.; Lu, Tse-Yuan S.; Schadt, Christopher Warren; Doktycz, Mitchel John; Pelletier, Dale A; Ussery, David W

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this

  17. Features of genomic organization in a nucleotide-resolution molecular model of the Escherichia coli chromosome.

    PubMed

    Hacker, William C; Li, Shuxiang; Elcock, Adrian H

    2017-07-27

    We describe structural models of the Escherichia coli chromosome in which the positions of all 4.6 million nucleotides of each DNA strand are resolved. Models consistent with two basic chromosomal orientations, differing in their positioning of the origin of replication, have been constructed. In both types of model, the chromosome is partitioned into plectoneme-abundant and plectoneme-free regions, with plectoneme lengths and branching patterns matching experimental distributions, and with spatial distributions of highly-transcribed chromosomal regions matching recent experimental measurements of the distribution of RNA polymerases. Physical analysis of the models indicates that the effective persistence length of the DNA and relative contributions of twist and writhe to the chromosome's negative supercoiling are in good correspondence with experimental estimates. The models exhibit characteristics similar to those of 'fractal globules,' and even the most genomically-distant parts of the chromosome can be physically connected, through paths combining linear diffusion and inter-segmental transfer, by an average of only ∼10 000 bp. Finally, macrodomain structures and the spatial distributions of co-expressed genes are analyzed: the latter are shown to depend strongly on the overall orientation of the chromosome. We anticipate that the models will prove useful in exploring other static and dynamic features of the bacterial chromosome. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Analysis tools for the interplay between genome layout and regulation.

    PubMed

    Bouyioukos, Costas; Elati, Mohamed; Képès, François

    2016-06-06

    Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species, has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets which tend to be incomplete, and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes. Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information. We demonstrate the capabilities of these tools by studying on one hand the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli. On the other hand, we demonstrate the capabilities to improve TFBS prediction in microbes. Finally, we highlight, by visualisation of multivariate techniques, the interplay between position and sequence information for effective transcription regulation.

  19. Genomic characteristics and comparative genomics analysis of Penicillium chrysogenum KF-25

    PubMed Central

    2014-01-01

    Background Penicillium chrysogenum has been used in producing penicillin and derived β-lactam antibiotics for many years. Although the genome of the mutant strain P. chrysogenum Wisconsin 54-1255 has already been sequenced, the versatility and genetic diversity of this species still needs to be intensively studied. In this study, the genome of the wild-type P. chrysogenum strain KF-25, which has high activity against Ustilaginoidea virens, was sequenced and characterized. Results The genome of KF-25 was about 29.9 Mb in size and contained 9,804 putative open reading frames (orfs). Thirteen genes were predicted to encode two-component system proteins, of which six were putatively involved in osmolarity adaption. There were 33 putative secondary metabolism pathways and numerous genes that were essential in metabolite biosynthesis. Several P. chrysogenum virus untranslated region sequences were found in the KF-25 genome, suggesting that there might be a relationship between the virus and P. chrysogenum in evolution. Comparative genome analysis showed that the genomes of KF-25 and Wisconsin 54-1255 were highly similar, except that KF-25 was 2.3 Mb smaller. Three hundred and fifty-five KF-25 specific genes were found and the biological functions of the proteins encoded by these genes were mainly unknown (232, representing 65%), except for some orfs encoding proteins with predicted functions in transport, metabolism, and signal transduction. Numerous KF-25-specific genes were found to be associated with the pathogenicity and virulence of the strains, which were identical to those of wild-type P. chrysogenum NRRL 1951. Conclusion Genome sequencing and comparative analysis are helpful in further understanding the biology, evolution, and environment adaption of P. chrysogenum, and provide a new tool for identifying further functional metabolites. PMID:24555742

  20. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis.

    PubMed

    Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W

    2015-10-30

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  1. Genomic characteristics and comparative genomics analysis of Penicillium chrysogenum KF-25.

    PubMed

    Peng, Qin; Yuan, Yihui; Gao, Meiying; Chen, Xupeng; Liu, Biao; Liu, Pengming; Wu, Yan; Wu, Dandan

    2014-02-21

    Penicillium chrysogenum has been used in producing penicillin and derived β-lactam antibiotics for many years. Although the genome of the mutant strain P. chrysogenum Wisconsin 54-1255 has already been sequenced, the versatility and genetic diversity of this species still need to be intensively studied. In this study, the genome of the wild-type P. chrysogenum strain KF-25, which has high activity against Ustilaginoidea virens, was sequenced and characterized. The genome of KF-25 was about 29.9 Mb in size and contained 9,804 putative open reading frames (ORFs). Thirteen genes were predicted to encode two-component system proteins, of which six were putatively involved in osmolarity adaptation. There were 33 putative secondary metabolism pathways and numerous genes that were essential in metabolite biosynthesis. Several P. chrysogenum virus untranslated region sequences were found in the KF-25 genome, suggesting that there might be a relationship between the virus and P. chrysogenum in evolution. Comparative genome analysis showed that the genomes of KF-25 and Wisconsin 54-1255 were highly similar, except that KF-25 was 2.3 Mb smaller. Three hundred and fifty-five KF-25-specific genes were found and the biological functions of the proteins encoded by these genes were mainly unknown (232, representing 65%), except for some ORFs encoding proteins with predicted functions in transport, metabolism, and signal transduction. Numerous KF-25-specific genes were found to be associated with the pathogenicity and virulence of the strains, which were identical to those of wild-type P. chrysogenum NRRL 1951. Genome sequencing and comparative analysis are helpful in further understanding the biology, evolution, and environmental adaptation of P. chrysogenum, and provide a new tool for identifying further functional metabolites.

  2. Genome Sequence and Comparative Genome Analysis of Lactobacillus casei: Insights into Their Niche-Associated Evolution

    PubMed Central

    Cai, Hui; Thompson, Rebecca; Budinich, Mateo F.; Broadbent, Jeff R.

    2009-01-01

    Lactobacillus casei is remarkably adaptable to diverse habitats and widely used in the food industry. To reveal the genomic features that contribute to its broad ecological adaptability and examine the evolution of the species, the genome sequence of L. casei ATCC 334 is analyzed and compared with other sequenced lactobacilli. This analysis reveals that ATCC 334 contains a high number of coding sequences involved in carbohydrate utilization and transcriptional regulation, reflecting its requirement for dealing with diverse environmental conditions. A comparison of the genome sequences of ATCC 334 to L. casei BL23 reveals 12 and 19 genomic islands, respectively. For a broader assessment of the genetic variability within L. casei, gene content of 21 L. casei strains isolated from various habitats (cheeses, n = 7; plant materials, n = 8; and human sources, n = 6) was examined by comparative genome hybridization with an ATCC 334-based microarray. This analysis resulted in identification of 25 hypervariable regions. One of these regions contains an overrepresentation of genes involved in carbohydrate utilization and transcriptional regulation and was thus proposed as a lifestyle adaptation island. Differences in L. casei genome inventory reveal both gene gain and gene decay. Gene gain, via acquisition of genomic islands, likely confers a fitness benefit in specific habitats. Gene decay, that is, loss of unnecessary ancestral traits, is observed in the cheese isolates and likely results in enhanced fitness in the dairy niche. This study gives the first picture of the stable versus variable regions in L. casei and provides valuable insights into evolution, lifestyle adaptation, and metabolic diversity of L. casei. PMID:20333194

  3. A novel statistic for genome-wide interaction analysis.

    PubMed

    Wu, Xuesen; Dong, Hua; Luo, Li; Zhu, Yun; Peng, Gang; Reveille, John D; Xiong, Momiao

    2010-09-23

    Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of finding heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001[...] Genome-wide interaction analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search for significant interactions between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

  4. Array comparative genomic hybridization analysis of Trichoderma reesei strains with enhanced cellulase production properties

    PubMed Central

    2010-01-01

    Background Trichoderma reesei is the main industrial producer of cellulases and hemicellulases that are used to depolymerize biomass in a variety of biotechnical applications. Many of the production strains currently in use have been generated by classical mutagenesis. In this study we characterized genomic alterations in high-producing mutants of T. reesei by high-resolution array comparative genomic hybridization (aCGH). Our aim was to obtain genome-wide information which could be utilized for better understanding of the mechanisms underlying efficient cellulase production, and would enable targeted genetic engineering for improved production of proteins in general. Results We carried out an aCGH analysis of four high-producing strains (QM9123, QM9414, NG14 and Rut-C30) using the natural isolate QM6a as a reference. In QM9123 and QM9414 we detected a total of 44 previously undocumented mutation sites including deletions, chromosomal translocation breakpoints and single nucleotide mutations. In NG14 and Rut-C30 we detected 126 mutations of which 17 were new mutations not documented previously. Among these new mutations are the first chromosomal translocation breakpoints identified in NG14 and Rut-C30. We studied the effects of two deletions identified in Rut-C30 (a deletion of 85 kb in the scaffold 15 and a deletion in a gene encoding a transcription factor) on cellulase production by constructing knock-out strains in the QM6a background. Neither the 85 kb deletion nor the deletion of the transcription factor affected cellulase production. Conclusions aCGH analysis identified dozens of mutations in each strain analyzed. The resolution was at the level of single nucleotide mutation. High-density aCGH is a powerful tool for genome-wide analysis of organisms with small genomes e.g. fungi, especially in studies where a large set of interesting strains is analyzed. PMID:20642838

  5. Analysis of the Core Genome and Pan-Genome of Autotrophic Acetogenic Bacteria

    PubMed Central

    Shin, Jongoh; Song, Yoseb; Jeong, Yujin; Cho, Byung-Kwan

    2016-01-01

    Acetogens are obligate anaerobic bacteria capable of reducing carbon dioxide (CO2) to multicarbon compounds coupled to the oxidation of inorganic substrates, such as hydrogen (H2) or carbon monoxide (CO), via the Wood-Ljungdahl pathway. Owing to the metabolic capability of CO2 fixation, much attention has been focused on understanding the unique pathways associated with acetogens, particularly their metabolic coupling of CO2 fixation to energy conservation. Most known acetogens are phylogenetically and metabolically diverse bacteria present in 23 different bacterial genera. With the increased volume of available genome information, acetogenic bacterial genomes can be analyzed by comparative genome analysis. Even with the genetic diversity that exists among acetogens, the Wood-Ljungdahl pathway, a central metabolic pathway, and cofactor biosynthetic pathways are highly conserved for autotrophic growth. Additionally, comparative genome analysis revealed that most genes in the acetogen-specific core genome were associated with the Wood-Ljungdahl pathway. The conserved enzymes and those predicted as missing can provide insight into biological differences between acetogens and allow for the discovery of promising candidates for industrial applications. PMID:27733845
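
    The core-genome versus pan-genome distinction used in this record can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family labels; the function name `split_pan_genome` and the toy labels are hypothetical and do not come from the study, which does not describe its pipeline here.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, accessory) gene families for a collection of genomes.

    core      = families present in every genome
    accessory = families present in at least one genome but not in all
    """
    if not gene_sets:
        return set(), set()
    genomes = list(gene_sets.values())
    pan = set().union(*genomes)          # pan-genome: union of all families
    core = set.intersection(*genomes)    # core genome: shared by all genomes
    return core, pan - core


# Hypothetical toy data: labels are illustrative, not real annotations.
toy = {
    "genome_A": {"wl_1", "wl_2", "cofactor_1", "extra_a"},
    "genome_B": {"wl_1", "wl_2", "cofactor_1", "extra_b"},
    "genome_C": {"wl_1", "wl_2", "extra_a"},
}
core, accessory = split_pan_genome(toy)
print(sorted(core))
print(sorted(accessory))
```

    Real analyses typically define gene families by orthology clustering with identity and coverage thresholds, so membership is rarely as clean as a set lookup.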

  6. Tandem Repeat Regions within the Burkholderia pseudomallei Genome and their Application for High-Resolution Genotyping

    DTIC Science & Technology

    2007-03-30

    BioMed Central, BMC Microbiology. Open Access research article. Tandem repeat regions within the Burkholderia pseudomallei genome and their application... facultative, intracellular bacterium Burkholderia pseudomallei is the causative agent of melioidosis, a serious infectious disease of humans and animals. We...

  7. Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions

    PubMed Central

    2010-01-01

    Background The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq. Results Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and core genome and single nucleotide polymorphisms (SNPs) of six L. monocytogenes strains were extracted with Panseq and hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100 loci set among 10 strains in 1 s, compared to the 449 s required to exhaustively search for all possible combinations; it also found the most discriminatory 20 loci from a 96 loci E. coli O157:H7 SNP dataset. Conclusion Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. 
    It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs based on both the presence...

  8. Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions.

    PubMed

    Laing, Chad; Buchanan, Cody; Taboada, Eduardo N; Zhang, Yongxiang; Kropinski, Andrew; Villegas, Andre; Thomas, James E; Gannon, Victor P J

    2010-09-15

    The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq. Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and core genome and single nucleotide polymorphisms (SNPs) of six L. monocytogenes strains were extracted with Panseq and hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100 loci set among 10 strains in 1 s, compared to the 449 s required to exhaustively search for all possible combinations; it also found the most discriminatory 20 loci from a 96 loci E. coli O157:H7 SNP dataset. Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. 
    It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs based on both the presence/absence of accessory regions...
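
    The exhaustive search that the abstract uses as a timing baseline for the Loci Selector module can be sketched as follows. This is not Panseq's algorithm, which the abstract does not describe; it is a brute-force illustration that scores a set of loci by the number of distinct strain profiles it yields. The function names and the toy allele table are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def count_profiles(alleles: Dict[str, Sequence[int]], loci: Tuple[int, ...]) -> int:
    """Number of distinct strain profiles when only the given loci are typed."""
    return len({tuple(strain[i] for i in loci) for strain in alleles.values()})


def best_loci_exhaustive(alleles: Dict[str, Sequence[int]], k: int) -> Tuple[Tuple[int, ...], int]:
    """Try every k-locus combination; keep the first one with the most profiles."""
    n_loci = len(next(iter(alleles.values())))
    best = max(
        combinations(range(n_loci), k),
        key=lambda loci: count_profiles(alleles, loci),
    )
    return best, count_profiles(alleles, best)


# Hypothetical toy data: 4 strains typed at 4 loci (allele codes are arbitrary).
toy_alleles = {
    "s1": (0, 0, 0, 0),
    "s2": (0, 1, 0, 0),
    "s3": (0, 0, 1, 0),
    "s4": (0, 1, 1, 1),
}
print(best_loci_exhaustive(toy_alleles, 2))
```

    The number of combinations grows as "n choose k", which is why an exhaustive search over four loci from a set of 100 is slow compared with a heuristic selector.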

  9. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes.

    PubMed

    Zhuang, Jiali; Weng, Zhiping

    2015-09-30

    Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recursively mutated by single nucleotide alterations differed from the genes recursively mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs.

  10. Combination of high-resolution magic angle spinning proton magnetic resonance spectroscopy and microscale genomics to type brain tumor biopsies.

    PubMed

    Tzika, A Aria; Astrakas, Loukas; Cao, Haihui; Mintzopoulos, Dionyssios; Andronesi, Ovidiu C; Mindrinos, Michael; Zhang, Jiangwen; Rahme, Laurence G; Blekas, Konstantinos D; Likas, Aristidis C; Galatsanos, Nikolas P; Carroll, Rona S; Black, Peter M

    2007-08-01

    Advancements in the diagnosis and prognosis of brain tumor patients, and thus in their survival and quality of life, can be achieved using biomarkers that facilitate improved tumor typing. We introduce and implement a combinatorial metabolic and molecular approach that applies state-of-the-art, high-resolution magic angle spinning (HRMAS) proton (1H) MRS and gene transcriptome profiling to intact brain tumor biopsies, to identify unique biomarker profiles of brain tumors. Our results show that samples as small as 2 mg can be successfully processed, the HRMAS 1H MRS procedure does not result in mRNA degradation, and minute mRNA amounts yield high-quality genomic data. The MRS and genomic analyses demonstrate that CNS tumors have altered levels of specific 1H MRS metabolites that directly correspond to altered expression of Kennedy pathway genes; and exhibit rapid phospholipid turnover, which coincides with upregulation of cell proliferation genes. The data also suggest Sonic Hedgehog pathway (SHH) dysregulation may play a role in anaplastic ganglioglioma pathogenesis. That a strong correlation is seen between the HRMAS 1H MRS and genomic data cross-validates and further demonstrates the biological relevance of the MRS results. Our combined metabolic/molecular MRS/genomic approach provides insights into the biology of anaplastic ganglioglioma and a new potential tumor typing methodology that could aid neurologists and neurosurgeons to improve the diagnosis, treatment, and ongoing evaluation of brain tumor patients.

  11. Analysis of the genomic homologous recombination in Theilovirus based on complete genomes

    PubMed Central

    2011-01-01

    At present, Theilovirus is considered to comprise four distinct serotypes, including Theiler's murine encephalomyelitis virus, Vilyuisk human encephalomyelitis virus, Thera virus, and Saffold virus. So far, no systematic study has investigated genomic recombination in Theilovirus. The present study performed phylogenetic and recombination analyses of Theilovirus based on complete genomes. Seven potentially significant recombination events were identified. However, according to the strain information and references related to the recombinants and their parental strains, four of the recombination events might have occurred non-naturally. These results will provide valuable hints for future research on the evolution and antigenic variability of Theilovirus. PMID:21923921

  12. Analysis of the genomic homologous recombination in Theilovirus based on complete genomes.

    PubMed

    Sun, Guangming; Zhang, Xiaodan; Yi, Maoli; Shao, Shihe; Zhang, Wen

    2011-09-17

    At present, Theilovirus is considered to comprise four distinct serotypes, including Theiler's murine encephalomyelitis virus, Vilyuisk human encephalomyelitis virus, Thera virus, and Saffold virus. So far, no systematic study has investigated genomic recombination in Theilovirus. The present study performed phylogenetic and recombination analyses of Theilovirus based on complete genomes. Seven potentially significant recombination events were identified. However, according to the strain information and references related to the recombinants and their parental strains, four of the recombination events might have occurred non-naturally. These results will provide valuable hints for future research on the evolution and antigenic variability of Theilovirus.

  13. Macrorestriction Analysis of Caenorhabditis Elegans Genomic DNA

    PubMed Central

    Browning, H.; Berkowitz, L.; Madej, C.; Paulsen, J. E.; Zolan, M. E.; Strome, S.

    1996-01-01

    The usefulness of genomic physical maps is greatly enhanced by linkage of the physical map with the genetic map. We describe a "macrorestriction mapping" procedure for Caenorhabditis elegans that we have applied to this endeavor. High molecular weight, genomic DNA is digested with infrequently cutting restriction enzymes and size-fractionated by pulsed field gel electrophoresis. Southern blots of the gels are probed with clones from the C. elegans physical map. This procedure allows the construction of restriction maps covering several hundred kilobases and the detection of polymorphic restriction fragments using probes that map several hundred kilobases away. We describe several applications of this technique. (1) We determined that the amount of DNA in a previously uncloned region is <220 kb. (2) We mapped the mes-1 gene to a cosmid, by detecting polymorphic restriction fragments associated with a deletion allele of the gene. The 25-kb deletion was initially detected using as a probe sequences located ~400 kb away from the gene. (3) We mapped the molecular endpoint of the deficiency hDf6, and determined that three spontaneously derived duplications in the unc-38-dpy-5 region have very complex molecular structures, containing internal rearrangements and deletions. PMID:8889524

  14. Genomic analysis of fibrolamellar hepatocellular carcinoma

    PubMed Central

    Xu, Lei; Hazard, Florette K.; Zmoos, Anne-Flore; Jahchan, Nadine; Chaib, Hassan; Garfin, Phillip M.; Rangaswami, Arun; Snyder, Michael P.; Sage, Julien

    2015-01-01

    Pediatric tumors are relatively infrequent, but are often associated with significant lethality and lifelong morbidity. A major goal of pediatric cancer research has been to identify key drivers of tumorigenesis to eventually develop targeted therapies to enhance cure rate and minimize acute and long-term toxic effects. Here, we used genomic approaches to identify biomarkers and candidate drivers for fibrolamellar hepatocellular carcinoma (FL-HCC), a very rare subtype of pediatric liver cancer for which limited therapeutic options exist. In-depth genomic analyses of one tumor followed by immunohistochemistry validation on seven other tumors showed expression of neuroendocrine markers in FL-HCC. DNA and RNA sequencing data further showed that common cancer pathways are not visibly altered in FL-HCC but identified two novel structural variants, both resulting in fusion transcripts. The first, a 400 kb deletion, results in a DNAJB1-PRKCA fusion transcript, which leads to increased cAMP-dependent protein kinase (PKA) activity in the index tumor case and other FL-HCC cases compared with normal liver. This PKA fusion protein is oncogenic in HCC cells. The second gene fusion event, a translocation between the CLPTM1L and GLIS3 genes, generates a transcript whose product also promotes cancer phenotypes in HCC cell lines. These experiments further highlight the tumorigenic role of gene fusions in the etiology of pediatric solid tumors and identify both candidate biomarkers and possible therapeutic targets for this lethal pediatric disease. PMID:25122662

  15. Genomic Analysis of Companion Rabbit Staphylococcus aureus

    PubMed Central

    Holmes, Mark A.; Harrison, Ewan M.; Fisher, Elizabeth A.; Graham, Elizabeth M.; Parkhill, Julian; Foster, Geoffrey; Paterson, Gavin K.

    2016-01-01

    In addition to being an important human pathogen, Staphylococcus aureus is able to cause a variety of infections in numerous other host species. While the S. aureus strains causing infection in several of these hosts have been well characterised, this is not the case for companion rabbits (Oryctolagus cuniculus), where little data are available on S. aureus strains from this host. To address this deficiency we have performed antimicrobial susceptibility testing and genome sequencing on a collection of S. aureus isolates from companion rabbits. The findings show a diverse S. aureus population is able to cause infection in this host, and while antimicrobial resistance was uncommon, the isolates possess a range of known and putative virulence factors consistent with a diverse clinical presentation in companion rabbits including severe abscesses. We additionally show that companion rabbit isolates carry polymorphisms within dltB as described as underlying host-adaption of S. aureus to farmed rabbits. The availability of S. aureus genome sequences from companion rabbits provides an important aid to understanding the pathogenesis of disease in this host and in the clinical management and surveillance of these infections. PMID:26963381

  16. Genomic analysis and reconstruction of cefotaxime resistance in Streptococcus pneumoniae.

    PubMed

    Fani, Fereshteh; Brotherton, Marie-Christine; Leprohon, Philippe; Ouellette, Marc

    2013-08-01

    To identify non-penicillin-binding protein (PBP) mutations contributing to resistance to the third-generation cephalosporin cefotaxime in Streptococcus pneumoniae at the genome-wide scale. The genomes of two in vitro S. pneumoniae cefotaxime-resistant isolates and of two transformants serially transformed with the genomic DNA of cefotaxime-resistant mutants were determined by next-generation sequencing. A role in cefotaxime resistance for the mutations identified was confirmed by reconstructing resistance in a cefotaxime-susceptible background. Analysis of the genome assemblies revealed mutations in genes coding for the PBPs 2x, 2a and 3, of which pbp2x was the only mutated gene common to all mutants. The transformation of altered PBP alleles into S. pneumoniae R6 confirmed the role of PBP mutations in cefotaxime resistance, but these were not sufficient to fully explain the levels of resistance. Thirty-one additional genes were found to be mutated in at least one of the four sequenced genomes. Non-PBP resistance determinants appeared to be mostly lineage specific. Mutations in spr1333, spr0981, spr1704 and spr1098, encoding a peptidoglycan N-acetylglucosamine deacetylase, a glycosyltransferase, an ABC transporter and a sortase, respectively, were implicated in resistance by transformation experiments and allowed the reconstruction of the full level of resistance observed in the parent resistant strains. This whole-genome analysis coupled to functional studies has allowed the discovery of both known and novel cefotaxime resistance genes in S. pneumoniae.

  17. Genome wide copy number analysis of single cells

    PubMed Central

    Baslan, Timour; Kendall, Jude; Rodgers, Linda; Cox, Hilary; Riggs, Mike; Stepansky, Asya; Troge, Jennifer; Ravi, Kandasamy; Esposito, Diane; Lakshmi, B.; Wigler, Michael; Navin, Nicholas; Hicks, James

    2016-01-01

    Summary Copy number variation (CNV) is increasingly recognized as an important contributor to phenotypic variation in health and disease. Most methods for determining CNV rely on admixtures of cells, where information regarding genetic heterogeneity is lost. Here, we present a protocol that allows for the genome-wide copy number analysis of single nuclei isolated from mixed populations of cells. Single nucleus sequencing (SNS) combines flow sorting of single nuclei based on DNA content and whole genome amplification (WGA), followed by next generation sequencing, to quantize genomic intervals in a genome-wide manner. Multiplexing of single cells is discussed. Additionally, we outline informatic approaches that correct for biases inherent in the WGA procedure and allow for accurate determination of copy number profiles. Altogether, the protocol takes ~3 days from flow cytometry to sequence-ready DNA libraries. PMID:22555242
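
    The idea of quantizing genomic intervals from read counts can be illustrated with a deliberately simplified sketch. It is not the protocol's informatic pipeline: it omits the WGA bias correction and segmentation the abstract mentions, and it assumes, purely for illustration, that the median non-empty bin sits at the cell's baseline ploidy. The function name `estimate_copy_number` and the toy counts are hypothetical.

```python
from statistics import median
from typing import List


def estimate_copy_number(bin_counts: List[int], ploidy: int = 2) -> List[int]:
    """Scale per-bin read counts by the median and round to integer copy states.

    Assumption (illustrative only): the median non-empty bin reflects the
    baseline ploidy, so count / median * ploidy approximates copy number.
    """
    nonzero = [c for c in bin_counts if c > 0]
    if not nonzero:
        return [0] * len(bin_counts)
    baseline = median(nonzero)
    return [round(ploidy * c / baseline) for c in bin_counts]


# Hypothetical read counts for eight equal-sized genomic bins in one nucleus.
toy_counts = [100, 98, 102, 150, 151, 49, 100, 0]
print(estimate_copy_number(toy_counts))
```

    In practice, per-bin counts from amplified single-cell DNA are noisy, so neighbouring bins are usually segmented into intervals before integer states are assigned.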

  18. High-resolution mass spectrometric analysis of biomass pyrolysis vapors

    DOE PAGES

    Christensen, Earl; Evans, Robert J.; Carpenter, Daniel

    2017-01-19

    Vapors generated from the pyrolysis of lignocellulosic biomass are made up of a complex mixture of oxygenated compounds. Direct analysis of these vapors provides insight into the mechanisms of depolymerization of cellulose, hemicellulose, and lignin as well as insight into reactions that may occur during condensation of pyrolysis vapors into bio-oil. Studies utilizing pyrolysis molecular beam mass spectrometry have provided valuable information regarding the chemical composition of pyrolysis vapors. Mass spectrometers generally employed with these instruments have low mass resolution of approximately a mass unit. Chemical species with identical unit mass but differing elemental formulas cannot be resolved with these instruments and are therefore detected as a single ion. In this study we analyzed the pyrolysis vapors of several biomass sources using a high-resolution double focusing mass spectrometer. High-resolution analysis of pyrolysis vapors allowed for speciation of several compounds that would be detected as a single ion with unit mass resolution. Lastly, these data not only provide greater detail into the composition of pyrolysis vapors but also highlight differences between vapors generated from multiple biomass feedstocks.

  19. Texture analysis of high-resolution FLAIR images for TLE

    NASA Astrophysics Data System (ADS)

    Jafari-Khouzani, Kourosh; Soltanian-Zadeh, Hamid; Elisevich, Kost

    2005-04-01

    This paper presents a study of the texture information of high-resolution FLAIR images of the brain with the aim of determining the abnormality and consequently the candidacy of the hippocampus for temporal lobe epilepsy (TLE) surgery. Intensity and volume features of the hippocampus from FLAIR images of the brain have been previously shown to be useful in detecting the abnormal hippocampus in TLE. However, the small size of the hippocampus may limit the texture information. High-resolution FLAIR images show more details of the abnormal intensity variations of the hippocampi and therefore are more suitable for texture analysis. We study and compare the low and high-resolution FLAIR images of six epileptic patients. The hippocampi are segmented manually by an expert from T1-weighted MR images. Then the segmented regions are mapped on the corresponding FLAIR images for texture analysis. The 2-D wavelet transforms of the hippocampi are employed for feature extraction. We compare the ability of the texture features from regular and high-resolution FLAIR images to distinguish normal and abnormal hippocampi. Intracranial EEG results as well as surgery outcome are used as gold standard. The results show that the intensity variations of the hippocampus are related to the abnormalities in the TLE.

  20. Adaptive molecular resolution approach in Hamiltonian form: An asymptotic analysis.

    PubMed

    Zhu, Jinglong; Klein, Rupert; Delle Site, Luigi

    2016-10-01

    Adaptive molecular resolution approaches in molecular dynamics are becoming relevant tools for the analysis of molecular liquids characterized by the interplay of different physical scales. The essential difference among these methods is in the way the change of molecular resolution is made in a buffer (transition) region. In particular a central question concerns the possibility of the existence of a global Hamiltonian which, by describing the change of resolution, is at the same time physically consistent, mathematically well defined, and numerically accurate. In this paper we present an asymptotic analysis of the adaptive process complemented by numerical results and show that under certain mathematical conditions a Hamiltonian, which is physically consistent and numerically accurate, may exist. Such conditions show that molecular simulations in the current computational implementation require systems of large size, and thus a Hamiltonian approach such as the one proposed, at this stage, would not be practical from the numerical point of view. However, the Hamiltonian proposed provides the basis for a simplification and generalization of the numerical implementation of adaptive resolution algorithms to other molecular dynamics codes.

  1. A Novel Approach to Helicobacter pylori Pan-Genome Analysis for Identification of Genomic Islands

    PubMed Central

    Uchiyama, Ikuo; Albritton, Jacob; Fukuyo, Masaki; Kojima, Kenji K.; Yahara, Koji; Kobayashi, Ichizo

    2016-01-01

    Genomes of a given bacterial species can show great variation in gene content and thus systematic analysis of the entire gene repertoire, termed the pan-genome, is important for understanding bacterial intra-species diversity, population genetics, and evolution. Here, we analyzed the pan-genome from 30 completely sequenced strains of the human gastric pathogen Helicobacter pylori belonging to various phylogeographic groups, focusing on 991 accessory (not fully conserved) orthologous groups (OGs). We developed a method to evaluate the mobility of genes within a genome, using the gene order in the syntenically conserved regions as a reference, and classified the 991 accessory OGs into five classes: Core, Stable, Intermediate, Mobile, and Unique. Phylogenetic networks based on the gene content of Core and Stable classes are highly congruent with that created from the concatenated alignment of fully conserved core genes, in contrast to those of Intermediate and Mobile classes, which show quite different topologies. By clustering the accessory OGs on the basis of phylogenetic pattern similarity and chromosomal proximity, we identified 60 co-occurring gene clusters (CGCs). In addition to known genomic islands, including cag pathogenicity island, bacteriophages, and integrating conjugative elements, we identified some novel ones. One island encodes TerY-phosphorylation triad, which includes the eukaryote-type protein kinase/phosphatase gene pair, and components of type VII secretion system. Another one contains a reverse-transcriptase homolog, which may be involved in the defense against phage infection through altruistic suicide. Many of the CGCs contained restriction-modification (RM) genes. Different RM systems sometimes occupied the same (orthologous) locus in the strains. We anticipate that our method will facilitate pan-genome studies in general and help identify novel genomic islands in various bacterial species. PMID:27504980

  2. A Novel Approach to Helicobacter pylori Pan-Genome Analysis for Identification of Genomic Islands.

    PubMed

    Uchiyama, Ikuo; Albritton, Jacob; Fukuyo, Masaki; Kojima, Kenji K; Yahara, Koji; Kobayashi, Ichizo

    2016-01-01

    Genomes of a given bacterial species can show great variation in gene content and thus systematic analysis of the entire gene repertoire, termed the pan-genome, is important for understanding bacterial intra-species diversity, population genetics, and evolution. Here, we analyzed the pan-genome from 30 completely sequenced strains of the human gastric pathogen Helicobacter pylori belonging to various phylogeographic groups, focusing on 991 accessory (not fully conserved) orthologous groups (OGs). We developed a method to evaluate the mobility of genes within a genome, using the gene order in the syntenically conserved regions as a reference, and classified the 991 accessory OGs into five classes: Core, Stable, Intermediate, Mobile, and Unique. Phylogenetic networks based on the gene content of Core and Stable classes are highly congruent with that created from the concatenated alignment of fully conserved core genes, in contrast to those of Intermediate and Mobile classes, which show quite different topologies. By clustering the accessory OGs on the basis of phylogenetic pattern similarity and chromosomal proximity, we identified 60 co-occurring gene clusters (CGCs). In addition to known genomic islands, including cag pathogenicity island, bacteriophages, and integrating conjugative elements, we identified some novel ones. One island encodes TerY-phosphorylation triad, which includes the eukaryote-type protein kinase/phosphatase gene pair, and components of type VII secretion system. Another one contains a reverse-transcriptase homolog, which may be involved in the defense against phage infection through altruistic suicide. Many of the CGCs contained restriction-modification (RM) genes. Different RM systems sometimes occupied the same (orthologous) locus in the strains. We anticipate that our method will facilitate pan-genome studies in general and help identify novel genomic islands in various bacterial species.

  3. Differential DNA Methylation Analysis without a Reference Genome.

    PubMed

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-12-22

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Differential DNA Methylation Analysis without a Reference Genome

    PubMed Central

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C.; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-01-01

    Summary Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome. PMID:26673328

  5. Comprehensive genome sequence analysis of a breast cancer amplicon.

    PubMed

    Collins, C; Volik, S; Kowbel, D; Ginzinger, D; Ylstra, B; Cloutier, T; Hawkins, T; Predki, P; Martin, C; Wernick, M; Kuo, W L; Alberts, A; Gray, J W

    2001-06-01

    Gene amplification occurs in most solid tumors and is associated with poor prognosis. Amplification of 20q13.2 is common to several tumor types including breast cancer. The 1 Mb of sequence spanning the 20q13.2 breast cancer amplicon is one of the most exhaustively studied segments of the human genome. These studies have included amplicon mapping by comparative genomic hybridization (CGH), fluorescent in-situ hybridization (FISH), array-CGH, quantitative microsatellite analysis (QUMA), and functional genomic studies. Together these studies revealed a complex amplicon structure suggesting the presence of at least two driver genes in some tumors. One of these, ZNF217, is capable of immortalizing human mammary epithelial cells (HMEC) when overexpressed. In addition, we now report the sequencing of this region in human and mouse, and on quantitative expression studies in tumors. Amplicon localization now is straightforward and the availability of human and mouse genomic sequence facilitates their functional analysis. However, comprehensive annotation of megabase-scale regions requires integration of vast amounts of information. We present a system for integrative analysis and demonstrate its utility on 1.2 Mb of sequence spanning the 20q13.2 breast cancer amplicon and 865 kb of syntenic murine sequence. We integrate tumor genome copy number measurements with exhaustive genome landscape mapping, showing that amplicon boundaries are associated with maxima in repetitive element density and a region of evolutionary instability. This integration of comprehensive sequence annotation, quantitative expression analysis, and tumor amplicon boundaries provides evidence for an additional driver gene prefoldin 4 (PFDN4), coregulated genes, conserved noncoding regions, and associate repetitive elements with regions of genomic instability at this locus.

  6. A Chromosome 7 Pericentric Inversion Defined at Single-Nucleotide Resolution Using Diagnostic Whole Genome Sequencing in a Patient with Hand-Foot-Genital Syndrome.

    PubMed

    Watson, Christopher M; Crinnion, Laura A; Harrison, Sally M; Lascelles, Carolina; Antanaviciute, Agne; Carr, Ian M; Bonthron, David T; Sheridan, Eamonn

    2016-01-01

    Next generation sequencing methodologies are facilitating the rapid characterisation of novel structural variants at nucleotide resolution. These approaches are particularly applicable to variants initially identified using alternative molecular methods. We report a child born with bilateral postaxial syndactyly of the feet and bilateral fifth finger clinodactyly. This was presumed to be an autosomal recessive syndrome, due to the family history of consanguinity. Karyotype analysis revealed a homozygous pericentric inversion of chromosome 7 (46,XX,inv(7)(p15q21)x2) which was confirmed to be heterozygous in both unaffected parents. Since the resolution of the karyotype was insufficient to identify any putatively causative gene, we undertook medium-coverage whole genome sequencing using paired-end reads, in order to elucidate the molecular breakpoints. In a two-step analysis, we first narrowed down the region by identifying discordant read-pairs, and then determined the precise molecular breakpoint by analysing the mapping locations of "soft-clipped" breakpoint-spanning reads. PCR and Sanger sequencing confirmed the identified breakpoints, both of which were located in intergenic regions. Significantly, the 7p15 breakpoint was located 523 kb upstream of HOXA13, the locus for hand-foot-genital syndrome. By inference from studies of HOXA locus control in the mouse, we suggest that the inversion has delocalised a HOXA13 enhancer to produce the phenotype observed in our patient. This study demonstrates how a modern genetic diagnostic approach can characterise structural variants at nucleotide resolution and provide potential insights into functional regulation.
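
The second step of the analysis above, locating a breakpoint from soft-clipped reads, can be sketched roughly as follows. This is only an illustration under stated assumptions: alignments are given as hypothetical `(pos, cigar)` pairs using the SAM conventions (1-based leftmost position, `S` for soft clip), and the function names and the `min_support` threshold are invented here rather than taken from the study.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def clip_breakpoints(pos, cigar):
    """Reference coordinates (1-based) adjacent to soft clips in one alignment."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # hard clips carry no bases
    points = []
    if core and core[0][1] == "S":
        points.append(pos)  # first aligned base; clipped bases lie to its left
    if len(core) > 1 and core[-1][1] == "S":
        ref_len = sum(n for n, op in core if op in REF_CONSUMING)
        points.append(pos + ref_len - 1)  # last aligned base before the clip
    return points


def candidate_breakpoints(alignments, min_support=2):
    """Positions where at least min_support reads are clipped at the same base."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(clip_breakpoints(pos, cigar))
    return sorted(p for p, c in counts.items() if c >= min_support)
```

For example, `candidate_breakpoints([(100, "30S70M"), (100, "45S55M"), (500, "100M")])` reports position 100, because two reads are clipped at the same base while the fully aligned read contributes nothing.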

  7. Complete genome of Arthrobacter alpinus strain R3.8, bioremediation potential unraveled with genomic analysis.

    PubMed

    See-Too, Wah-Seng; Ee, Robson; Lim, Yan-Lue; Convey, Peter; Pearce, David A; Mohidin, Taznim Begam Mohd; Yin, Wai-Fong; Chan, Kok Gan

    2017-01-01

    Arthrobacter alpinus R3.8 is a psychrotolerant bacterial strain isolated from a soil sample obtained at Rothera Point, Adelaide Island, close to the Antarctic Peninsula. Strain R3.8 was sequenced in order to help discover potential cold active enzymes with biotechnological applications. Genome analysis identified various cold adaptation genes including some coding for anti-freeze proteins and cold-shock proteins, genes involved in bioremediation of xenobiotic compounds including naphthalene, and genes with chitinolytic and N-acetylglucosamine utilization properties and also plant-growth-influencing properties. In this genome report, we present a complete genome sequence of A. alpinus strain R3.8 and its annotation data, which will facilitate exploitation of potential novel cold-active enzymes.

  8. Savant Genome Browser 2: visualization and analysis for population-scale genomics.

    PubMed

    Fiume, Marc; Smith, Eric J M; Brook, Andrew; Strbenac, Dario; Turner, Brian; Mezlini, Aziz M; Robinson, Mark D; Wodak, Shoshana J; Brudno, Michael

    2012-07-01

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com.

  9. Savant Genome Browser 2: visualization and analysis for population-scale genomics

    PubMed Central

    Smith, Eric J. M.; Brook, Andrew; Strbenac, Dario; Turner, Brian; Mezlini, Aziz M.; Robinson, Mark D.; Wodak, Shoshana J.; Brudno, Michael

    2012-01-01

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com. PMID:22638571

  10. Annotation of the zebrafish genome through an integrated transcriptomic and proteomic analysis.

    PubMed

    Kelkar, Dhanashree S; Provost, Elayne; Chaerkady, Raghothama; Muthusamy, Babylakshmi; Manda, Srikanth S; Subbannayya, Tejaswini; Selvan, Lakshmi Dhevi N; Wang, Chieh-Huei; Datta, Keshava K; Woo, Sunghee; Dwivedi, Sutopa B; Renuse, Santosh; Getnet, Derese; Huang, Tai-Chung; Kim, Min-Sik; Pinto, Sneha M; Mitchell, Christopher J; Madugundu, Anil K; Kumar, Praveen; Sharma, Jyoti; Advani, Jayshree; Dey, Gourav; Balakrishnan, Lavanya; Syed, Nazia; Nanjappa, Vishalakshi; Subbannayya, Yashwanth; Goel, Renu; Prasad, T S Keshava; Bafna, Vineet; Sirdeshmukh, Ravi; Gowda, Harsha; Wang, Charles; Leach, Steven D; Pandey, Akhilesh

    2014-11-01

    Accurate annotation of protein-coding genes is one of the primary tasks upon the completion of whole genome sequencing of any organism. In this study, we used an integrated transcriptomic and proteomic strategy to validate and improve the existing zebrafish genome annotation. We undertook high-resolution mass-spectrometry-based proteomic profiling of 10 adult organs, whole adult fish body, and two developmental stages of zebrafish (SAT line), in addition to transcriptomic profiling of six organs. More than 7,000 proteins were identified from proteomic analyses, and ∼ 69,000 high-confidence transcripts were assembled from the RNA sequencing data. Approximately 15% of the transcripts mapped to intergenic regions, the majority of which are likely long non-coding RNAs. These high-quality transcriptomic and proteomic data were used to manually reannotate the zebrafish genome. We report the identification of 157 novel protein-coding genes. In addition, our data led to modification of existing gene structures including novel exons, changes in exon coordinates, changes in frame of translation, translation in annotated UTRs, and joining of genes. Finally, we discovered four instances of genome assembly errors that were supported by both proteomic and transcriptomic data. Our study shows how an integrative analysis of the transcriptome and the proteome can extend our understanding of even well-annotated genomes.

  11. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization

    PubMed Central

    Li, Wenyuan; Kalhor, Reza; Dai, Chao; Hao, Shengli; Gong, Ke; Zhou, Yonggang; Li, Haochen; Zhou, Xianghong Jasmine; Le Gros, Mark A.; Larabell, Carolyn A.; Chen, Lin; Alber, Frank

    2016-01-01

    Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization. PMID:26951677

  12. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization

    DOE PAGES

    Tjong, Harianto; Li, Wenyuan; Kalhor, Reza; ...

    2016-03-07

    Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Here, our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization.

  13. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate

    PubMed Central

    2010-01-01

    Background Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity between the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates, both in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation. PMID:20929575

  14. Strategies for efficient resolution analysis in full-waveform inversion

    NASA Astrophysics Data System (ADS)

    Fichtner, A.; van Leeuwen, T.; Trampert, J.

    2016-12-01

    Full-waveform inversion is developing into a standard method in the seismological toolbox. It combines numerical wave propagation for heterogeneous media with adjoint techniques in order to improve tomographic resolution. However, resolution becomes increasingly difficult to quantify because of the enormous computational requirements. Here we present two families of methods that can be used for efficient resolution analysis in full-waveform inversion. They are based on the targeted extraction of resolution proxies from the Hessian matrix, which is too large to store and to compute explicitly. Fourier methods rest on the application of the Hessian to Earth models with harmonic oscillations. This yields the Fourier spectrum of the Hessian for few selected wave numbers, from which we can extract properties of the tomographic point-spread function for any point in space. Random probing methods use uncorrelated, random test models instead of harmonic oscillations. Auto-correlating the Hessian-model applications for sufficiently many test models also characterises the point-spread function. Both Fourier and random probing methods provide a rich collection of resolution proxies. These include position- and direction-dependent resolution lengths, and the volume of point-spread functions as indicator of amplitude recovery and inter-parameter trade-offs. The computational requirements of these methods are equivalent to approximately 7 conjugate-gradient iterations in full-waveform inversion. This is significantly less than the optimisation itself, which may require tens to hundreds of iterations to reach convergence. In addition to the theoretical foundations of the Fourier and random probing methods, we show various illustrative examples from real-data full-waveform inversion for crustal and mantle structure.
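
The idea behind random probing, learning about an operator that is only available through its action on test models, can be illustrated with a toy sketch. This is a generic Hutchinson-style estimator and not the authors' formulation: the small explicit matrix stands in for a Hessian that in practice is never stored, and `apply_operator`, `probe_column`, the Rademacher test vectors, and the probe count are all assumptions chosen for illustration.

```python
import random


def apply_operator(matrix, vec):
    """Stand-in for a Hessian-vector product (here a plain matrix product)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def probe_column(apply_h, n, column, n_probes=5000, seed=0):
    """Estimate one column of H from products H v with random +/-1 vectors v.

    Because E[v_l * v_j] is 1 when l == j and 0 otherwise,
    the average of (H v)_i * v_j converges to H_ij.
    """
    rng = random.Random(seed)
    est = [0.0] * n
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        hv = apply_h(v)
        for i in range(n):
            est[i] += hv[i] * v[column]
    return [e / n_probes for e in est]
```

In this toy setting the estimated column plays the role of a point-spread function centred on one model parameter: its width and off-diagonal weight indicate how strongly that parameter trades off against its neighbours.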

  15. Microfluidic device for bacterial genome extraction and analysis

    NASA Astrophysics Data System (ADS)

    Galajda, Peter; Riehn, Robert; Wang, Yan-Mei; Keymer, Juan; Golding, Ido; Cox, Edward C.; Austin, Robert H.

    2006-03-01

    Although single molecule DNA manipulation and analysis techniques are emerging, methods for whole genome extraction from single cells, genomic length DNA handling and analytics are still to be developed. Here we present a microfabricated device to address some of these needs. This microfluidic chip is suitable for culturing bacteria and subsequently retrieving their genetic content. As a next step, the extracted DNA can be introduced into a nanostructured segment of the chip for precise handling, stretching and analysis. We hope that similar microdevices can be useful in studying genetic aspects of the cell lifecycle in a variety of organisms.

  16. The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications.

    PubMed

    Smith, Jeramiah J; Keinath, Melissa C

    2015-08-01

    It is generally accepted that many genes present in vertebrate genomes owe their origin to two whole-genome duplications that occurred deep in the ancestry of the vertebrate lineage. However, details regarding the timing and outcome of these duplications are not well resolved. We present high-density meiotic and comparative genomic maps for the sea lamprey (Petromyzon marinus), a representative of an ancient lineage that diverged from all other vertebrates ∼550 million years ago. Linkage analyses yielded a total of 95 linkage groups, similar to the estimated number of germline chromosomes (1n ∼ 99), spanning a total of 5570.25 cM. Comparative mapping data yield strong support for the hypothesis that a single whole-genome duplication occurred in the basal vertebrate lineage, but do not strongly support a hypothetical second event. Rather, these comparative maps reveal several evolutionarily independent segmental duplications occurring over the last 600+ million years of chordate evolution. This refined history of vertebrate genome duplication should permit more precise investigations of vertebrate evolution. © 2015 Smith and Keinath; Published by Cold Spring Harbor Laboratory Press.

  17. The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications

    PubMed Central

    Smith, Jeramiah J.; Keinath, Melissa C.

    2015-01-01

    It is generally accepted that many genes present in vertebrate genomes owe their origin to two whole-genome duplications that occurred deep in the ancestry of the vertebrate lineage. However, details regarding the timing and outcome of these duplications are not well resolved. We present high-density meiotic and comparative genomic maps for the sea lamprey (Petromyzon marinus), a representative of an ancient lineage that diverged from all other vertebrates ∼550 million years ago. Linkage analyses yielded a total of 95 linkage groups, similar to the estimated number of germline chromosomes (1n ∼ 99), spanning a total of 5570.25 cM. Comparative mapping data yield strong support for the hypothesis that a single whole-genome duplication occurred in the basal vertebrate lineage, but do not strongly support a hypothetical second event. Rather, these comparative maps reveal several evolutionarily independent segmental duplications occurring over the last 600+ million years of chordate evolution. This refined history of vertebrate genome duplication should permit more precise investigations of vertebrate evolution. PMID:26048246

  18. Revealing misassembled segments in the bovine reference genome by high resolution linkage disequilibrium scan.

    PubMed

    Utsunomiya, Adam T H; Santos, Daniel J A; Boison, Solomon A; Utsunomiya, Yuri T; Milanesi, Marco; Bickhart, Derek M; Ajmone-Marsan, Paolo; Sölkner, Johann; Garcia, José F; da Fonseca, Ricardo; da Silva, Marcos V G B

    2016-09-05

    Misassembly signatures, created by shuffling the order of sequences while assembling a genome, can be detected by the unexpected behavior of marker linkage disequilibrium (LD) decay. We developed a heuristic process to identify misassembly signatures, applied it to the bovine reference genome assembly (UMDv3.1) and presented the consequences of misassemblies in two case studies. We identified 2,906 single nucleotide polymorphism (SNP) markers presenting unexpected LD decay behavior in 626 putative misassembled contigs, which comprised less than 1 % of the whole genome. Although this represents a small fraction of the reference sequence, these poorly assembled segments can lead to severe implications to local genome context. For instance, we showed that one of the misassembled regions mapped to the POLL locus, which affected the annotation of positional candidate genes in a GWAS case study for polledness in Nellore (Bos indicus beef cattle). Additionally, we found that poorly performing markers in imputation mapped to putative misassembled regions, and that correction of marker positions based on LD was capable to recover imputation accuracy. This heuristic approach can be useful to cross validate reference assemblies and to filter out markers located at low confidence genomic regions before conducting downstream analyses.
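
The intuition that a misplaced marker shows unexpectedly low LD with its map neighbours can be sketched as below. This is not the heuristic developed in the paper: the neighbour-based rule, the `weak` and `strong` thresholds, and the function names are hypothetical, and markers are assumed to be phased biallelic alleles coded 0/1 and listed in map order.

```python
def r_squared(hap_a, hap_b):
    """LD r^2 between two biallelic markers given phased 0/1 alleles."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic marker carries no LD information
    d = p_ab - p_a * p_b
    return d * d / denom


def flag_suspect_markers(markers, weak=0.2, strong=0.8):
    """Indices of markers in weak LD with both neighbours, although the
    neighbours themselves are in strong LD with each other."""
    suspects = []
    for i in range(1, len(markers) - 1):
        left = r_squared(markers[i - 1], markers[i])
        right = r_squared(markers[i], markers[i + 1])
        across = r_squared(markers[i - 1], markers[i + 1])
        if across >= strong and max(left, right) < weak:
            suspects.append(i)
    return suspects
```

A marker flagged this way behaves as if it does not belong between its neighbours, which is the kind of signal one might filter on before imputation or association analysis.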

  19. Digital microarray analysis for digital artifact genomics

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger; Handley, James; Williams, Deborah

    2013-06-01

    We implement a Spatial Voting (SV) based analogy of microarray analysis for digital gene marker identification in malware code sections. We examine a famous set of malware formally analyzed by Mandiant and code named Advanced Persistent Threat (APT1). APT1 is a Chinese organization formed with specific intent to infiltrate and exploit US resources. Mandiant provided a detailed behavior and string analysis report for the 288 malware samples available. We performed an independent analysis using a new alternative to the traditional dynamic analysis and static analysis we call Spatial Analysis (SA). We perform unsupervised SA on the APT1 originating malware code sections and report our findings. We also show the results of SA performed on some members of the families associated by Mandiant. We conclude that SV-based SA is a practical fast alternative to dynamic analysis and static analysis.

  20. Conflict Analysis and Resolution Theories for Professional Military Education

    DTIC Science & Technology

    2010-04-14

    education schools. These theories include: John Burton’s "needs theory," Johan Galtung’s "cultural violence," Marie Dugan’s "nested theory of conflict...10. While Johan Galtung is known to have antipathy toward the coercive capacity of the military, his theories of conflict analysis and resolution...provide useful insight. Galtung is best known for articulating "a causal flow from cultural via structural to direct violence can be identified."

  1. High Resolution Continuous Flow Analysis System for Polar Ice Cores

    NASA Astrophysics Data System (ADS)

    Dallmayr, Remi; Azuma, Kumiko; Yamada, Hironobu; Kjær, Helle Astrid; Vallelonga, Paul; Azuma, Nobuhiko; Takata, Morimasa

    2014-05-01

    In the last decades, Continuous Flow Analysis (CFA) technology for ice core analyses has been developed to reconstruct the past changes of the climate system 1), 2). Compared with traditional analyses of discrete samples, a CFA system offers much faster and higher depth resolution analyses. It also generates a decontaminated sample stream without time-consuming sample processing procedure by using the inner area of an ice-core sample. The CFA system that we have been developing is currently able to continuously measure stable water isotopes 3) and electrolytic conductivity, as well as to collect discrete samples for both the inner and outer areas with variable depth resolutions. Chemistry analyses 4) and methane-gas analysis 5) are planned to be added using the continuous water stream system 5). In order to optimize the resolution of the current system with minimal sample volumes necessary for different analyses, our CFA system typically melts an ice core at 1.6 cm/min. Instead of using a wire position encoder with typical 1 mm positioning resolution 6), we decided to use a high-accuracy CCD Laser displacement sensor (LKG-G505, Keyence). At the 1.6 cm/min melt rate, the positioning resolution was increased to 0.27 mm. Also, the mixing volume that occurs in our open split debubbler is regulated using its weight. The overflow pumping rate is smoothly PID controlled to maintain the weight as low as possible, while keeping a safety buffer of water to avoid air bubbles downstream. To evaluate the system's depth resolution, we will present the preliminary data of electrolytic conductivity obtained by melting 12 bags of the North Greenland Eemian Ice Drilling (NEEM) ice core. The samples correspond to different climate intervals (Greenland Stadial 21, 22, Greenland Stadial 5, Greenland Interstadial 5, Greenland Interstadial 7, Greenland Stadial 8). We will present results for the Greenland Stadial 8, whose depths and ages are between 1723.7 and 1724.8 meters, and 35.520 to

  2. High-resolution genome-wide mapping of the primary structure of chromatin.

    PubMed

    Zhang, Zhenhai; Pugh, B Franklin

    2011-01-21

    The genomic organization of chromatin is increasingly recognized as a key regulator of cell behavior, but deciphering its regulation mechanisms requires detailed knowledge of chromatin's primary structure - the assembly of nucleosomes throughout the genome. This Primer explains the principles for mapping and analyzing the primary organization of chromatin on a genomic scale. After introducing chromatin organization and its impact on gene regulation and human health, we then describe methods that detect nucleosome positioning and occupancy levels using chromatin immunoprecipitation in combination with deep sequencing (ChIP-Seq), a strategy that is now straightforward and cost efficient. We then explore current strategies for converting the sequence information into knowledge about chromatin, an exciting challenge for biologists and bioinformaticians. Copyright © 2011 Elsevier Inc. All rights reserved.

  3. High-resolution interrogation of functional elements in the noncoding genome

    PubMed Central

    Sanjana, Neville E.; Wright, Jason; Zheng, Kaijie; Shalem, Ophir; Fontanillas, Pierre; Joung, Julia; Cheng, Christine; Regev, Aviv; Zhang, Feng

    2016-01-01

    The noncoding genome affects gene regulation and disease, yet we lack tools for rapid identification and manipulation of noncoding elements. We develop a CRISPR screen employing ~18,000 sgRNAs targeting >700 kb surrounding the genes NF1, NF2, and CUL3, which are involved in BRAF inhibitor resistance in melanoma. We find that noncoding locations that modulate drug resistance also harbor predictive hallmarks of noncoding function. With a subset of regions at the CUL3 locus, we demonstrate that engineered mutations alter transcription factor occupancy and long-range and local epigenetic environments, implicating these sites in gene regulation and chemotherapeutic resistance. Through our expansion of the potential of pooled CRISPR screens, we provide tools for genomic discovery and for elucidating biologically relevant mechanisms of gene regulation. Pooled CRISPR mutagenesis identifies functional elements in the noncoding genome. PMID:27708104

  4. High resolution genome-wide mapping of the primary structure of chromatin

    PubMed Central

    Zhang, Zhenhai; Pugh, B. Franklin

    2011-01-01

    The genomic organization of chromatin is increasingly recognized as a key regulator of cell behavior, but deciphering its regulation mechanisms requires detailed knowledge of chromatin’s primary structure - the assembly of nucleosomes throughout the genome. This Primer explains the principles for mapping and analyzing the primary organization of chromatin on a genomic scale. After introducing chromatin organization and its impact on gene regulation and human health, we then describe methods that detect nucleosome positioning and occupancy levels using chromatin-immunoprecipitation in combination with deep sequencing (ChIP-Seq), a strategy that is now straightforward and cost-efficient. We then explore current strategies for converting the sequence information into knowledge about chromatin, an exciting challenge for biologists and bioinformaticians. PMID:21241889

  5. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins

    PubMed Central

    Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

    2016-01-01

    A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks. PMID:27198619

  6. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins

    NASA Astrophysics Data System (ADS)

    Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

    2016-05-01

    A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.

  7. FVGWAS: Fast Voxelwise Genome Wide Association Analysis of Large-scale Imaging Genetic Data

    PubMed Central

    Huang, Meiyan; Nichols, Thomas; Huang, Chao; Yang, Yu; Lu, Zhaohua; Feng, Qianjing; Knickmeyer, Rebecca C; Zhu, Hongtu

    2015-01-01

    Large-scale imaging genetic studies are increasingly being conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide (N_C > 12 million known variants) associations with signals at millions of locations (N_V ~ 10^6) in the brain from thousands of subjects (n ~ 10^3). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components including a heteroscedastic linear model, a global sure independence screening (G-SIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is O(n N_V N_C) for the voxelwise genome wide association analysis (VGWAS) method compared with O((N_C + N_V) n^2) for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 seconds for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing. PMID:26025292
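
    To make the two complexity expressions concrete, the sketch below plugs the ADNI dimensions quoted in the abstract into both of them. It is a back-of-the-envelope comparison only: big-O notation hides constant factors, so the ratio indicates scaling and is not a measured speed-up. The function names are hypothetical.

```python
def vgwas_cost(n, n_voxels, n_variants):
    """Leading-order operation count for standard VGWAS: n * N_V * N_C."""
    return n * n_voxels * n_variants


def fvgwas_cost(n, n_voxels, n_variants):
    """Leading-order operation count for FVGWAS: (N_C + N_V) * n^2."""
    return (n_variants + n_voxels) * n ** 2


# Dimensions of the ADNI analysis reported in the abstract.
n, n_v, n_c = 708, 193_275, 501_584

ratio = vgwas_cost(n, n_v, n_c) / fvgwas_cost(n, n_v, n_c)
print(f"VGWAS / FVGWAS leading-order cost ratio: {ratio:.1f}")
```

    The ratio simplifies to N_V N_C / ((N_C + N_V) n), so the advantage of the second expression grows as the number of voxels and variants increases relative to the number of subjects.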

  8. OGRe: a relational database for comparative analysis of mitochondrial genomes

    PubMed Central

    Jameson, Daniel; Gibson, Andrew P.; Hudelot, Cendrine; Higgs, Paul G.

    2003-01-01

    Organellar Genome Retrieval (OGRe) is a relational database of complete mitochondrial genome sequences for over 250 Metazoan species. OGRe provides a resource for the comparative analysis of mitochondrial genomes at several levels. At the sequence level, OGRe allows the retrieval of any selected set of mitochondrial genes from any selected set of species. Species are classified using a taxonomic system that allows easy selection of related groups of species. Sequence alignments are also available for some species. At the level of individual nucleotides, the system contains information on base frequencies and codon usage frequencies that can be compared between organisms. At the level of whole genomes, OGRe provides several ways of visualizing information on gene order. Diagrams illustrating the genome arrangement can be generated for any selected set of species automatically from the information in the database. Searches can be done based on gene arrangement to find sets of species that have the same order as one another. Diagrams for pairwise comparison of species can be produced that show the positions of break-points in the gene order and use colour to highlight the sections of the genome that have moved. OGRe is available from http://www.bioinf.man.ac.uk/ogre. PMID:12519982

  9. Comparative genomic analysis of two brucellaphages of distant origins.

    PubMed

    Flores, Victor; López-Merino, Ahidé; Mendoza-Hernandez, Guillermo; Guarneros, Gabriel

    2012-04-01

    Here, we present the first complete genome sequence of brucellaphage Tbilisi (Tb) and compare it with that of Pr, a broad host-range brucellaphage recently isolated in Mexico. The genomes consist of 41,148 bp (Tb) and 38,253 bp (Pr); they differ mainly in the region encoding structural proteins, in which the genome of Tb shows two major insertions. Both genomes share 99.87% nucleotide identity, a high percentage of identity for phages isolated at such geographically distant locations and at different times. Sequence analysis revealed 57 conserved ORFs, three transcriptional terminators and four putative transcriptional promoters. The co-occurrence of an ORF encoding a putative DnaA-like protein and a putative oriC-like origin of replication was found in both brucellaphage genomes, a feature not described in any other phage genome. These elements suggest that DNA replication in brucellaphages differs from other phages, and might resemble that of bacterial chromosomes. Copyright © 2012 Elsevier Inc. All rights reserved.

  10. MGcV: the microbial genomic context viewer for comparative genome analysis

    PubMed Central

    2013-01-01

    Background: Conserved gene context is used in many types of comparative genome analyses. It is used to provide leads on gene function, to guide the discovery of regulatory sequences, but also to aid in the reconstruction of metabolic networks. We present the Microbial Genomic context Viewer (MGcV), an interactive, web-based application tailored to strengthen the practice of manual comparative genome context analysis for bacteria. Results: MGcV is a versatile, easy-to-use tool that renders a visualization of the genomic context of any set of selected genes, genes within a phylogenetic tree, genomic segments, or regulatory elements. It is tailored to facilitate laborious tasks such as the interactive annotation of gene function, the discovery of regulatory elements, or the sequence-based reconstruction of gene regulatory networks. We illustrate that MGcV can be used in gene function annotation by visually integrating information on prokaryotic genes, like their annotation as available from NCBI with other annotation data such as Pfam domains, sub-cellular location predictions and gene-sequence characteristics such as GC content. We also illustrate the usefulness of the interactive features that allow the graphical selection of genes to facilitate data gathering (e.g. upstream regions, IDs or annotation), in the analysis and reconstruction of transcription regulation. Moreover, putative regulatory elements and their corresponding scores or data from RNA-seq and microarray experiments can be uploaded, visualized and interpreted in (ranked-) comparative context maps. The ranked maps allow the interpretation of predicted regulatory elements and experimental data in light of each other. Conclusion: MGcV advances the manual comparative analysis of genes and regulatory elements by providing fast and flexible integration of gene related data combined with straightforward data retrieval. MGcV is available at http://mgcv.cmbi.ru.nl. PMID:23547764

  11. Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation.

    PubMed

    Wagner, Catherine E; Keller, Irene; Wittwer, Samuel; Selz, Oliver M; Mwaiko, Salome; Greuter, Lucie; Sivasundar, Arjun; Seehausen, Ole

    2013-02-01

    Although population genomic studies using next generation sequencing (NGS) data are becoming increasingly common, studies focusing on phylogenetic inference using these data are in their infancy. Here, we use NGS data generated from reduced representation genomic libraries of restriction-site-associated DNA (RAD) markers to infer phylogenetic relationships among 16 species of cichlid fishes from a single rocky island community within Lake Victoria's cichlid adaptive radiation. Previous attempts at sequence-based phylogenetic analyses in Victoria cichlids have shown extensive sharing of genetic variation among species and no resolution of species or higher-level relationships. These patterns have generally been attributed to the very recent origin (<15,000 years) of the radiation, and ongoing hybridization between species. We show that as we increase the amount of sequence data used in phylogenetic analyses, we produce phylogenetic trees with unprecedented resolution for this group. In trees derived from our largest data supermatrices (3 to >5.8 million base pairs in width), species are reciprocally monophyletic with high bootstrap support, and the majority of internal branches on the tree have high support. Given the difficulty of the phylogenetic problem that the Lake Victoria cichlid adaptive radiation represents, these results are striking. The strict interpretation of the topologies we present here warrants caution because many questions remain about phylogenetic inference with very large genomic data sets and because, with the current analysis, we cannot distinguish between effects of shared ancestry and post-speciation gene flow. However, these results provide the first conclusive evidence for the monophyly of species in the Lake Victoria cichlid radiation and demonstrate the power that NGS data sets hold to resolve even the most difficult of phylogenetic challenges. © 2012 Blackwell Publishing Ltd.

  12. [DNA analysis for the post genome-sequencing era].

    PubMed

    Kambara, Hideki

    2002-05-01

    With the completion of human genome sequencing, the new post genome-sequencing era has started. The major tasks are clarifying the function of genes and applying this information in medical as well as various industrial fields. Various DNA analysis methods and instruments for gene expression profiling as well as for genetic diversity, including SNP typing, are required and have been developed. Here, the history and technologies related to DNA analysis, including the Wada project in the early 1980's and the Human Genome Project from 1990, are described. Various new technologies have been developed in this decade. They include a capillary gel array DNA sequencer, DNA chips, bead probe arrays, a new DNA sequencing method using pyrosequencing and an efficient SNP typing method by BAMPER.

  13. Accurate evaluation and analysis of functional genomics data and methods

    PubMed Central

    Greene, Casey S.; Troyanskaya, Olga G.

    2016-01-01

    The development of technology capable of inexpensively performing large-scale measurements of biological systems has generated a wealth of data. Integrative analysis of these data holds the promise of uncovering gene function, regulation, and, in the longer run, understanding complex disease. However, their analysis has proved very challenging, as it is difficult to quickly and effectively assess the relevance and accuracy of these data for individual biological questions. Here, we identify biases that present challenges for the assessment of functional genomics data and methods. We then discuss evaluation methods that, taken together, begin to address these issues. We also argue that the funding of systematic data-driven experiments and of high-quality curation efforts will further improve evaluation metrics so that they more accurately assess functional genomics data and methods. Such metrics will allow researchers in the field of functional genomics to continue to answer important biological questions in a data-driven manner. PMID:22268703

  14. Biology and genomic analysis of Clostridium botulinum.

    PubMed

    Peck, Michael W

    2009-01-01

    The ability to form botulinum neurotoxin is restricted to six phylogenetically and physiologically distinct bacteria (Clostridium botulinum Groups I-IV and some strains of C. baratii and C. butyricum). The botulinum neurotoxin is the most potent toxin known, with as little as 30-100 ng potentially fatal, and is responsible for botulism, a severe neuroparalytic disease that affects humans, animals, and birds. In order to minimize the hazards presented by the botulinum neurotoxin-forming clostridia, it is necessary to extend understanding of the biology of these bacteria. Analyses of recently available genome sequences in conjunction with studies of bacterial physiology are beginning to reveal new and exciting information on the biology of these dangerous bacteria. At the whole organism level, substantial differences between the six botulinum neurotoxin-forming clostridia have been reported. For example, the genomes of proteolytic C. botulinum (C. botulinum Group I) and non-proteolytic C. botulinum (C. botulinum Group II) are highly diverged and show neither synteny nor homology. It has also emerged that the botulinum neurotoxin-forming clostridia are not overtly pathogenic (unlike C. difficile), but saprophytic bacteria that use the neurotoxin to kill a host and create a source of nutrients. One important feature that has contributed to the success of botulinum neurotoxin-forming clostridia is their ability to form highly resistant endospores. The spores, however, also present an opportunity to control these bacteria if escape from lag phase (and hence growth) can be prevented. This is dependent on extending understanding of the biology of these processes. Differences in the genetics and physiology of spore germination in proteolytic C. botulinum and non-proteolytic C. botulinum have been identified. The biological variability in lag phase and its stages has been described for individual spores, and it has been shown that various adverse treatments extend different

  15. Geometric multi-resolution analysis for dictionary learning

    NASA Astrophysics Data System (ADS)

    Maggioni, Mauro; Minsker, Stanislav; Strawn, Nate

    2015-09-01

    We present an efficient algorithm and theory for Geometric Multi-Resolution Analysis (GMRA), a procedure for dictionary learning. Sparse dictionary learning provides the necessary complexity reduction for the critical applications of compression, regression, and classification in high-dimensional data analysis. As such, it is a critical technique in data science and it is important to have techniques that admit both efficient implementation and strong theory for large classes of theoretical models. By construction, GMRA is computationally efficient and in this paper we describe how the GMRA correctly approximates a large class of plausible models (namely, the noisy manifolds).

  16. High-Resolution Melt Curve Analysis in Cancer Mutation Screen.

    PubMed

    Mehrotra, Meenakshi; Patel, Keyur P

    2016-01-01

    High-resolution melt (HRM) curve analysis is a PCR-based assay that identifies sequence alterations based on subtle variations in the melting curves of mutated versus wild-type DNA sequences. HRM analysis is a high-throughput, sensitive, and efficient alternative to Sanger sequencing and is used to assess for mutations in clinically important genes involved in cancer diagnosis. The technique involves PCR amplification of a target sequence in the presence of a fluorescent double-stranded DNA (dsDNA) binding dye, melting of the fluorescent amplicons, and subsequent interpretation of melt curve profiles.

  17. High-resolution copy number variation analysis of schizophrenia in Japan.

    PubMed

    Kushima, I; Aleksic, B; Nakatochi, M; Shimamura, T; Shiino, T; Yoshimi, A; Kimura, H; Takasaki, Y; Wang, C; Xing, J; Ishizuka, K; Oya-Ito, T; Nakamura, Y; Arioka, Y; Maeda, T; Yamamoto, M; Yoshida, M; Noma, H; Hamada, S; Morikawa, M; Uno, Y; Okada, T; Iidaka, T; Iritani, S; Yamamoto, T; Miyashita, M; Kobori, A; Arai, M; Itokawa, M; Cheng, M-C; Chuang, Y-A; Chen, C-H; Suzuki, M; Takahashi, T; Hashimoto, R; Yamamori, H; Yasuda, Y; Watanabe, Y; Nunokawa, A; Someya, T; Ikeda, M; Toyota, T; Yoshikawa, T; Numata, S; Ohmori, T; Kunimoto, S; Mori, D; Iwata, N; Ozaki, N

    2017-03-01

    Recent schizophrenia (SCZ) studies have reported an increased burden of de novo copy number variants (CNVs) and identified specific high-risk CNVs, although with variable phenotype expressivity. However, the pathogenesis of SCZ has not been fully elucidated. Using array comparative genomic hybridization, we performed a high-resolution genome-wide CNV analysis on a mainly (92%) Japanese population (1699 SCZ cases and 824 controls) and identified 7066 rare CNVs, 70.0% of which were small (<100 kb). Clinically significant CNVs were significantly more frequent in cases than in controls (odds ratio=3.04, P=9.3 × 10^-9, 9.0% of cases). We confirmed a significant association of X-chromosome aneuploidies with SCZ and identified 11 de novo CNVs (e.g., MBD5 deletion) in cases. In patients with clinically significant CNVs, 41.7% had a history of congenital/developmental phenotypes, and the rate of treatment resistance was significantly higher (odds ratio=2.79, P=0.0036). We found more severe clinical manifestations in patients with two clinically significant CNVs. Gene set analysis replicated previous findings (e.g., synapse, calcium signaling) and identified novel biological pathways including oxidative stress response, genomic integrity, kinase and small GTPase signaling. Furthermore, involvement of multiple SCZ candidate genes and biological pathways in the pathogenesis of SCZ was suggested in established SCZ-associated CNV loci. Our study shows the high genetic heterogeneity of SCZ and its clinical features and raises the possibility that genomic instability is involved in its pathogenesis, which may be related to the increased burden of de novo CNVs and variable expressivity of CNVs.
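
    As a reading aid for the odds ratio quoted above, the sketch below shows how an odds ratio relates a case frequency to a control frequency. It assumes the reported odds ratio is a simple unadjusted ratio of odds, which the abstract does not state, so the implied control frequency is illustrative only and is not a figure from the study. The function names are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def implied_control_rate(case_rate, odds_ratio):
    """Control frequency implied by a case frequency and an unadjusted odds ratio."""
    return prob_from_odds(odds(case_rate) / odds_ratio)


# Values quoted in the abstract: 9.0% of cases, odds ratio 3.04.
control_rate = implied_control_rate(0.090, 3.04)
print(f"Implied control frequency: {control_rate:.3%}")
```

    Because an odds ratio compares odds and not proportions, it is always somewhat larger than the plain ratio of the two frequencies when the case frequency is the higher one.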

  18. Genomic Analysis of Attenuation in Pandemic Vibrio parahaemolyticus

    NASA Astrophysics Data System (ADS)

    Pinnell, L. J.; Tallman, J. J., III; Turner, J.

    2016-02-01

    A critical problem in the prevention and treatment of infectious disease is the ability to differentiate virulent from avirulent bacterial strains. The distinction is commonly based on the presence or absence of specific virulence-associated genes. Alternatively, serotypic or phylogenetic typing can accurately differentiate virulent from avirulent strains. When these approaches fail, more discriminatory analysis is needed. Pandemic Vibrio parahaemolyticus, distinguishable by genotyping (thermostable direct hemolysin or tdh), serotyping (O3:K6) and multilocus sequence typing (ST3), is regarded as a highly virulent clonal complex. We have previously shown, through population genetics and cytotoxicity testing, that some pandemic strains isolated from environmental sources are avirulent. To investigate the basis for attenuation, we sequenced the draft genomes of 10 pandemic V. parahaemolyticus isolates originating from environmental (N = 7) and clinical sources (N = 3). Genomic comparison of these 10 draft genomes, and the pandemic type strain (RIMD2210633), revealed a large core genome (5,158,719 bp) and a much smaller accessory genome (141,403 bp). The accessory genome was largely comprised of hypothetical proteins; however, several genes encoded phage-related proteins. Phylogenetic analysis, based on 2,902 single nucleotide polymorphisms in the core genome, did not reveal a discernible pattern. Current efforts are focused on the identification of insertions, deletions and point mutations that may alter protein expression or protein function. Preliminary results show that attenuated strains lack the virulence-associated vacB gene (VP1890). This gene encodes a 741 amino acid exoribonuclease homologous to exoribonucleases known to modulate virulence in Salmonella enterica and Helicobacter pylori. The correlation between attenuation and the absence of this gene suggests that VP1890 plays an important role in human pathogenesis.

  19. Evolution Analysis of Simple Sequence Repeats in Plant Genome.

    PubMed

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units in genome sequences and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those containing repeats of 1-3 nucleotide combinations. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With increasing SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and as the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T rose along with the evolution of the plant kingdom; meanwhile, with increasing SSR repeat number, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom, but were also associated with the evolution of specific genome sequences. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution.
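
    The abstract does not say how SSRs were detected, so the following is only a minimal sketch of one common way to locate mono-, di- and trinucleotide repeats, using a regular expression with a back-reference. The function name `find_ssrs` and the minimum repeat count are assumptions for illustration, not the authors' criteria.

```python
import re


def find_ssrs(seq, min_repeats=4, max_motif_len=3):
    """Find tandem repeats of short motifs (1 to max_motif_len nucleotides).

    Returns (start, motif, repeat_count) tuples. `min_repeats` is an
    illustrative cutoff; published SSR surveys use varying thresholds.
    """
    # The lazy quantifier prefers the shortest motif, so AAAAAA is reported
    # as a mononucleotide repeat and not as a repeat of AA or AAA.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def at_fraction(hits, motif_len=1):
    """Fraction of SSRs of a given motif length made only of A and T."""
    selected = [motif for _, motif, _ in hits if len(motif) == motif_len]
    if not selected:
        return 0.0
    return sum(set(motif) <= {"A", "T"} for motif in selected) / len(selected)


hits = find_ssrs("GGATATATATCCAAAAAGCGGGGT")
print(hits, at_fraction(hits, motif_len=1))
```

    Summaries such as `at_fraction` correspond to the kind of A/T percentages the abstract compares across species, although the exact definitions used in the study may differ.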

  20. Dyneins Across Eukaryotes: A Comparative Genomic Analysis

    PubMed Central

    Wickstead, Bill; Gull, Keith

    2007-01-01

    Dyneins are large minus-end-directed microtubule motors. Each dynein contains at least one dynein heavy chain (DHC) and a variable number of intermediate chains (IC), light intermediate chains (LIC) and light chains (LC). Here, we used genome sequence data from 24 diverse eukaryotes to assess the distribution of DHCs, ICs, LICs and LCs across Eukaryota. Phylogenetic inference identified nine DHC families (two cytoplasmic and seven axonemal) and six IC families (one cytoplasmic). We confirm that dyneins have been lost from higher plants and show that this is most likely because of a single loss of cytoplasmic dynein 1 from the ancestor of Rhodophyta and Viridiplantae, followed by lineage-specific losses of other families. Independent losses in Entamoeba mean that at least three extant eukaryotic lineages are entirely devoid of dyneins. Cytoplasmic dynein 2 is associated with intraflagellar transport (IFT), but in two chromalveolate organisms, we find an IFT footprint without the retrograde motor. The distribution of one family of outer-arm dyneins accounts for 2-headed or 3-headed outer-arm ultrastructures observed in different organisms. One diatom species builds motile axonemes without any inner-arm dyneins (IAD), and the unexpected conservation of IAD I1 in non-flagellate algae and LC8 (DYNLL1/2) in all lineages reveals a surprising fluidity to dynein function. PMID:17897317

  1. Functional Genomic Analysis of C. elegans Molting

    PubMed Central

    Frand, Alison R; Russel, Sascha

    2005-01-01

    Although the molting cycle is a hallmark of insects and nematodes, neither the endocrine control of molting via size, stage, and nutritional inputs nor the enzymatic mechanism for synthesis and release of the exoskeleton is well understood. Here, we identify endocrine and enzymatic regulators of molting in C. elegans through a genome-wide RNA-interference screen. Products of the 159 genes discovered include annotated transcription factors, secreted peptides, transmembrane proteins, and extracellular matrix enzymes essential for molting. Fusions between several genes and green fluorescent protein show a pulse of expression before each molt in epithelial cells that synthesize the exoskeleton, indicating that the corresponding proteins are made in the correct time and place to regulate molting. We show further that inactivation of particular genes abrogates expression of the green fluorescent protein reporter genes, revealing regulatory networks that might couple the expression of genes essential for molting to endocrine cues. Many molting genes are conserved in parasitic nematodes responsible for human disease, and thus represent attractive targets for pesticide and pharmaceutical development. PMID:16122351

  2. High-resolution analysis of cis-acting regulatory networks at the α-globin locus.

    PubMed

    Hughes, Jim R; Lower, Karen M; Dunham, Ian; Taylor, Stephen; De Gobbi, Marco; Sloane-Stanley, Jacqueline A; McGowan, Simon; Ragoussis, Jiannis; Vernimmen, Douglas; Gibbons, Richard J; Higgs, Douglas R

    2013-01-01

    We have combined the circular chromosome conformation capture protocol with high-throughput, genome-wide sequence analysis to characterize the cis-acting regulatory network at a single locus. In contrast to methods which identify large interacting regions (10-1000 kb), the 4C approach provides a comprehensive, high-resolution analysis of a specific locus with the aim of defining, in detail, the cis-regulatory elements controlling a single gene or gene cluster. Using the human α-globin locus as a model, we detected all known local and long-range interactions with this gene cluster. In addition, we identified two interactions with genes located 300 kb (NME4) and 625 kb (FAM173a) from the α-globin cluster.

  3. Revealing misassembled segments in the bovine reference genome by high resolution linkage disequilibrium scan

    USDA-ARS's Scientific Manuscript database

    Misassembly signatures, created by shuffling the order of sequences while assembling a genome, can be easily seen by analyzing the unexpected behaviour of the linkage disequilibrium (LD) decay. A heuristic process was proposed to identify those misassembly signatures and presented the ones found in ...

  4. A High-Resolution View of Genome-Wide Pneumococcal Transformation

    PubMed Central

    Croucher, Nicholas J.; Harris, Simon R.; Barquist, Lars; Parkhill, Julian; Bentley, Stephen D.

    2012-01-01

    Transformation is an important mechanism of microbial evolution through which bacteria have been observed to rapidly adapt in response to clinical interventions; examples include facilitating vaccine evasion and the development of penicillin resistance in the major respiratory pathogen Streptococcus pneumoniae. To characterise the process in detail, the genomes of 124 S. pneumoniae isolates produced through in vitro transformation were sequenced and recombination events detected. Those recombinations importing the selected marker were independent of unselected events elsewhere in the genome, the positions of which were not significantly affected by local sequence similarity between donor and recipient or mismatch repair processes. However, both types of recombinations were sometimes mosaic, with multiple non-contiguous segments originating from the same molecule of donor DNA. The lengths of the unselected events were exponentially distributed with a mean of 2.3 kb, implying that recombinations are stochastically resolved with a fixed per-base probability of 4.4×10^−4 bp^−1. This distribution of recombination sizes, coupled with an observed under-representation of large insertions within transferred sequence, suggests transformation has the potential to reduce the size of bacterial genomes, and is unlikely to act as an efficient mechanism for the uptake of accessory genomic loci. PMID:22719250
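
    The link between the reported mean length and the per-base probability can be checked with a few lines of arithmetic. The sketch below assumes only the two figures quoted in the abstract (a mean of 2.3 kb and 4.4×10^−4 per bp); the function name and the 10 kb example length are illustrative and do not come from the paper. Because the mean is quoted to two significant figures, the reciprocal agrees with the reported rate only approximately.

```python
import math

MEAN_LENGTH_BP = 2300.0   # reported mean of unselected events (2.3 kb, rounded)
REPORTED_RATE = 4.4e-4    # reported per-base resolution probability (per bp)

# For an exponential length distribution, the rate is the reciprocal of the mean.
rate_from_mean = 1.0 / MEAN_LENGTH_BP


def survival(length_bp, rate):
    """Probability that an event extends beyond length_bp under the exponential model."""
    return math.exp(-rate * length_bp)


print(f"rate implied by the rounded mean: {rate_from_mean:.2e} per bp")
print(f"fraction of events longer than 10 kb: {survival(10_000, REPORTED_RATE):.3f}")
```

    Under this model long imports are rare, which is in line with the abstract's remark that large insertions are under-represented in transferred sequence.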

  5. High-resolution genetic mapping of maize pan-genome sequence anchors

    USDA-ARS's Scientific Manuscript database

    In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes. The structural variation across a species can be represented by a ‘pan-genome’, which is essential to fully understand the genetic control of phenotypes. However, the pan-genome’s complexity hinde...

  6. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  7. Fine mapping by composite genome-wide association analysis.

    PubMed

    Casellas, Joaquim; Cañas-Álvarez, Jhon Jacobo; Fina, Marta; Piedrafita, Jesús; Cecchinato, Alessio

    2017-06-06

    Genome-wide association (GWA) studies play a key role in current genetics research, unravelling genomic regions linked to phenotypic traits of interest in multiple species. Nevertheless, the extent of linkage disequilibrium (LD) may provide confounding results when significant genetic markers span along several contiguous cM. In this study, we have adapted the composite interval mapping approach to the GWA framework (composite GWA), in order to evaluate the impact of including competing (possibly linked) genetic markers when testing for the additive allelic effect inherent to a given genetic marker. We tested model performance on simulated data sets under different scenarios (i.e., qualitative trait loci effects, LD between genetic markers and width of the genomic region involved in the analysis). Our results showed that the genomic region had a small impact on the number of competing single nucleotide polymorphisms (SNPs) as well as on the precision of the composite GWA analysis. A similar conclusion was derived from the preferable range of LD between the tested SNP and competing SNPs, although moderate-to-high LD seemed to attenuate the loss of statistical power. The composite GWA improved specificity and reduced the number of significant genetic markers. The composite GWA model contributes a novel point of view for GWA analyses where testing circumscribed to the genomic region flanking each SNP (delimited by the nearest competing SNPs) and conditioning on linked markers increases the precision to locate causal mutations, but possibly at the expense of power.

  8. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    PubMed

    Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  9. High resolution coherence analysis between planetary and climate oscillations

    NASA Astrophysics Data System (ADS)

    Scafetta, Nicola

    2016-05-01

    This study investigates the existence of a multi-frequency spectral coherence between planetary and global surface temperature oscillations by using advanced techniques of coherence analysis and statistical significance tests. The performance of the standard Matlab mscohere algorithms is compared versus high resolution coherence analysis methodologies such as the canonical correlation analysis. The Matlab mscohere function highlights large coherence peaks at 20 and 60-year periods although, due to the shortness of the global surface temperature record (1850-2014), the statistical significance of the result depends on the specific window function adopted for pre-processing the data. In fact, window functions disrupt the low frequency component of the spectrum. On the contrary, using the canonical correlation analysis at least five coherent frequencies at the 95% significance level are found at the following periods: 6.6, 7.4, 14, 20 and 60 years. Thus, high resolution coherence analysis confirms that the climate system can be partially modulated by astronomical forces of gravitational, electromagnetic and solar origin. A possible chain of the physical causes explaining this coherence is briefly discussed.

  10. Phylogenetic Resolution in Juglans Based on Complete Chloroplast Genomes and Nuclear DNA Sequences.

    PubMed

    Dong, Wenpan; Xu, Chao; Li, Wenqing; Xie, Xiaoman; Lu, Yizeng; Liu, Yanlei; Jin, Xiaobai; Suo, Zhili

    2017-01-01

    Walnuts (Juglans of the Juglandaceae) are well-known, economically important resource plants valued for their edible nuts, high-quality wood, and medicinal uses, distributed from tropical to temperate zones and from Asia to Europe and the Americas. There are about 21 species in Juglans. Classification of Juglans at the section level is problematic because the phylogenetic position of Juglans cinerea is disputed. A lack of morphological and DNA markers has severely hindered related research. In this study, the complete chloroplast genomes and two nuclear DNA regions (the internal transcribed spacer and ubiquitin ligase gene) of 10 representative taxa of Juglans were used for comparative genomic analyses, in order to clarify how useful this genetic information is for inferring phylogenetic relationships within the genus. The Juglans chloroplast genomes possessed the typical quadripartite structure of angiosperms, consisting of a pair of inverted repeat regions separated by a large single-copy region and a small single-copy region. All 10 chloroplast genomes possessed 112 unique genes arranged in the same order, including 78 protein-coding, 30 tRNA, and 4 rRNA genes. A combined sequence data set from two nuclear DNA regions revealed that Juglans plants could be classified into three branches: (1) section Juglans, (2) section Cardiocaryon including J. cinerea, which is closer to J. mandshurica, and (3) section Rhysocaryon. However, three branches with a different phylogenetic topology were recognized in Juglans using the complete chloroplast genome sequences: (1) section Juglans, (2) section Cardiocaryon, and (3) section Rhysocaryon plus J. cinerea. The molecular taxonomy of Juglans is largely compatible with the morphological taxonomy except for J. cinerea (section Trachycaryon). Based on the complete chloroplast genome sequence data, the divergence time between section Juglans and section Cardiocaryon was 44.77 Mya, while section

  11. Phylogenetic Resolution in Juglans Based on Complete Chloroplast Genomes and Nuclear DNA Sequences

    PubMed Central

    Dong, Wenpan; Xu, Chao; Li, Wenqing; Xie, Xiaoman; Lu, Yizeng; Liu, Yanlei; Jin, Xiaobai; Suo, Zhili

    2017-01-01

    Walnuts (Juglans of the Juglandaceae) are well-known, economically important resource plants valued for their edible nuts, high-quality wood, and medicinal uses, distributed from tropical to temperate zones and from Asia to Europe and the Americas. There are about 21 species in Juglans. Classification of Juglans at the section level is problematic because the phylogenetic position of Juglans cinerea is disputed. A lack of morphological and DNA markers has severely hindered related research. In this study, the complete chloroplast genomes and two nuclear DNA regions (the internal transcribed spacer and ubiquitin ligase gene) of 10 representative taxa of Juglans were used for comparative genomic analyses, in order to clarify how useful this genetic information is for inferring phylogenetic relationships within the genus. The Juglans chloroplast genomes possessed the typical quadripartite structure of angiosperms, consisting of a pair of inverted repeat regions separated by a large single-copy region and a small single-copy region. All 10 chloroplast genomes possessed 112 unique genes arranged in the same order, including 78 protein-coding, 30 tRNA, and 4 rRNA genes. A combined sequence data set from two nuclear DNA regions revealed that Juglans plants could be classified into three branches: (1) section Juglans, (2) section Cardiocaryon including J. cinerea, which is closer to J. mandshurica, and (3) section Rhysocaryon. However, three branches with a different phylogenetic topology were recognized in Juglans using the complete chloroplast genome sequences: (1) section Juglans, (2) section Cardiocaryon, and (3) section Rhysocaryon plus J. cinerea. The molecular taxonomy of Juglans is largely compatible with the morphological taxonomy except for J. cinerea (section Trachycaryon). Based on the complete chloroplast genome sequence data, the divergence time between section Juglans and section Cardiocaryon was 44.77 Mya, while section

  12. A customized high-resolution array-comparative genomic hybridization to explore copy number variations in Parkinson's disease.

    PubMed

    La Cognata, Valentina; Morello, Giovanna; Gentile, Giulia; D'Agata, Velia; Criscuolo, Chiara; Cavalcanti, Francesca; Cavallaro, Sebastiano

    2016-10-01

    Parkinson's disease (PD), the second most common progressive neurodegenerative disorder, was long believed to be a non-genetic sporadic syndrome. Today, only a small percentage of PD cases with genetic inheritance patterns are known, often complicated by reduced penetrance and variable expressivity. The few well-characterized Mendelian genes, together with a number of risk factors, contribute to the major sporadic forms of the disease, thus delineating an intricate genetic profile at the basis of this debilitating and incurable condition. Along with single nucleotide changes, gene-dosage abnormalities and copy number variations (CNVs) have emerged as significant disease-causing mutations in PD. However, due to their size variability and to the quantitative nature of the assay, CNV genotyping is particularly challenging. For this reason, innovative high-throughput platforms and bioinformatics algorithms are increasingly replacing classical CNV detection methods. Here, we report the design strategy, development, validation and implementation of NeuroArray, a customized exon-centric high-resolution array-based comparative genomic hybridization (aCGH) tailored to detect single/multi-exon deletions and duplications in a large panel of PD-related genes. This targeted design allows for a focused evaluation of structural imbalances in clinically relevant PD genes, combining exon-level resolution with genome-wide coverage. The NeuroArray platform may offer new insights in elucidating inherited potential or de novo structural alterations in PD patients and investigating new candidate genes.

  13. Whole genome sequencing and methylome analysis of the wild guinea pig.

    PubMed

    Weyrich, Alexandra; Schüllermann, Tino; Heeger, Felix; Jeschek, Marie; Mazzoni, Camila J; Chen, Wei; Schumann, Kathrin; Fickel, Joerns

    2014-11-28

    DNA methylation is a heritable mechanism that acts in response to environmental changes, lifestyle and diseases by influencing gene expression in eukaryotes. Epigenetic studies of wild organisms are mandatory to understand their role in e.g. adaptational processes in the great variety of ecological niches. However, strategies to address those questions on a methylome scale are widely missing. In this study we present such a strategy and describe a whole genome sequence and methylome analysis of the wild guinea pig. We generated a full Wild guinea pig (Cavia aperea) genome sequence with enhanced coverage of methylated regions, benefiting from the available sequence of the domesticated relative Cavia porcellus. This new genome sequence was then used as reference to map the sequence reads of bisulfite treated Wild guinea pig sequencing libraries to investigate DNA-methylation patterns at nucleotide-specific level, by using our here described method, named 'DNA-enrichment-bisulfite-sequencing' (MEBS). The results achieved using MEBS matched those of standard methods in other mammalian model species. The technique is cost efficient, and incorporates both methylation enrichment results and a nucleotide-specific resolution even without a whole genome sequence available. Thus MEBS can be easily applied to extend methylation enrichment studies to a nucleotide-specific level. The approach is suited to study methylomes of not yet sequenced mammals at single nucleotide resolution. The strategy is transferable to other mammalian species by applying the nuclear genome sequence of a close relative. It is therefore of interest for studies on a variety of wild species trying to answer evolutionary, adaptational, ecological or medical questions by epigenetic mechanisms.

  14. Computational analysis of high resolution unsteady airloads for rotor aeroacoustics

    NASA Technical Reports Server (NTRS)

    Quackenbush, Todd R.; Lam, C.-M. Gordon; Wachspress, Daniel A.; Bliss, Donald B.

    1994-01-01

    The study of helicopter aerodynamic loading for acoustics applications requires the application of efficient yet accurate simulations of the velocity field induced by the rotor's vortex wake. This report summarizes work to date on the development of such an analysis, which builds on the Constant Vorticity Contour (CVC) free wake model, previously implemented for the study of vibratory loading in the RotorCRAFT computer code. The present effort has focused on implementation of an airload reconstruction approach that computes high resolution airload solutions of rotor/rotor-wake interactions required for acoustics computations. Supplementary efforts on the development of improved vortex core modeling, unsteady aerodynamic effects, higher spatial resolution of rotor loading, and fast vortex wake implementations have substantially enhanced the capabilities of the resulting software, denoted RotorCRAFT/AA (AeroAcoustics). Results of validation calculations using recently acquired model rotor data show that by employing airload reconstruction it is possible to apply the CVC wake analysis with temporal and spatial resolution suitable for acoustics applications while reducing the computation time required by one to two orders of magnitude relative to that required by direct calculations. Promising correlation with this body of airload and noise data has been obtained for a variety of rotor configurations and operating conditions.

  15. Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data.

    PubMed

    Duforet-Frebourg, Nicolas; Luu, Keurcien; Laval, Guillaume; Bazin, Eric; Blum, Michael G B

    2016-04-01

    To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1), comprising 850 individuals from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million, obtained with a low-coverage sequencing depth (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  16. Universal multifractal analysis of high-resolution snowfall data

    NASA Astrophysics Data System (ADS)

    Raupach, Timothy; Gires, Auguste; Tchiguirinskaia, Ioulia; Schertzer, Daniel; Berne, Alexis

    2016-04-01

    Universal multifractal analysis offers useful insights into the scaling properties of precipitation data. While much work has been done on the scaling properties of rainfall fields, less is known about the scaling properties of solid precipitation such as snowfall, especially at high resolution. We present results of a universal multifractal (UM) analysis of high-resolution solid precipitation data. The data were recorded using a 2D-video-disdrometer (2DVD) situated in the Swiss Alps. Analysis was performed on a one-hour period of snowfall, during which time the mean wind speed was zero, temperatures were low, and no hail was detected. The 2DVD recorded information on individual particles, from which we calculated snow mass. Three "cuts" of the spatio-temporal snowfall process were analysed using the UM framework. First, high-resolution timeseries of precipitation intensity at 100 ms temporal resolution were analysed. These results show two scaling regimes with a transition area between them. Second, we analysed reconstructed vertical columns of particle concentration and snow mass, assuming no horizontal wind and constant vertical velocity (equal to the one recorded on the ground). Strong scaling was observed in the particle concentration fields, with the influence of large (and therefore rare) snowflakes degrading the quality of the scaling observed for higher moments of the particle distribution. There was a clear difference between the measured fields and fields in which the vertical distribution of particles was made homogeneous, indicating that the measured snowfall fields contained non-homogeneous fields. Scaling behaviour was observed down to vertical scales of about 0.5 m, which is similar to published results using rain data. Finally, we used the UM framework to investigate the scaling properties of 2D maps of snow accumulation over a subset of the instrument collection area of 5.12 x 5.12 cm^2. As expected from the vertical column analysis, given that

  17. Integrated translational genomics for analysis of complex traits in sorghum

    USDA-ARS's Scientific Manuscript database

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  18. GENOMIC ANALYSIS OF THE TESTICULAR TOXICITY OF HALOACETIC ACIDS

    EPA Science Inventory

    Genomic analysis of the testicular toxicity of haloacetic acids

    David J. Dix and John C. Rockett
    Reproductive Toxicology Division, National Health and Environmental Effects Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, R...

  19. Thyroid insufficiency in developing rat brain: A genomic analysis.

    EPA Science Inventory

    Thyroid Insufficiency in the Developing Rat Brain: A Genomic Analysis. JE Royland and ME Gilbert, Neurotox. Div., U.S. EPA, RTP, NC, USA. Endocrine disruption (ED) is an area of major concern in environmental neurotoxicity. Severe deficits in thyroid hormone (TH) levels have bee...

  2. High-resolution single-nucleotide polymorphism array-profiling in myeloproliferative neoplasms identifies novel genomic aberrations

    PubMed Central

    Stegelmann, Frank; Bullinger, Lars; Griesshammer, Martin; Holzmann, Karlheinz; Habdank, Marianne; Kuhn, Susanne; Maile, Carmen; Schauer, Stefanie; Döhner, Hartmut; Döhner, Konstanze

    2010-01-01

    Single-nucleotide polymorphism arrays allow for genome-wide profiling of copy-number alterations and copy-neutral runs of homozygosity at high resolution. To identify novel genetic lesions in myeloproliferative neoplasms, a large series of 151 clinically well characterized patients was analyzed in our study. Copy-number alterations were rare in essential thrombocythemia and polycythemia vera. In contrast, approximately one third of myelofibrosis patients exhibited small genomic losses (less than 5 Mb). In 2 secondary myelofibrosis cases the tumor suppressor gene NF1 in 17q11.2 was affected. Sequencing analyses revealed a mutation in the remaining NF1 allele of one patient. In terms of copy-neutral aberrations, no chromosomes other than 9p were recurrently affected. In conclusion, novel genomic aberrations were identified in our study, in particular in patients with myelofibrosis. Further analyses on single-gene level are necessary to uncover the mechanisms that are involved in the pathogenesis of myeloproliferative neoplasms. PMID:20015882

  3. High-resolution profiling of gammaH2AX around DNA double strand breaks in the mammalian genome.

    PubMed

    Iacovoni, Jason S; Caron, Pierre; Lassadi, Imen; Nicolas, Estelle; Massip, Laurent; Trouche, Didier; Legube, Gaëlle

    2010-04-21

    Chromatin acts as a key regulator of DNA-related processes such as DNA damage repair. Although ChIP-chip is a powerful technique to provide high-resolution maps of protein-genome interactions, its use to study DNA double strand break (DSB) repair has been hindered by the limitations of the available damage induction methods. We have developed a human cell line that permits induction of multiple DSBs randomly distributed and unambiguously positioned within the genome. Using this system, we have generated the first genome-wide mapping of gammaH2AX around DSBs. We found that all DSBs trigger large gammaH2AX domains, which spread out from the DSB in a bidirectional, discontinuous and not necessarily symmetrical manner. The distribution of gammaH2AX within domains is influenced by gene transcription, as parallel mappings of RNA Polymerase II and strand-specific expression showed that gammaH2AX does not propagate on active genes. In addition, we showed that transcription is accurately maintained within gammaH2AX domains, indicating that mechanisms may exist to protect gene transcription from gammaH2AX spreading and from the chromatin rearrangements induced by DSBs.

  4. Genome-wide profiling of RNA polymerase transcription at nucleotide resolution in human cells with native elongating transcript sequencing

    PubMed Central

    Mayer, Andreas; Churchman, L. Stirling

    2017-01-01

    Many features of gene transcription in human cells remain unclear, mainly due to a lack of quantitative approaches to follow genome transcription with nucleotide precision in vivo. Here we present a robust genome-wide approach to study RNA polymerase (Pol) II-mediated transcription in human cells at single-nucleotide resolution by native elongating transcript sequencing (NET-seq). Elongating RNA polymerase and the associated nascent RNA is prepared by cell fractionation, avoiding immunoprecipitation or RNA labeling. The 3′-ends of nascent RNAs are captured through barcode linker ligation and converted into a DNA sequencing library. The identity and abundance of the 3′-ends are determined by high-throughput sequencing, revealing the exact genomic locations of Pol II. Human NET-seq can be applied to study the full spectrum of Pol II transcriptional activities, including the production of unstable RNAs and transcriptional pausing. Using the protocol described here, a NET-seq library can be obtained from human cells in 5 days. PMID:27010758

  5. Genome-Wide High-Resolution Mapping of UV-Induced Mitotic Recombination Events in Saccharomyces cerevisiae

    PubMed Central

    Yin, Yi; Petes, Thomas D.

    2013-01-01

    In the yeast Saccharomyces cerevisiae and most other eukaryotes, mitotic recombination is important for the repair of double-stranded DNA breaks (DSBs). Mitotic recombination between homologous chromosomes can result in loss of heterozygosity (LOH). In this study, LOH events induced by ultraviolet (UV) light are mapped throughout the genome to a resolution of about 1 kb using single-nucleotide polymorphism (SNP) microarrays. UV doses that have little effect on the viability of diploid cells stimulate crossovers more than 1000-fold in wild-type cells. In addition, UV stimulates recombination in G1-synchronized cells about 10-fold more efficiently than in G2-synchronized cells. Importantly, at high doses of UV, most conversion events reflect the repair of two sister chromatids that are broken at approximately the same position whereas at low doses, most conversion events reflect the repair of a single broken chromatid. Genome-wide mapping of about 380 unselected crossovers, break-induced replication (BIR) events, and gene conversions shows that UV-induced recombination events occur throughout the genome without pronounced hotspots, although the ribosomal RNA gene cluster has a significantly lower frequency of crossovers. PMID:24204306

  6. Performance analysis of multiple PRF technique for ambiguity resolution

    NASA Technical Reports Server (NTRS)

    Chang, C. Y.; Curlander, J. C.

    1992-01-01

    For short wavelength spaceborne synthetic aperture radar (SAR), ambiguity in Doppler centroid estimation occurs when the azimuth squint angle uncertainty is larger than the azimuth antenna beamwidth. Multiple pulse recurrence frequency (PRF) hopping is a technique developed to resolve the ambiguity by operating the radar in different PRF's in the pre-imaging sequence. Performance analysis results of the multiple PRF technique are presented, given the constraints of the attitude bound, the drift rate uncertainty, and the arbitrary numerical values of PRF's. The algorithm performance is derived in terms of the probability of correct ambiguity resolution. Examples, using the Shuttle Imaging Radar-C (SIR-C) and X-SAR parameters, demonstrate that the probability of correct ambiguity resolution obtained by the multiple PRF technique is greater than 95 percent and 80 percent for the SIR-C and X-SAR applications, respectively. The success rate is significantly higher than that achieved by the range cross correlation technique.
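
    The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind resolving a Doppler centroid ambiguity with several PRFs: each PRF yields the centroid wrapped into its own baseband, and the true value is the candidate that agrees with all wrapped measurements at once. The function names, the three PRF values, the 4321 Hz centroid, and the search range are hypothetical; they are not SIR-C or X-SAR parameters, and the noise-free brute-force search stands in for whatever estimator the report analyses.

```python
def wrap(freq_hz, prf_hz):
    """Wrap a frequency into the baseband interval [-prf/2, prf/2)."""
    return (freq_hz + prf_hz / 2.0) % prf_hz - prf_hz / 2.0


def resolve_centroid(measured, prfs, max_ambiguity=10):
    """Return the centroid most consistent with every wrapped measurement.

    measured[i] is the baseband centroid estimated at prfs[i]. Candidates are
    built from the first PRF and scored by their wrapped residuals against
    the remaining PRFs; the candidate with the smallest total residual wins.
    """
    best, best_cost = None, float("inf")
    for n in range(-max_ambiguity, max_ambiguity + 1):
        candidate = measured[0] + n * prfs[0]
        cost = sum(abs(wrap(candidate - m, prf))
                   for m, prf in zip(measured[1:], prfs[1:]))
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


# Hypothetical example: the true centroid lies well outside any single baseband.
example_prfs = [1240.0, 1395.0, 1512.0]
true_centroid = 4321.0
example_measured = [wrap(true_centroid, prf) for prf in example_prfs]
print(resolve_centroid(example_measured, example_prfs))
```

    With noisy measurements the residuals no longer vanish for the correct candidate, which is why the report expresses performance as a probability of correct ambiguity resolution rather than as a guarantee.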

  8. Sequencing and Analysis of Neanderthal Genomic DNA

    SciTech Connect

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  9. A categorical analysis of coreference resolution errors in biomedical texts.

    PubMed

    Choi, Miji; Zobel, Justin; Verspoor, Karin

    2016-04-01

    Coreference resolution is an essential task in information extraction from the published biomedical literature. It supports the discovery of complex information by linking referring expressions such as pronouns and appositives to their referents, which are typically entities that play a central role in biomedical events. Correctly establishing these links allows detailed understanding of all the participants in events, and connecting events together through their shared participants. As an initial step towards the development of a novel coreference resolution system for the biomedical domain, we have categorised the characteristics of coreference relations by type of anaphor as well as broader syntactic and semantic characteristics, and have compared the performance of a domain adaptation of a state-of-the-art general system to published results from domain-specific systems in terms of this categorisation. We also develop a rule-based system for anaphoric coreference resolution in the biomedical domain with simple modules derived from available systems. Our results show that the domain-specific systems outperform the general system overall. Whilst this result is unsurprising, our proposed categorisation enables a detailed quantitative analysis of the system performance. We identify limitations of each system and find that there remain important gaps in the state-of-the-art systems, which are clearly identifiable with respect to the categorisation. We have analysed in detail the performance of existing coreference resolution systems for the biomedical literature and have demonstrated that there are clear gaps in their coverage. The approach developed in the general domain needs to be tailored for portability to the biomedical domain. The specific framework for class-based error analysis of existing systems that we propose has benefits for identifying specific limitations of those systems. This in turn provides insights for further system development. Copyright © 2016

  10. Methy-Pipe: an integrated bioinformatics pipeline for whole genome bisulfite sequencing data analysis.

    PubMed

    Jiang, Peiyong; Sun, Kun; Lun, Fiona M F; Guo, Andy M; Wang, Huating; Chan, K C Allen; Chiu, Rossa W K; Lo, Y M Dennis; Sun, Hao

    2014-01-01

    DNA methylation, one of the most important epigenetic modifications, plays a crucial role in various biological processes. The level of DNA methylation can be measured using whole-genome bisulfite sequencing at single base resolution. However, until now, there has been a paucity of publicly available software for carrying out integrated methylation data analysis. In this study, we implemented Methy-Pipe, which not only fulfills the core data analysis requirements (e.g. sequence alignment, differential methylation analysis, etc.) but also provides useful tools for methylation data annotation and visualization. Specifically, it uses the Burrows-Wheeler Transform (BWT) algorithm to directly align bisulfite sequencing reads to a reference genome and implements a novel sliding window based approach with statistical methods for the identification of differentially methylated regions (DMRs). The capability of processing data in parallel allows it to outperform a number of other bisulfite alignment software packages. To demonstrate its utility and performance, we applied it to both real and simulated bisulfite sequencing datasets. The results indicate that Methy-Pipe can accurately estimate methylation densities, identify DMRs and provide a variety of utility programs for downstream methylation data analysis. In summary, Methy-Pipe is a useful pipeline that can process whole genome bisulfite sequencing data in an efficient, accurate, and user-friendly manner. Software and test dataset are available at http://sunlab.lihs.cuhk.edu.hk/methy-pipe/.
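
    A minimal sketch of a sliding-window methylation density may make the idea concrete. It is not Methy-Pipe code and does not reproduce its statistical test for DMRs. It assumes the usual bisulfite convention that, at a CpG site, reads still showing C are methylated and reads showing T are unmethylated, so the density is C / (C + T). The function name, the input layout, and the numbers are hypothetical.

```python
def window_methylation(sites, window_bp, step_bp):
    """Aggregate per-CpG read counts into sliding-window methylation densities.

    sites: list of (position, methylated_reads, unmethylated_reads),
           sorted by position.
    Returns a list of (window_start, window_end, density), where density is
    C / (C + T) over all CpGs in [window_start, window_end), or None when the
    window contains no reads.
    """
    if not sites:
        return []
    results = []
    start = sites[0][0]
    last = sites[-1][0]
    while start <= last:
        end = start + window_bp
        meth = sum(c for pos, c, t in sites if start <= pos < end)
        unmeth = sum(t for pos, c, t in sites if start <= pos < end)
        total = meth + unmeth
        results.append((start, end, meth / total if total else None))
        start += step_bp
    return results


if __name__ == "__main__":
    # Hypothetical CpG sites: (position, C reads, T reads)
    cpgs = [(100, 8, 2), (150, 6, 4), (320, 1, 9)]
    for row in window_methylation(cpgs, window_bp=200, step_bp=200):
        print(row)
```

    Comparing such window densities between two samples is the natural starting point for calling differentially methylated regions; the choice of test is left open here.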

  11. BEDTools: the Swiss-army tool for genome feature analysis

    PubMed Central

    Quinlan, Aaron R.

    2014-01-01

    Technological advances have enabled the use of DNA sequencing as a flexible tool to characterize genetic variation and to measure the activity of diverse cellular phenomena such as gene isoform expression and transcription factor binding. Extracting biological insight from the experiments enabled by these advances demands the analysis of large, multi-dimensional datasets. This unit describes the use of the BEDTools toolkit for the exploration of high-throughput genomics datasets. I present several protocols for common genomic analyses and demonstrate how simple BEDTools operations may be combined to create bespoke pipelines addressing complex questions. PMID:25199790

  12. The Cancer Genome Atlas Pan-Cancer analysis project.

    PubMed

    Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M

    2013-10-01

    The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.

  13. Genome-Wide Single-Cell Analysis of Recombination Activity and de novo Mutation Rates in Human Sperm

    PubMed Central

    Wang, Jianbin; Fan, H. Christina; Behr, Barry; Quake, Stephen R.

    2012-01-01

    Meiotic recombination and de novo mutation are the two main contributions towards gamete genome diversity, and many questions remain about how an individual human’s genome is edited by these two processes. Here, we describe a high-throughput method for single-cell whole-genome analysis which was used to measure the genomic diversity in one individual’s gamete genomes. A microfluidic system was used for highly parallel sample processing and to minimize non-specific amplification. High-density genotyping results from 91 single cells were used to create a personal recombination map, which was consistent with population-wide data at low resolution but revealed significant differences from pedigree data at higher resolution. We used the data to test for meiotic drive and found evidence for gene conversion. High throughput sequencing on 31 single cells was used to measure the frequency of large-scale genome instability, and deeper sequencing of eight single cells revealed de novo mutation rates with distinct characteristics. PMID:22817899

  14. Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate bayesian computation.

    PubMed

    Arenas, Miguel

    2015-04-01

    NGS technologies present a fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not so straightforward due to complex evolutionary processes acting on this material such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Recently, models of genome evolution that accommodate such complex genomic events are emerging. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
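
    Approximate Bayesian computation in its simplest (rejection) form can be sketched in a few lines. The toy model below is an assumption made purely for illustration and is not taken from the letter: each of n_sites independently undergoes a rearrangement with an unknown probability, the prior on that probability is uniform on (0, 1), and the summary statistic is the event count.

```python
import random


def simulate_events(rate, n_sites, rng):
    """Toy simulator: number of sites that undergo an event at the given rate."""
    return sum(rng.random() < rate for _ in range(n_sites))


def abc_rejection(observed, n_sites, n_draws, tolerance, seed=0):
    """Rejection ABC: keep prior draws whose simulated count is near the data.

    Returns the list of accepted rates, an approximate posterior sample.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.random()  # draw from the uniform(0, 1) prior
        simulated = simulate_events(rate, n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


if __name__ == "__main__":
    # Hypothetical data: 30 events observed across 100 sites.
    sample = abc_rejection(observed=30, n_sites=100, n_draws=5000, tolerance=2)
    print(len(sample), sum(sample) / len(sample))
```

    A realistic application would replace the toy simulator with a genome-evolution simulator and the event count with richer summary statistics; a tighter tolerance sharpens the approximation at the cost of fewer accepted draws.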

  15. Conflict Resolution in the Genome: How Transcription and Replication Make It Work.

    PubMed

    Hamperl, Stephan; Cimprich, Karlene A

    2016-12-01

    The complex machineries involved in replication and transcription translocate along the same DNA template, often in opposing directions and at different rates. These processes routinely interfere with each other in prokaryotes, and mounting evidence now suggests that RNA polymerase complexes also encounter replication forks in higher eukaryotes. Indeed, cells rely on numerous mechanisms to avoid, tolerate, and resolve such transcription-replication conflicts, and the absence of these mechanisms can lead to catastrophic effects on genome stability and cell viability. In this article, we review the cellular responses to transcription-replication conflicts and highlight how these inevitable encounters shape the genome and impact diverse cellular processes. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. High-resolution interrogation of functional elements in the noncoding genome.

    PubMed

    Sanjana, Neville E; Wright, Jason; Zheng, Kaijie; Shalem, Ophir; Fontanillas, Pierre; Joung, Julia; Cheng, Christine; Regev, Aviv; Zhang, Feng

    2016-09-30

    The noncoding genome affects gene regulation and disease, yet we lack tools for rapid identification and manipulation of noncoding elements. We developed a CRISPR screen using ~18,000 single guide RNAs targeting >700 kilobases surrounding the genes NF1, NF2, and CUL3, which are involved in BRAF inhibitor resistance in melanoma. We find that noncoding locations that modulate drug resistance also harbor predictive hallmarks of noncoding function. With a subset of regions at the CUL3 locus, we demonstrate that engineered mutations alter transcription factor occupancy and long-range and local epigenetic environments, implicating these sites in gene regulation and chemotherapeutic resistance. Through our expansion of the potential of pooled CRISPR screens, we provide tools for genomic discovery and for elucidating biologically relevant mechanisms of gene regulation. Copyright © 2016, American Association for the Advancement of Science.

  17. High-Resolution Analysis of Cytosine Methylation in Ancient DNA

    PubMed Central

    Cropley, Jennifer E.; Cooper, Alan; Suter, Catherine M.

    2012-01-01

    Epigenetic changes to gene expression can result in heritable phenotypic characteristics that are not encoded in the DNA itself, but rather by biochemical modifications to the DNA or associated chromatin proteins. Interposed between genes and environment, these epigenetic modifications can be influenced by environmental factors to affect phenotype for multiple generations. This raises the possibility that epigenetic states provide a substrate for natural selection, with the potential to participate in the rapid adaptation of species to changes in environment. Any direct test of this hypothesis would require the ability to measure epigenetic states over evolutionary timescales. Here we describe the first single-base resolution of cytosine methylation patterns in an ancient mammalian genome, by bisulphite allelic sequencing of loci from late Pleistocene Bison priscus remains. Retrotransposons and the differentially methylated regions of imprinted loci displayed methylation patterns identical to those derived from fresh bovine tissue, indicating that methylation patterns are preserved in the ancient DNA. Our findings establish the biochemical stability of methylated cytosines over extensive time frames, and provide the first direct evidence that cytosine methylation patterns are retained in DNA from ancient specimens. The ability to resolve cytosine methylation in ancient DNA provides a powerful means to study the role of epigenetics in evolution. PMID:22276161

  18. Genome analysis of the platypus reveals unique signatures of evolution

    PubMed Central

    Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

    2009-01-01

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734

  19. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    SciTech Connect

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David

    2011-03-14

    Fungi of the phylum Basidiomycota make up some 37 percent of the described fungi, and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a 'core proteome' based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  20. Genome analysis of the platypus reveals unique signatures of evolution.

    PubMed

    Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K

    2008-05-08

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.

  1. Analysis of Complete Genome Sequences of Human Rhinovirus

    PubMed Central

    Palmenberg, Ann C.; Rathe, Jennifer A.; Liggett, Stephen B.

    2010-01-01

    Human Rhinovirus (HRV) infection is the cause of about one-half of asthma and COPD exacerbations. With >100 serotypes in the HRV reference set, an effort was undertaken to sequence their complete genomes so as to understand diversity, structural variation, and evolution of the virus. Analysis revealed conserved motifs, hypervariable regions, a potential fourth HRV species, within-serotype variation in field isolates, a non-scanning internal ribosome entry site, and evidence for HRV recombination. Techniques have now been developed using next generation sequencing to generate complete genomes from patient isolates with high throughput, deep coverage, and low costs. Thus relationships can now be sought between obstructive lung phenotypes and variation in HRV genomes in infected patients, and potential novel therapeutic strategies can be developed based on HRV sequence. PMID:20471068

  2. Comparative Analysis of Genome Diversity in Bullmastiff Dogs

    PubMed Central

    Mortlock, Sally-Anne; Khatkar, Mehar S.; Williamson, Peter

    2016-01-01

    Management and preservation of genomic diversity in dog breeds is a major objective for maintaining health. The present study was undertaken to characterise genomic diversity in Bullmastiff dogs using both genealogical and molecular analysis. Genealogical analysis of diversity was conducted using a database consisting of 16,378 Bullmastiff pedigrees from year 1980 to 2013. Additionally, a total of 188 Bullmastiff dogs were genotyped using the 170,000 SNP Illumina CanineHD Beadchip. Genealogical parameters revealed a mean inbreeding coefficient of 0.047; 142 total founders (f); an effective number of founders (fe) of 79; an effective number of ancestors (fa) of 62; and an effective population size of the reference population of 41. Genetic diversity and the degree of genome-wide homogeneity within the breed were also investigated using molecular data. Multiple-locus heterozygosity (MLH) was equal to 0.206; runs of homozygosity (ROH) as proportion of the genome, averaged 16.44%; effective population size was 29.1, with an average inbreeding coefficient of 0.035, all estimated using SNP Data. Fine-scale population structure was analysed using NETVIEW, a population analysis pipeline. Visualisation of the high definition network captured relationships among individuals within and between subpopulations. Effects of unequal founder use, and ancestral inbreeding and selection, were evident. While current levels of Bullmastiff heterozygosity, inbreeding and homozygosity are not unusual, a relatively small effective population size indicates that a breeding strategy to reduce the inbreeding rate may be beneficial. PMID:26824579
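
    Two of the quantities reported above follow from simple formulas, sketched here under stated assumptions. The effective number of founders is taken in its standard form, fe = 1 / sum(p_i^2), where p_i is the proportional genetic contribution of founder i; the ROH fraction is the summed length of runs of homozygosity divided by the genome length. The function names and example inputs are hypothetical, and this is not the software used in the study.

```python
def effective_founders(contributions):
    """Effective number of founders, fe = 1 / sum(p_i ** 2).

    contributions: non-negative genetic contributions, one per founder.
    They are normalised to proportions before squaring.
    """
    total = float(sum(contributions))
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    return 1.0 / sum((c / total) ** 2 for c in contributions)


def roh_fraction(roh_lengths_bp, genome_length_bp):
    """Proportion of the genome covered by runs of homozygosity."""
    return sum(roh_lengths_bp) / genome_length_bp


if __name__ == "__main__":
    print(effective_founders([1, 1, 1, 1]))       # equal use of four founders
    print(effective_founders([2, 1, 1]))          # unequal use of three founders
    print(roh_fraction([10, 20, 30], 600))        # hypothetical ROH lengths
```

    fe equals the actual number of founders only when all founders contribute equally, so the reported gap between 142 founders and fe = 79 is consistent with the unequal founder use noted in the abstract.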

  3. Comparative Analysis of Genome Diversity in Bullmastiff Dogs.

    PubMed

    Mortlock, Sally-Anne; Khatkar, Mehar S; Williamson, Peter

    2016-01-01

    Management and preservation of genomic diversity in dog breeds is a major objective for maintaining health. The present study was undertaken to characterise genomic diversity in Bullmastiff dogs using both genealogical and molecular analysis. Genealogical analysis of diversity was conducted using a database consisting of 16,378 Bullmastiff pedigrees from year 1980 to 2013. Additionally, a total of 188 Bullmastiff dogs were genotyped using the 170,000 SNP Illumina CanineHD Beadchip. Genealogical parameters revealed a mean inbreeding coefficient of 0.047; 142 total founders (f); an effective number of founders (fe) of 79; an effective number of ancestors (fa) of 62; and an effective population size of the reference population of 41. Genetic diversity and the degree of genome-wide homogeneity within the breed were also investigated using molecular data. Multiple-locus heterozygosity (MLH) was equal to 0.206; runs of homozygosity (ROH) as proportion of the genome, averaged 16.44%; effective population size was 29.1, with an average inbreeding coefficient of 0.035, all estimated using SNP Data. Fine-scale population structure was analysed using NETVIEW, a population analysis pipeline. Visualisation of the high definition network captured relationships among individuals within and between subpopulations. Effects of unequal founder use, and ancestral inbreeding and selection, were evident. While current levels of Bullmastiff heterozygosity, inbreeding and homozygosity are not unusual, a relatively small effective population size indicates that a breeding strategy to reduce the inbreeding rate may be beneficial.

  4. Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution.

    PubMed

    Arnold, Cosmas D; Zabidi, Muhammad A; Pagani, Michaela; Rath, Martina; Schernhuber, Katharina; Kazmar, Tomáš; Stark, Alexander

    2017-02-01

    Gene expression is controlled by enhancers that activate transcription from the core promoters of their target genes. Although a key function of core promoters is to convert enhancer activities into gene transcription, whether and how strongly they activate transcription in response to enhancers has not been systematically assessed on a genome-wide level. Here we describe self-transcribing active core promoter sequencing (STAP-seq), a method to determine the responsiveness of genomic sequences to enhancers, and apply it to the Drosophila melanogaster genome. We cloned candidate fragments at the position of the core promoter (also called minimal promoter) in reporter plasmids with or without a strong enhancer, transfected the resulting library into cells, and quantified the transcripts that initiated from each candidate for each setup by deep sequencing. In the presence of a single strong enhancer, the enhancer responsiveness of different sequences differs by several orders of magnitude, and different levels of responsiveness are associated with genes of different functions. We also identify sequence features that predict enhancer responsiveness and discuss how different core promoters are employed for the regulation of gene expression.

  5. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.
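
    For readers unfamiliar with the assembly statistics quoted above, the following sketch shows the common definition of N50 (the length L such that sequences of length L or longer contain at least half of the assembled bases) and the arithmetic behind the 91% coverage figure. The function names are hypothetical; only the totals are taken from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    half = sum(lengths) / 2.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


def coverage_percent(assembled_bp, estimated_genome_bp):
    """Assembled length as a percentage of the estimated genome size."""
    return 100.0 * assembled_bp / estimated_genome_bp


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))                              # toy lengths
    print(round(coverage_percent(568_887_315, 622_000_000)))  # figures from the abstract
```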

  6. Genome-Assisted Analysis of Dissimilatory Metal-Reducing Bacteria

    SciTech Connect

    Fredrickson, Jim K.; Romine, Margaret F.

    2005-06-01

    Whole genome sequence for Shewanella oneidensis and Geobacter sulfurreducens has provided numerous new biological insights into the function of these model dissimilatory metal-reducing bacteria. Many of the discoveries, including the identification of a high number of c-type cytochromes in both organisms, have been the result of comparative genomic analyses including several that were experimentally confirmed. Genome sequence has also aided the identification of genes important for the reduction of metal ions and other electron acceptors utilized by these organisms during anaerobic growth by facilitating the identification of genes disrupted by random insertions. Technologies for assaying global expression patterns for genes (mRNA) and proteins have also been enabled by the availability of genome sequence but their application has been limited mainly to the analysis of the role of global regulatory genes and to identifying genes expressed or repressed in response to specific electron acceptors. It is anticipated that details regarding the mechanisms of metal ion respiration, and metabolism in general, will eventually be revealed by comprehensive, systems-level analyses enabled by functional genomic analyses.

  7. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172

  8. A comprehensive analysis of bilaterian mitochondrial genomes and phylogeny.

    PubMed

    Bernt, Matthias; Bleidorn, Christoph; Braband, Anke; Dambach, Johannes; Donath, Alexander; Fritzsch, Guido; Golombek, Anja; Hadrys, Heike; Jühling, Frank; Meusemann, Karen; Middendorf, Martin; Misof, Bernhard; Perseke, Marleen; Podsiadlowski, Lars; von Reumont, Björn; Schierwater, Bernd; Schlegel, Martin; Schrödl, Michael; Simon, Sabrina; Stadler, Peter F; Stöger, Isabella; Struck, Torsten H

    2013-11-01

    About 2800 mitochondrial genomes of Metazoa are present in NCBI RefSeq today, two thirds belonging to vertebrates. Metazoan phylogeny was recently challenged by large scale EST approaches (phylogenomics), stabilizing classical nodes while simultaneously supporting new sister group hypotheses. The use of mitochondrial data in deep phylogeny analyses was often criticized because of high nucleotide substitution rates, large differences in amino acid substitution rate between taxa, and biases in nucleotide frequencies. Nevertheless, mitochondrial genome data might still be promising as it allows for a larger taxon sampling, while presenting a smaller amount of sequence information. We present the most comprehensive analysis of bilaterian relationships based on mitochondrial genome data. The analyzed data set comprises more than 650 mitochondrial genomes that have been chosen to represent a profound sample of the phylogenetic as well as sequence diversity. The results are based on high quality amino acid alignments obtained from a complete reannotation of the mitogenomic sequences from the NCBI RefSeq database. However, the results failed to give support for many otherwise undisputed high-ranking taxa, like Mollusca, Hexapoda, Arthropoda, and suffer from extremely long branches of Nematoda, Platyhelminthes, and some other taxa. In order to identify the sources of misleading phylogenetic signals, we discuss several problems associated with mitochondrial genome data sets, e.g. the nucleotide and amino acid landscapes and a strong correlation of gene rearrangements with long branches.

  9. Viral genome analysis and knowledge management.

    PubMed

    Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette

    2013-01-01

    One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and the databases that followed in its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences that are stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that results in ever better new alignments. Annotation of epidemiological and clinical information is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based upon user requests. Vital for its success, the database staff are heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov, http://hcv.lanl.gov, and http://hfv.lanl.gov.

  10. Genome sequencing and analysis conference grant

    SciTech Connect

    Venter, J.C.

    1995-10-01

    The 14 plenary session presentations focused on nematode; yeast; fruit fly; plants; mycobacteria; and man. In addition there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  11. Genome-Wide Analysis in Brazilians Reveals Highly Differentiated Native American Genome Regions.

    PubMed

    Mychaleckyj, Josyf C; Havt, Alexandre; Nayak, Uma; Pinkerton, Relana; Farber, Emily; Concannon, Patrick; Lima, Aldo A; Guerrant, Richard L

    2017-03-01

    Despite its population, geographic size, and emerging economic importance, disproportionately little genome-scale research exists into genetic factors that predispose Brazilians to disease, or the population genetics of risk. After identification of suitable proxy populations and careful analysis of tri-continental admixture in 1,538 North-Eastern Brazilians to estimate individual ancestry and ancestral allele frequencies, we computed 400,000 genome-wide locus-specific branch length (LSBL) Fst statistics of Brazilian Amerindian ancestry compared to European and African; and a similar set of differentiation statistics for their Amerindian component compared with the closest Asian 1000 Genomes population (surprisingly, Bengalis in Bangladesh). After ranking SNPs by these statistics, we identified the top 10 highly differentiated SNPs in five genome regions in the LSBL tests of Brazilian Amerindian ancestry compared to European and African; and the top 10 SNPs in eight regions comparing their Amerindian component to the closest Asian 1000 Genomes population. We found SNPs within or proximal to the genes CIITA (rs6498115), SMC6 (rs1834619), and KLHL29 (rs2288697) were most differentiated in the Amerindian-specific branch, while SNPs in the genes ADAMTS9 (rs7631391), DOCK2 (rs77594147), SLC28A1 (rs28649017), ARHGAP5 (rs7151991), and CIITA (rs45601437) were most highly differentiated in the Asian comparison. These genes are known to influence immune function, metabolic and anthropometry traits, and embryonic development. These analyses have identified candidate genes for selection within Amerindian ancestry, and by comparison of the two analyses, those for which the differentiation may have arisen during the migration from Asia to the Americas.
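    The locus-specific branch length (LSBL) statistic mentioned in this abstract can be sketched as follows. The abstract does not give a formula, so this assumes the standard three-population definition, in which the branch leading to population A is (Fst_AB + Fst_AC - Fst_BC) / 2. The function names, the SNP identifiers, and the Fst values below are all hypothetical and are not data from the study.

```python
def lsbl(fst_ab, fst_ac, fst_bc):
    """Branch length for population A at one locus.

    Assumes the usual three-population definition:
    LSBL_A = (Fst_AB + Fst_AC - Fst_BC) / 2
    """
    return (fst_ab + fst_ac - fst_bc) / 2.0


def rank_by_lsbl(pairwise_fst):
    """Rank loci from most to least differentiated on the A branch.

    pairwise_fst maps a locus name to (Fst_AB, Fst_AC, Fst_BC).
    """
    scores = {name: lsbl(*values) for name, values in pairwise_fst.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-SNP pairwise Fst values (A = focal population).
toy_fst = {
    "snp_1": (0.30, 0.40, 0.10),
    "snp_2": (0.05, 0.06, 0.05),
    "snp_3": (0.50, 0.45, 0.05),
}

ranking = rank_by_lsbl(toy_fst)
print(ranking)  # ['snp_3', 'snp_1', 'snp_2']
```

    Ranking loci by this value and keeping the top entries mirrors the "top 10 highly differentiated SNPs" step described above, although the study's actual pipeline may differ in detail.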

  12. Genome-Wide Analysis in Brazilians Reveals Highly Differentiated Native American Genome Regions

    PubMed Central

    Havt, Alexandre; Nayak, Uma; Pinkerton, Relana; Farber, Emily; Concannon, Patrick; Lima, Aldo A.; Guerrant, Richard L.

    2017-01-01

    Despite its population, geographic size, and emerging economic importance, disproportionately little genome-scale research exists into genetic factors that predispose Brazilians to disease, or the population genetics of risk. After identification of suitable proxy populations and careful analysis of tri-continental admixture in 1,538 North-Eastern Brazilians to estimate individual ancestry and ancestral allele frequencies, we computed 400,000 genome-wide locus-specific branch length (LSBL) Fst statistics of Brazilian Amerindian ancestry compared to European and African; and a similar set of differentiation statistics for their Amerindian component compared with the closest Asian 1000 Genomes population (surprisingly, Bengalis in Bangladesh). After ranking SNPs by these statistics, we identified the top 10 highly differentiated SNPs in five genome regions in the LSBL tests of Brazilian Amerindian ancestry compared to European and African; and the top 10 SNPs in eight regions comparing their Amerindian component to the closest Asian 1000 Genomes population. We found SNPs within or proximal to the genes CIITA (rs6498115), SMC6 (rs1834619), and KLHL29 (rs2288697) were most differentiated in the Amerindian-specific branch, while SNPs in the genes ADAMTS9 (rs7631391), DOCK2 (rs77594147), SLC28A1 (rs28649017), ARHGAP5 (rs7151991), and CIITA (rs45601437) were most highly differentiated in the Asian comparison. These genes are known to influence immune function, metabolic and anthropometry traits, and embryonic development. These analyses have identified candidate genes for selection within Amerindian ancestry, and by comparison of the two analyses, those for which the differentiation may have arisen during the migration from Asia to the Americas. PMID:28100790

  13. A High-Resolution Whole-Genome Map of Key Chromatin Modifications in the Adult Drosophila melanogaster

    PubMed Central

    Yin, Hang; Sweeney, Sarah; Raha, Debasish; Snyder, Michael; Lin, Haifan

    2011-01-01

    Epigenetic research has been focused on cell-type-specific regulation; less is known about common features of epigenetic programming shared by diverse cell types within an organism. Here, we report a modified method for chromatin immunoprecipitation and deep sequencing (ChIP–Seq) and its use to construct a high-resolution map of the Drosophila melanogaster key histone marks, heterochromatin protein 1a (HP1a) and RNA polymerase II (polII). These factors are mapped at 50-bp resolution genome-wide and at 5-bp resolution for regulatory sequences of genes, which reveals fundamental features of chromatin modification landscape shared by major adult Drosophila cell types: the enrichment of both heterochromatic and euchromatic marks in transposons and repetitive sequences, the accumulation of HP1a at transcription start sites with stalled polII, the signatures of histone code and polII level/position around the transcriptional start sites that predict both the mRNA level and functionality of genes, and the enrichment of elongating polII within exons at splicing junctions. These features, likely conserved among diverse epigenomes, reveal general strategies for chromatin modifications. PMID:22194694

  14. A high-resolution whole-genome map of key chromatin modifications in the adult Drosophila melanogaster.

    PubMed

    Yin, Hang; Sweeney, Sarah; Raha, Debasish; Snyder, Michael; Lin, Haifan

    2011-12-01

    Epigenetic research has been focused on cell-type-specific regulation; less is known about common features of epigenetic programming shared by diverse cell types within an organism. Here, we report a modified method for chromatin immunoprecipitation and deep sequencing (ChIP-Seq) and its use to construct a high-resolution map of the Drosophila melanogaster key histone marks, heterochromatin protein 1a (HP1a) and RNA polymerase II (polII). These factors are mapped at 50-bp resolution genome-wide and at 5-bp resolution for regulatory sequences of genes, which reveals fundamental features of chromatin modification landscape shared by major adult Drosophila cell types: the enrichment of both heterochromatic and euchromatic marks in transposons and repetitive sequences, the accumulation of HP1a at transcription start sites with stalled polII, the signatures of histone code and polII level/position around the transcriptional start sites that predict both the mRNA level and functionality of genes, and the enrichment of elongating polII within exons at splicing junctions. These features, likely conserved among diverse epigenomes, reveal general strategies for chromatin modifications.

  15. Emerging pathogens of gilthead seabream: characterisation and genomic analysis of novel intracellular β-proteobacteria

    PubMed Central

    Seth-Smith, Helena M.B.; Dourala, Nancy; Fehr, Alexander; Qi, Weihong; Katharios, Pantelis; Ruetten, Maja; Mateos, José M.; Nufer, Lisbeth; Weilenmann, Roseline; Ziegler, Urs; Thomson, Nicholas R; Schlapbach, Ralph; Vaughan, Lloyd

    2015-01-01

    New and emerging environmental pathogens pose some of the greatest threats to modern aquaculture, a critical source of food protein globally. As with other intensive farming practices, increasing our understanding of the biology of infections is important to improve animal welfare and husbandry. The gill infection epitheliocystis is increasingly problematic in gilthead seabream (Sparus aurata), a major Mediterranean aquaculture species. Epitheliocystis is generally associated with chlamydial bacteria, yet we were not able to localise chlamydial targets within the major gilthead seabream lesions. Two previously unidentified species within a novel β-proteobacterial genus were instead identified. These co-infecting intracellular bacteria have been characterised using high resolution imaging and genomics, presenting the most comprehensive study on epitheliocystis agents to date. The genomes of the two uncultured species, Ca. Ichthyocystis hellenicum and Ca. Ichthyocystis sparus, have been de novo sequenced and annotated from preserved material. Analysis of the genomes shows a compact core indicating a metabolic dependency on the host, and an accessory genome with an unprecedented number of tandemly arrayed gene families. This study represents a critical insight into novel, emerging fish pathogens and will be used to underpin future investigations into the bacterial origins, and to develop diagnostic and treatment strategies. PMID:26849311

  16. Ground Truth Analysis Supporting the High Resolution Flyover.

    DTIC Science & Technology

    1983-03-01

    Ground Truth Analysis Supporting the High Resolution Flyover (U). Naval Coastal Systems Center, Panama City, FL. D. F. Lott. Technical Memorandum NCSC TM 370-83, March 1983. Unclassified. ...provided ground truth measurements in support of the High Resolution Flyover (NAVAIR Task No. A370370G/076B/lF590550-000). Ground truthing was provided

  17. Comparative Analysis of Chloroplast Genomes: Functional Annotation, Genome-Based Phylogeny, and Deduced Evolutionary Patterns

    PubMed Central

    Rivas, Javier De Las; Lozano, Juan Jose; Ortiz, Angel R.

    2002-01-01

    All protein sequences from 19 complete chloroplast genomes (cpDNA) have been studied using a new computational method able to analyze functional correlations among series of protein sequences contained in complete proteomes. First, all open reading frames (ORFs) from the cpDNAs, comprising a total of 2266 protein sequences, were compared against the 3168 proteins from the Synechocystis PCC6803 complete genome to find functionally related orthologous proteins. Additionally, all cpDNA genomes were pairwise compared to find orthologous groups not present in cyanobacteria. Annotations in the cluster of orthologous proteins database and CyanoBase were used as reference for the functional assignments. Following this protocol, new functional assignments were made for ORFs of unknown function and for ycfs (hypothetical chloroplast frames), which still lack a functional assignment. Using this information, a matrix of functional relationships was derived from profiles of the presence and/or absence of orthologous proteins; the matrix included 1837 proteins in 277 orthologous clusters. A factor analysis study of this matrix, followed by cluster analysis, allowed us to obtain accurate phylogenetic reconstructions and the detection of genes probably involved in speciation as phylogenetic correlates. Finally, by grouping common evolutionary patterns, we show that it is possible to determine functionally linked protein networks. This has allowed us to suggest putative associations for some unknown ORFs. PMID:11932241
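    The presence/absence profile matrix described in this abstract can be illustrated with a minimal sketch. It only builds a binary genome-by-cluster matrix and compares profiles with a Hamming distance; it does not reproduce the factor analysis or cluster analysis that the authors applied. The genome names, cluster identifiers, and function names are hypothetical.

```python
def presence_absence_matrix(genome_clusters):
    """Build binary profiles from genome -> set of ortholog cluster ids.

    Returns the sorted cluster ids (column order) and a dict that maps
    each genome to a list of 0/1 values, one per cluster.
    """
    clusters = sorted(set().union(*genome_clusters.values()))
    matrix = {
        genome: [1 if cluster in members else 0 for cluster in clusters]
        for genome, members in genome_clusters.items()
    }
    return clusters, matrix


def hamming(profile_a, profile_b):
    """Count the clusters present in exactly one of the two genomes."""
    return sum(x != y for x, y in zip(profile_a, profile_b))


# Hypothetical toy input: three genomes and four ortholog clusters.
toy = {
    "genome_1": {"c1", "c2", "c3"},
    "genome_2": {"c1", "c3"},
    "genome_3": {"c2", "c4"},
}

clusters, matrix = presence_absence_matrix(toy)
print(clusters)              # ['c1', 'c2', 'c3', 'c4']
print(matrix["genome_1"])    # [1, 1, 1, 0]
print(hamming(matrix["genome_1"], matrix["genome_2"]))  # 1
```

    A matrix of this shape (genomes as rows, orthologous clusters as columns) is the kind of input a downstream factor or cluster analysis would take.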

  18. Improved protocol for rapid identification of certain spa types using high resolution melting curve analysis.

    PubMed

    Mayerhofer, Benjamin; Stöger, Anna; Pietzka, Ariane T; Fernandez, Haizpea Lasa; Prewein, Bernhard; Sorschag, Sieglinde; Kunert, Renate; Allerberger, Franz; Ruppitsch, Werner

    2015-01-01

    Methicillin-resistant Staphylococcus aureus is one of the most significant pathogens associated with health care. For efficient surveillance, control and outbreak investigation, S. aureus typing is essential. A high resolution melting curve analysis was developed and evaluated for rapid identification of the most frequent spa types found in an Austrian hospital consortium covering 2,435 beds. Among 557 methicillin-resistant Staphylococcus aureus isolates 38 different spa types were identified by sequence analysis of the hypervariable region X of the protein A gene (spa). Identification of spa types through their characteristic high resolution melting curve profiles was considerably improved by double spiking with genomic DNA from spa type t030 and spa type t003 and allowed unambiguous and fast identification of the ten most frequent spa types t001 (58%), t003 (12%), t190 (9%), t041 (5%), t022 (2%), t032 (2%), t008 (2%), t002 (1%), t5712 (1%) and t2203 (1%), representing 93% of all isolates within this hospital consortium. The performance of the assay was evaluated by testing samples with unknown spa types from the daily routine and by testing three different high resolution melting curve analysis real-time PCR instruments. The ten most frequent spa types were identified from all samples and on all instruments with 100% specificity and 100% sensitivity. Compared to classical spa typing by sequence analysis, this gene scanning assay is faster, cheaper and can be performed in a single closed tube assay format. Therefore it is an optimal screening tool to detect the most frequent endemic spa types and to exclude non-endemic spa types within a hospital.

  19. Improved Protocol for Rapid Identification of Certain Spa Types Using High Resolution Melting Curve Analysis

    PubMed Central

    Mayerhofer, Benjamin; Stöger, Anna; Pietzka, Ariane T.; Fernandez, Haizpea Lasa; Prewein, Bernhard; Sorschag, Sieglinde; Kunert, Renate; Allerberger, Franz; Ruppitsch, Werner

    2015-01-01

    Methicillin-resistant Staphylococcus aureus is one of the most significant pathogens associated with health care. For efficient surveillance, control and outbreak investigation, S. aureus typing is essential. A high resolution melting curve analysis was developed and evaluated for rapid identification of the most frequent spa types found in an Austrian hospital consortium covering 2,435 beds. Among 557 methicillin-resistant Staphylococcus aureus isolates 38 different spa types were identified by sequence analysis of the hypervariable region X of the protein A gene (spa). Identification of spa types through their characteristic high resolution melting curve profiles was considerably improved by double spiking with genomic DNA from spa type t030 and spa type t003 and allowed unambiguous and fast identification of the ten most frequent spa types t001 (58%), t003 (12%), t190 (9%), t041 (5%), t022 (2%), t032 (2%), t008 (2%), t002 (1%), t5712 (1%) and t2203 (1%), representing 93% of all isolates within this hospital consortium. The performance of the assay was evaluated by testing samples with unknown spa types from the daily routine and by testing three different high resolution melting curve analysis real-time PCR instruments. The ten most frequent spa types were identified from all samples and on all instruments with 100% specificity and 100% sensitivity. Compared to classical spa typing by sequence analysis, this gene scanning assay is faster, cheaper and can be performed in a single closed tube assay format. Therefore it is an optimal screening tool to detect the most frequent endemic spa types and to exclude non-endemic spa types within a hospital. PMID:25768007

  20. Genomic analysis and selected molecular pathways in rare cancers

    NASA Astrophysics Data System (ADS)

    Liu, Stephen V.; Lenkiewicz, Elizabeth; Evers, Lisa; Holley, Tara; Kiefer, Jeffrey; Ruiz, Christian; Glatz, Katharina; Bubendorf, Lukas; Demeure, Michael J.; Eng, Cathy; Ramanathan, Ramesh K.; Von Hoff, Daniel D.; Barrett, Michael T.

    2012-12-01

    It is widely accepted that many cancers arise as a result of an acquired genomic instability and the subsequent evolution of tumor cells with variable patterns of selected and background aberrations. The presence and behaviors of distinct neoplastic cell populations within a patient's tumor may underlie multiple clinical phenotypes in cancers. A goal of many current cancer genome studies is the identification of recurring selected driver events that can be advanced for the development of personalized therapies. Unfortunately, in the majority of rare tumors, this type of analysis can be particularly challenging. Large series of specimens for analysis are simply not available, allowing recurring patterns to remain hidden. In this paper, we highlight the use of DNA content-based flow sorting to identify and isolate DNA-diploid and DNA-aneuploid populations from tumor biopsies as a strategy to comprehensively study the genomic composition and behaviors of individual cancers in a series of rare solid tumors: intrahepatic cholangiocarcinoma, anal carcinoma, adrenal leiomyosarcoma, and pancreatic neuroendocrine tumors. We propose that the identification of highly selected genomic events in distinct tumor populations within each tumor can identify candidate driver events that can facilitate the development of novel, personalized treatment strategies for patients with cancer.

  1. Genome-Wide Detection and Analysis of Multifunctional Genes

    PubMed Central

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
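    The idea of extracting genes annotated with at least two distinct biological processes can be sketched as below. This is a simplified illustration, not the authors' method: it assumes that "distinct" means the two annotation terms are not listed as related in a user-supplied set of term pairs. The gene names, process labels, and function names are hypothetical.

```python
from itertools import combinations


def is_multifunctional(terms, related_pairs):
    """True if the gene has two annotations that are not related.

    terms is a set of annotation labels for one gene; related_pairs
    is a set of frozensets, each holding two labels that should be
    treated as the same broad process.
    """
    for a, b in combinations(sorted(terms), 2):
        if frozenset((a, b)) not in related_pairs:
            return True
    return False


def multifunctional_genes(annotations, related_pairs):
    """Return the sorted names of genes with two distinct annotations."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if is_multifunctional(terms, related_pairs)
    )


# Hypothetical toy annotations.
toy_annotations = {
    "geneA": {"dna repair", "lipid transport"},
    "geneB": {"dna repair", "dna replication"},
    "geneC": {"lipid transport"},
}
toy_related = {frozenset(("dna repair", "dna replication"))}

print(multifunctional_genes(toy_annotations, toy_related))  # ['geneA']
```

    In this toy case geneB is excluded because its two annotations are declared related, and geneC because it has a single annotation.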

  2. Natural selection on functional modules, a genome-wide analysis.

    PubMed

    Serra, François; Arbiza, Leonardo; Dopazo, Joaquín; Dopazo, Hernán

    2011-03-01

    Classically, the functional consequences of natural selection over genomes have been analyzed as the compound effects of individual genes. The current paradigm for large-scale analysis of adaptation is based on the observed significant deviations of rates of individual genes from neutral evolutionary expectation. This approach, which assumed independence among genes, has not been able to identify biological functions significantly enriched in positively selected genes in individual species. Alternatively, pooling related species has enhanced the search for signatures of selection. However, grouping signatures does not allow testing for adaptive differences between species. Here we introduce the Gene-Set Selection Analysis (GSSA), a new genome-wide approach to test for evidences of natural selection on functional modules. GSSA is able to detect lineage specific evolutionary rate changes in a notable number of functional modules. For example, in nine mammal and Drosophilae genomes GSSA identifies hundreds of functional modules with significant associations to high and low rates of evolution. Many of the detected functional modules with high evolutionary rates have been previously identified as biological functions under positive selection. Notably, GSSA identifies conserved functional modules with many positively selected genes, which questions whether they are exclusively selected for fitting genomes to environmental changes. Our results agree with previous studies suggesting that adaptation requires positive selection, but not every mutation under positive selection contributes to the adaptive dynamical process of the evolution of species.

  3. Comparative genomic analysis of catfish linkage group 8 reveals two homologous chromosomes in zebrafish and other teleosts with extensive inter-chromosomal rearrangements

    PubMed Central

    2013-01-01

    Background: Comparative genomics is a powerful tool to transfer genomic information from model species to related non-model species. Channel catfish (Ictalurus punctatus) is the primary aquaculture species in the United States. Its existing genome resources such as genomic sequences generated from next generation sequencing, BAC end sequences (BES), physical maps, linkage maps, and integrated linkage and physical maps using BES-associated markers provide a platform for comparative genomic analysis between catfish and other model teleost fish species. This study aimed to gain understanding of genome organizations and similarities among catfish and several sequenced teleost genomes using linkage group 8 (LG8) as a pilot study. Results: With existing genome resources, 287 unique genes were identified in LG8. Comparative genome analysis indicated that most of these 287 genes on catfish LG8 are located on two homologous chromosomes of zebrafish, medaka, stickleback, and three chromosomes of green-spotted pufferfish. Large numbers of conserved syntenies were identified. Detailed analysis of the conserved syntenies in relation to chromosome level similarities revealed extensive inter-chromosomal and intra-chromosomal rearrangements during evolution. Of the 287 genes, 35 genes were found to be duplicated in the catfish genome, with the vast majority of the duplications being interchromosomal. Conclusions: Comparative genome analysis is a powerful tool even in the absence of a well-assembled whole genome sequence. In spite of sequence stacking due to low resolution of the linkage and physical maps, conserved syntenies can be identified although the exact gene order and orientation are unknown at present. Through chromosome-level comparative analysis, homologous chromosomes among teleosts can be identified. Syntenic analysis should facilitate annotation of the catfish genome, which in turn, should facilitate functional inference of genes based on their orthology. PMID:23758806

  4. Sensitive quantitative analysis of murine LINE1 DNA methylation using high resolution melt analysis

    PubMed Central

    Newman, Michelle; Blyth, Benjamin J.; Hussey, Damian J.; Jardine, Daniel; Ormsby, Rebecca J.

    2012-01-01

    We present here the first high resolution melt (HRM) assay to quantitatively analyze differences in murine DNA methylation levels utilizing CpG methylation of Long Interspersed Elements-1 (LINE1 or L1). By calculating the integral difference in melt temperature between samples and a methylated control, and biasing PCR primers for unmethylated CpGs, the assay demonstrates enhanced sensitivity to detect changes in methylation in a cell line treated with low doses of 5-aza-2’-deoxycytidine (5-aza). The L1 assay was confirmed to be a good marker of changes in DNA methylation of L1 elements at multiple regions across the genome when compared with total 5-methyl-cytosine content, measured by Liquid Chromatography-Mass Spectrometry (LC-MS). The assay design was also used to detect changes in methylation at other murine repeat elements (B1 and Intracisternal-A-particle Long-terminal Repeat elements). Pyrosequencing analysis revealed that L1 methylation changes were non-uniform across the CpGs within the L1-HRM target region, demonstrating that the L1 assay can detect small changes in CpG methylation among a large pool of heterogeneously methylated DNA templates. Application of the assay to various tissues from Balb/c and CBA mice, including previously unreported peripheral blood (PB), revealed a tissue hierarchy (from hypermethylated to hypomethylated) of PB > kidney > liver > prostate > spleen. CBA mice demonstrated overall greater methylation than Balb/c mice, and male mice demonstrated higher tissue methylation compared with female mice in both strains. Changes in DNA methylation have been reported to be an early and fundamental event in the pathogenesis of many human diseases, including cancer. Mouse studies designed to identify modulators of DNA methylation, the critical doses, relevant time points and the tissues affected are limited by the low throughput nature and exorbitant cost of many DNA methylation assays. The L1 assay provides a high throughput, inexpensive

  5. Sensitive quantitative analysis of murine LINE1 DNA methylation using high resolution melt analysis.

    PubMed

    Newman, Michelle; Blyth, Benjamin J; Hussey, Damian J; Jardine, Daniel; Sykes, Pamela J; Ormsby, Rebecca J

    2012-01-01

    We present here the first high resolution melt (HRM) assay to quantitatively analyze differences in murine DNA methylation levels utilizing CpG methylation of Long Interspersed Elements-1 (LINE1 or L1). By calculating the integral difference in melt temperature between samples and a methylated control, and biasing PCR primers for unmethylated CpGs, the assay demonstrates enhanced sensitivity to detect changes in methylation in a cell line treated with low doses of 5-aza-2'-deoxycytidine (5-aza). The L1 assay was confirmed to be a good marker of changes in DNA methylation of L1 elements at multiple regions across the genome when compared with total 5-methyl-cytosine content, measured by Liquid Chromatography-Mass Spectrometry (LC-MS). The assay design was also used to detect changes in methylation at other murine repeat elements (B1 and Intracisternal-A-particle Long-terminal Repeat elements). Pyrosequencing analysis revealed that L1 methylation changes were non-uniform across the CpGs within the L1-HRM target region, demonstrating that the L1 assay can detect small changes in CpG methylation among a large pool of heterogeneously methylated DNA templates. Application of the assay to various tissues from Balb/c and CBA mice, including previously unreported peripheral blood (PB), revealed a tissue hierarchy (from hypermethylated to hypomethylated) of PB > kidney > liver > prostate > spleen. CBA mice demonstrated overall greater methylation than Balb/c mice, and male mice demonstrated higher tissue methylation compared with female mice in both strains. Changes in DNA methylation have been reported to be an early and fundamental event in the pathogenesis of many human diseases, including cancer. Mouse studies designed to identify modulators of DNA methylation, the critical doses, relevant time points and the tissues affected are limited by the low throughput nature and exorbitant cost of many DNA methylation assays. The L1 assay provides a high throughput, inexpensive

  6. Comparative Genome Analysis of Basidiomycete Fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor

    2012-03-19

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose-degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that the profile of an organism in lignocellulose-targeting genes can be used to predict its nutritional mode, and predict Dacryopinax sp. as a brown rot; Botryobasidium botryosum and Jaapia argillacea as white rots.

  7. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background: Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings: We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed that Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation was found in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid, which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii, but none of them were known specific virulence-related genes. Conclusions/Significance: Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. 
Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of

  8. Genome-wide association analysis reveals new targets for carotenoid biofortification in maize.

    PubMed

    Suwarno, Willy B; Pixley, Kevin V; Palacios-Rojas, Natalia; Kaeppler, Shawn M; Babu, Raman

    2015-05-01

    Genome-wide association analysis in CIMMYT's association panel revealed new favorable native genomic variations in or near important genes such as hydroxylases and CCD1 that have potential for carotenoid biofortification in maize. Genome-wide association studies (GWAS) have been used extensively to identify allelic variation for genes controlling important agronomic and nutritional traits in plants. Provitamin A (proVA) enhancing alleles of lycopene epsilon cyclase (LCYE) and β-carotene hydroxylase 1 (CRTRB1), previously identified through candidate-gene based GWAS, are currently used in CIMMYT's maize breeding program. The objective of this study was to identify genes or genomic regions controlling variation for carotenoid concentrations in grain for CIMMYT's carotenoid association mapping panel of 380 inbred maize lines, using high-density genome-wide platforms with ~476,000 SNP markers. Population structure effects were minimized by adjustments using principal components and kinship matrix with mixed models. Genome-wide linkage disequilibrium (LD) analysis indicated faster LD decay (3.9 kb; r² = 0.1) than commonly reported for temperate germplasm, and therefore the possibility of achieving higher mapping resolution with our mostly tropical diversity panel. GWAS for various carotenoids identified CRTRB1, LCYE and other key genes or genomic regions that govern rate-critical steps in the upstream pathway, such as DXS1, GGPS1, and GGPS2 that are known to play important roles in the accumulation of precursor isoprenoids as well as downstream genes HYD5, CCD1, and ZEP1, which are involved in hydroxylation and carotenoid degradation. SNPs at or near all of these regions were identified and may be useful target regions for carotenoid biofortification breeding efforts in maize; for example a genomic region on chromosome 2 explained ~16% of the phenotypic variance for β-carotene independently of CRTRB1, and a variant of CCD1 that resulted in reduced

  9. Finite-Resolution Effects in p -Leader Multifractal Analysis

    NASA Astrophysics Data System (ADS)

    Leonarduzzi, Roberto; Wendt, Herwig; Abry, Patrice; Jaffard, Stephane; Melot, Clothilde

    2017-07-01

    Multifractal analysis has become a standard signal processing tool, for which a promising new formulation, the p-leader multifractal formalism, has recently been proposed. It relies on novel multiscale quantities, the p-leaders, defined as local l^p norms of sets of wavelet coefficients located at infinitely many fine scales. Computing such infinite sums from actual finite-resolution data requires truncations to the finest available scale, which results in biased p-leaders and thus in inaccurate estimates of multifractal properties. A systematic study of such finite-resolution effects leads us to conjecture an explicit and universal closed-form correction that permits an accurate estimation of scaling exponents. This conjecture is formulated from the theoretical study of a particular class of models for multifractal processes, the wavelet-based cascades. The relevance and generality of the proposed conjecture are assessed by numerical simulations conducted over a large variety of multifractal processes. Finally, the relevance of the proposed corrected estimators is demonstrated on the analysis of heart rate variability data.

  10. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package is accompanied by a script for automated installation of necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems after necessary external programs are installed. 
DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .
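
The scaling contrast described in the abstract above can be illustrated with a small counting sketch. This is not DeNoGAP code: the function names are hypothetical, and the cost model for the iterative strategy (one scan of each newly added genome against the current set of family models) is an assumption made only to show quadratic versus linear growth.

```python
def pairwise_comparisons(n_genomes: int) -> int:
    """Genome-vs-genome comparisons in an all-against-all scheme: n(n-1)/2."""
    return n_genomes * (n_genomes - 1) // 2


def iterative_comparisons(n_genomes: int) -> int:
    """Assumed cost model: each genome after the first is scanned once
    against the current set of family models, giving n-1 scans."""
    return max(n_genomes - 1, 0)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, pairwise_comparisons(n), iterative_comparisons(n))
```

Under these assumptions, going from 100 to 1000 genomes multiplies the pairwise workload by roughly 100 but the iterative workload by roughly 10.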

  11. Objective high Resolution Analysis over Complex Terrain with VERA

    NASA Astrophysics Data System (ADS)

    Mayer, D.; Steinacker, R.; Steiner, A.

    2012-04-01

    VERA (Vienna Enhanced Resolution Analysis) is a model independent, high resolution objective analysis of meteorological fields over complex terrain. This system consists of a specially developed quality control procedure and a combination of an interpolation and a downscaling technique. Whereas the so-called VERA-QC is presented at this conference in the contribution titled "VERA-QC, an approved Data Quality Control based on Self-Consistency" by Andrea Steiner, this presentation will focus on the method and the characteristics of the VERA interpolation scheme, which enables one to compute grid point values of a meteorological field based on irregularly distributed observations and topography-related a priori knowledge. Over a complex topography meteorological fields are not smooth in general. The roughness which is induced by the topography can be explained physically. The knowledge about this behavior is used to define the so-called Fingerprints (e.g. a thermal Fingerprint reproducing heating or cooling over mountainous terrain or a dynamical Fingerprint reproducing positive pressure perturbation on the windward side of a ridge) under idealized conditions. If the VERA algorithm recognizes patterns of one or more Fingerprints at a few observation points, the corresponding patterns are used to downscale the meteorological information in a greater surrounding. This technique makes it possible to achieve an analysis with a resolution much higher than that of the observational network. The interpolation of irregularly distributed stations to a regular grid (in space and time) is based on a variational principle applied to first and second order spatial and temporal derivatives. Mathematically, this can be formulated as a cost function that is equivalent to the penalty function of a thin plate smoothing spline. After the analysis field has been divided into the Fingerprint components and the unexplained part respectively, the requirement of a smooth distribution is applied to the

  12. Genome-wide association interaction analysis for Alzheimer's disease

    PubMed Central

    Gusareva, Elena S.; Carrasquillo, Minerva M.; Bellenguez, Céline; Cuyvers, Elise; Colon, Samuel; Graff-Radford, Neill R.; Petersen, Ronald C.; Dickson, Dennis W.; Mahachie Johna, Jestinah M.; Bessonov, Kyrylo; Van Broeckhoven, Christine; Williams, Julie; Amouyel, Philippe; Sleegers, Kristel; Ertekin-Taner, Nilüfer; Lambert, Jean-Charles; Van Steen, Kristel

    2015-01-01

    We propose a minimal protocol for exhaustive genome-wide association interaction analysis that involves screening for epistasis over large-scale genomic data combining strengths of different methods and statistical tools. The different steps of this protocol are illustrated on a real-life data application for Alzheimer's disease (AD) (2259 patients and 6017 controls from France). In particular, in the exhaustive genome-wide epistasis screening we identified an AD-associated interacting SNP pair from chromosome 6q11.1 (rs6455128, the KHDRBS2 gene) and 13q12.11 (rs7989332, the CRYL1 gene) (p = 0.006, corrected for multiple testing). A replication analysis in the independent AD cohort from Germany (555 patients and 824 controls) confirmed the discovered epistasis signal (p = 0.036). This signal was also supported by a meta-analysis approach in 5 independent AD cohorts that was applied in the context of epistasis for the first time. Transcriptome analysis revealed negative correlation between expression levels of KHDRBS2 and CRYL1 in both the temporal cortex (β = −0.19, p = 0.0006) and cerebellum (β = −0.23, p < 0.0001) brain regions. This is the first time a replicable epistasis associated with AD was identified using a hypothesis-free screening approach. PMID:24958192

  13. Comparative genomic analysis of two Burkholderia glumae strains from different geographic origins reveals a high degree of plasticity in genome structure associated with genomic islands.

    PubMed

    Francis, Felix; Kim, Joohyun; Ramaraj, Thiru; Farmer, Andrew; Rush, Milton C; Ham, Jong Hyun

    2013-04-01

    Burkholderia glumae is the major causal agent of bacterial panicle blight of rice, a growing disease problem in global rice production. To better understand its genome-scale characteristics, the genome of the highly virulent B. glumae strain 336gr-1 isolated from Louisiana, USA was sequenced using the Illumina Genome Analyser II system. De novo assembled 336gr-1 contigs were aligned and compared with the previously sequenced genome of B. glumae strain BGR1, which was isolated from an infected rice plant in South Korea. Comparative analysis of the whole genomes of B. glumae 336gr-1 and B. glumae BGR1 revealed numerous unique genomic regions present only in one of the two strains. These unique regions contained accessory genes including mobile elements and phage-related genes, and some of the unique regions in B. glumae BGR1 corresponded to predicted genomic islands. In contrast, little variation was observed in known and potential virulence genes between the two genomes. The considerable amount of plasticity largely based on accessory genes and genome islands observed from the comparison of the genomes of these two strains of B. glumae may explain the versatility of this bacterial species in various environmental conditions and geographic locations.

  14. Genome and Proteome Analysis of Industrial Fungi

    SciTech Connect

    Baker, Scott E.; Wend, Christopher F.; Martinez, Antonio D.; Magnuson, Jon K.; Panisko, Ellen A.; Dai, Ziyu; Bruno, Kenneth S.; Anderson, Kevin K.; Monroe, Matthew E.; Daly, Don S.; Lasure, Linda L.

    2007-09-06

    In order to decrease dependence on petroleum, the United States Department of Energy (USDOE) Office of the Biomass Program (OBP) is investing in research and development to enable its vision of the biorefinery. The biorefinery will decrease the use of petroleum through conversion of biomass such as crops or agricultural waste into fuels and products. How do fungi fit into the biorefinery? Analysis of the “Top Ten” study indicates that nine of the top twelve chemical building blocks are currently produced or may potentially be produced by fungal fermentation processes. However, a significant barrier to the use of bio-based products is the economic feasibility – fuels and products must be price-competitive with those derived from petroleum. An obvious way to decrease the costs of biobased products from fungi is to make fermentation strains more productive and processes more efficient. Traditional strain improvement programs typically span a time scale measured in decades and process development done through the use of batch cultures is extremely labor intensive.

  15. Tissue enrichment analysis for C. elegans genomics.

    PubMed

    Angeles-Albores, David; N Lee, Raymond Y; Chan, Juancarlos; Sternberg, Paul W

    2016-09-13

    Over the last ten years, there has been explosive development in methods for measuring gene expression. These methods can identify thousands of genes altered between conditions, but understanding these datasets and forming hypotheses based on them remains challenging. One way to analyze these datasets is to associate ontologies (hierarchical, descriptive vocabularies with controlled relations between terms) with genes and to look for enrichment of specific terms. Although Gene Ontology (GO) is available for Caenorhabditis elegans, it does not include anatomical information. We have developed a tool for identifying enrichment of C. elegans tissues among gene sets and generated a website GUI where users can access this tool. Since a common drawback of ontology enrichment analyses is their verbosity, we developed a very simple filtering algorithm to reduce the ontology size by an order of magnitude. We adjusted these filters and validated our tool using a set of 30 gold standards from Expression Cluster data in WormBase. We show our tool can discriminate between embryonic and larval tissues and can even identify tissues down to the single-cell level. We used our tool to identify multiple neuronal tissues that are down-regulated due to pathogen infection in C. elegans. Our Tissue Enrichment Analysis (TEA) can be found within WormBase, and can be downloaded using Python's standard pip installer. It tests a slimmed-down C. elegans tissue ontology for enrichment of specific terms and provides users with a text and graphic representation of the results.
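
Term enrichment of the kind described above is commonly scored with a one-sided hypergeometric test. The abstract does not state which statistic the tool uses, so the sketch below is a generic illustration under that assumption, not the TEA implementation or its API; the function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(hits: int, population: int, annotated: int, drawn: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    population: genes in the background set
    annotated:  background genes carrying the tissue term
    drawn:      genes in the query list
    hits:       query genes carrying the tissue term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 40 of 200 query genes carry a term that
    # annotates 500 of 20000 background genes.
    print(enrichment_p_value(40, 20000, 500, 200))
```

In practice one p-value is computed per term and the set is then corrected for multiple testing; that step is omitted here.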

  16. Development and Validation of a Comparative Genomic Fingerprinting Method for High-Resolution Genotyping of Campylobacter jejuni

    PubMed Central

    Ross, Susan L.; Mutschall, Steven K.; MacKinnon, Joanne M.; Roberts, Michael J.; Buchanan, Cody J.; Kruczkiewicz, Peter; Jokinen, Cassandra C.; Thomas, James E.; Nash, John H. E.; Gannon, Victor P. J.; Marshall, Barbara; Pollari, Frank; Clark, Clifford G.

    2012-01-01

    Campylobacter spp. are a leading cause of bacterial gastroenteritis worldwide. The need for molecular subtyping methods with enhanced discrimination in the context of surveillance- and outbreak-based epidemiologic investigations of Campylobacter spp. is critical to our understanding of sources and routes of transmission and the development of mitigation strategies to reduce the incidence of campylobacteriosis. We describe the development and validation of a rapid and high-resolution comparative genomic fingerprinting (CGF) method for C. jejuni. A total of 412 isolates from agricultural, environmental, retail, and human clinical sources obtained from the Canadian national integrated enteric pathogen surveillance program (C-EnterNet) were analyzed using a 40-gene assay (CGF40) and multilocus sequence typing (MLST). The significantly higher Simpson's index of diversity (ID) obtained with CGF40 (ID = 0.994) suggests that it has a higher discriminatory power than MLST at both the level of clonal complex (ID = 0.873) and sequence type (ID = 0.935). High Wallace coefficients obtained when CGF40 was used as the primary typing method suggest that CGF and MLST are highly concordant, and we show that isolates with identical MLST profiles comprise isolates with distinct but highly similar CGF profiles. The high concordance with MLST coupled with the ability to discriminate between closely related isolates suggests that CGF40 is useful in differentiating highly prevalent sequence types, such as ST21 and ST45. CGF40 is a high-resolution comparative genomics-based method for C. jejuni subtyping with high discriminatory power that is also rapid, low cost, and easily deployable for routine epidemiologic surveillance and outbreak investigations. PMID:22170908
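
The Simpson's index of diversity quoted above can be computed from a list of type assignments. The abstract does not give the formula; the sketch below assumes the form commonly used for comparing typing methods, ID = 1 - sum(n_j (n_j - 1)) / (N (N - 1)), where n_j is the number of isolates of type j and N is the total number of isolates. The function name and example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def simpsons_index_of_diversity(type_labels: Sequence[Hashable]) -> float:
    """Probability that two isolates drawn without replacement differ in type."""
    n_total = len(type_labels)
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n_total * (n_total - 1))


if __name__ == "__main__":
    # Hypothetical sequence-type labels for six isolates.
    print(simpsons_index_of_diversity(["ST21", "ST21", "ST45", "ST45", "ST48", "ST50"]))
```

A value near 1 means almost every pair of isolates is separated by the typing method; a value of 0 means all isolates share one type.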

  17. Development and validation of a comparative genomic fingerprinting method for high-resolution genotyping of Campylobacter jejuni.

    PubMed

    Taboada, Eduardo N; Ross, Susan L; Mutschall, Steven K; Mackinnon, Joanne M; Roberts, Michael J; Buchanan, Cody J; Kruczkiewicz, Peter; Jokinen, Cassandra C; Thomas, James E; Nash, John H E; Gannon, Victor P J; Marshall, Barbara; Pollari, Frank; Clark, Clifford G

    2012-03-01

    Campylobacter spp. are a leading cause of bacterial gastroenteritis worldwide. The need for molecular subtyping methods with enhanced discrimination in the context of surveillance- and outbreak-based epidemiologic investigations of Campylobacter spp. is critical to our understanding of sources and routes of transmission and the development of mitigation strategies to reduce the incidence of campylobacteriosis. We describe the development and validation of a rapid and high-resolution comparative genomic fingerprinting (CGF) method for C. jejuni. A total of 412 isolates from agricultural, environmental, retail, and human clinical sources obtained from the Canadian national integrated enteric pathogen surveillance program (C-EnterNet) were analyzed using a 40-gene assay (CGF40) and multilocus sequence typing (MLST). The significantly higher Simpson's index of diversity (ID) obtained with CGF40 (ID = 0.994) suggests that it has a higher discriminatory power than MLST at both the level of clonal complex (ID = 0.873) and sequence type (ID = 0.935). High Wallace coefficients obtained when CGF40 was used as the primary typing method suggest that CGF and MLST are highly concordant, and we show that isolates with identical MLST profiles comprise isolates with distinct but highly similar CGF profiles. The high concordance with MLST coupled with the ability to discriminate between closely related isolates suggests that CGF40 is useful in differentiating highly prevalent sequence types, such as ST21 and ST45. CGF40 is a high-resolution comparative genomics-based method for C. jejuni subtyping with high discriminatory power that is also rapid, low cost, and easily deployable for routine epidemiologic surveillance and outbreak investigations.

  18. [Detection of the introgression of genome elements of Aegilops cylindrica Host. into Triticum aestivum L. genome with ISSR-analysis].

    PubMed

    Galaev, A V; Babaiants, L T; Sivolap, Iu M

    2003-01-01

    A comparative analysis of introgressive and parental forms of wheat was carried out to reveal the sites of the donor genome carrying new loci of resistance to fungal diseases. Using the ISSR method, 124 ISSR loci were detected in the genomes of 18 individual plants of introgressive line 5/20-91; 17 of them were related to introgressed fragments of the Ae. cylindrica genome in T. aestivum. The ISSR method was shown to be effective for detecting variability caused by introgression of alien genetic material into the T. aestivum genome.

  19. Genome wide characterization of simple sequence repeats in watermelon genome and their application in comparative mapping and genetic diversity analysis

    USDA-ARS's Scientific Manuscript database

    Simple sequence repeats (SSR) or microsatellite markers are one of the most informative and versatile DNA-based markers. The use of next-generation sequencing technologies allows whole genome sequencing and makes it possible to develop large numbers of SSRs through bioinformatic analysis of genome da...

  20. Integrated genome-wide analysis of genomic changes and gene regulation in human adrenocortical tissue samples.

    PubMed

    Gara, Sudheer Kumar; Wang, Yonghong; Patel, Dhaval; Liu-Chittenden, Yi; Jain, Meenu; Boufraqech, Myriem; Zhang, Lisa; Meltzer, Paul S; Kebebew, Electron

    2015-10-30

    To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal-to-adenoma-to-carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analysis in human adrenocortical tissue (normal, adrenocortical adenomas and ACC) samples. A pairwise comparison of normal, adrenocortical adenomas and ACC gene expression profiles with more than four-fold expression differences and an adjusted P-value < 0.05 revealed no major differences in normal versus adrenocortical adenoma, whereas there are 808 and 1085 dysregulated genes between ACC versus adrenocortical adenoma and ACC versus normal, respectively. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the gene expression of dysregulated genes, we found more alterations in ACC versus normal than in ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways that are associated with dysregulated genes and further experimentally validated that oncostatin m signaling induces caspase 3 dependent apoptosis and suppresses cell proliferation. Finally, we propose that there is a higher number of genomic changes from normal-to-adenoma-to-carcinoma and identified oncostatin m signaling as a plausible druggable pathway for therapeutics.

  1. Analysis of potential genomic confounding in genetic association studies and an online genomic confounding browser (GCB).

    PubMed

    Raistrick, Christopher A; Alharbi, Khalid K; Day, Ian N M; Gaunt, Tom R

    2011-11-01

    Genome-wide association studies have transformed genetic studies of disease susceptibility, identifying many variants that may tag functional polymorphism nearby. Variants are often ascribed to a physically close gene exhibiting plausible functionality for a causal pathway. However, more physically remote genes may be at a lesser linkage or linkage disequilibrium (LD) distance from the tested SNP and could therefore contain the functional variant tagged. This analysis aims to identify instances where research may be misled by misassociation of a variant with a gene and develop tools to analyse genomic confounding. A catalogue of reported associations was systematically analysed for unreported genes which may represent the true functionality ascribed to a reported variant, calculating physical and genetic distances for all genes within 1 cM of the tagging polymorphism. Results revealed 55 SNPs where recombination was lower between the identified SNP and a physically more remote gene than initially reported, and 374 where an alternative gene was genetically and physically closer than the reported gene. Analyses show potential for genomic confounding through false inferences of variant association to a gene. An online visualization tool (http://gcb.genes.org.uk/) was developed to plot genes by physical and genetic distance relative to a variant, along with LD data. © 2011 The Authors Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London.

  2. Comparative genomic analysis of the compound Brassica napus Rf locus.

    PubMed

    Gaborieau, Lydiane; Brown, Gregory G

    2016-10-26

    The plant trait of cytoplasmically-inherited male sterility (CMS) and its suppression by nuclear restorer-of-fertility (Rf) genes can be viewed as a genetic arms race between the mitochondrial and nuclear genomes. Most nuclear Rf genes have been shown to encode P-type pentatricopeptide repeat proteins (PPRs). Phylogenetic analysis of P-class PPRs from sequenced plant genomes has shown that Rf-proteins cluster in a distinct clade of P-class PPRs, RFL-PPRs, that display hallmarks of positive evolutionary selection. Genes encoding RFL-PPRs (RFLs) within a given plant genome tend to be closely related both in sequence and position, but a detailed understanding of how such species-specific expansion occurs is lacking. In the canola (oilseed rape) species Brassica napus, previous work has indicated the nuclear restorer genes for the two native forms of CMS, Rfn (for nap CMS) and Rfp (pol CMS), represent alternate haplotypes, or alleles, of a single nuclear locus. Fine genetic mapping indicates that Rfn does indeed localize to the same genomic region as Rfp. We find this region is enriched in RFL genes, three of which, based on their position and expression, represent potential candidates for Rfn; one of these genes, designated PPR4, is a preferred candidate in that it is not expressed in the nap CMS line. Comparison of the corresponding regions of the genomes of B. rapa, B. oleracea, Arabidopsis thaliana and A. lyrata provides insight into the expansion of this group of RFL genes in different lines of evolutionary descent. Unlike other nuclear restorer loci containing multiple RFL genes, the RFL genes in the Rf region of B. napus are not present in tandem arrays but rather are dispersed in genomic location. The genes do not share similar flanking non-coding regions and do not contain introns, indicating that they have duplicated primarily through a retrotransposition-mediated process. 
In contrast, segmental duplication has been responsible for the distribution of the

  3. Genome-wide functional analysis in Candida albicans.

    PubMed

    Motaung, Thabiso E; Ells, Ruan; Pohl, Carolina H; Albertyn, Jacobus; Tsilo, Toi J

    2017-02-08

    Candida albicans is an important etiological agent of superficial and life-threatening infections in individuals with compromised immune systems. To date, we know of several overlapping genetic networks that govern virulence attributes in this fungal pathogen. Classical use of deletion mutants has led to the discovery of numerous virulence factors over the years, and genome-wide functional analysis has propelled gene discovery at an even faster pace. Indeed, a number of recent studies using large-scale genetic screens followed by genome-wide functional analysis have allowed for the unbiased discovery of many new genes involved in C. albicans biology. Here we share our perspectives on the role of these studies in analyzing fundamental aspects of C. albicans virulence properties.

  4. Multi-resolution Convolution Methodology for ICP Waveform Morphology Analysis.

    PubMed

    Shaw, Martin; Piper, Ian; Hawthorne, Christopher

    2016-01-01

    Intracranial pressure (ICP) monitoring is a key clinical tool in the assessment and treatment of patients in neurointensive care. ICP morphology analysis can be useful in the classification of waveform features. A methodology for the decomposition of an ICP signal into clinically relevant dimensions has been devised that allows the identification of important ICP waveform types. It has three main components. First, multi-resolution convolution analysis is used for the main signal decomposition. Then, an impulse function is created, with multiple parameters, that can represent any form in the signal under analysis. Finally, a simple, localised optimisation technique is used to find morphologies of interest in the decomposed data. A pilot application of this methodology using a simple signal has been performed. This has shown that the technique works, with receiver operator characteristic area under the curve values of 0.936, 0.694, 0.676 and 0.698 for the plateau wave, B wave, and high and low compliance states, respectively. This is a novel technique that showed some promise during the pilot analysis. However, it requires further optimisation to become a usable clinical tool for the automated analysis of ICP signals.

  5. Analysis of the impact of spatial resolution on land/water classifications using high-resolution aerial imagery

    USGS Publications Warehouse

    Enwright, Nicholas M.; Jones, William R.; Garber, Adrienne L.; Keller, Matthew J.

    2014-01-01

    Long-term monitoring efforts often use remote sensing to track trends in habitat or landscape conditions over time. To most appropriately compare observations over time, long-term monitoring efforts strive for consistency in methods. Thus, advances and changes in technology over time can present a challenge. For instance, modern camera technology has led to an increasing availability of very high-resolution imagery (i.e. submetre and metre) and a shift from analogue to digital photography. While numerous studies have shown that image resolution can impact the accuracy of classifications, most of these studies have focused on the impacts of comparing spatial resolution changes greater than 2 m. Thus, a knowledge gap exists on the impacts of minor changes in spatial resolution (i.e. submetre to about 1.5 m) in very high-resolution aerial imagery (i.e. 2 m resolution or less). This study compared the impact of spatial resolution on land/water classifications of an area dominated by coastal marsh vegetation in Louisiana, USA, using 1:12,000 scale colour-infrared analogue aerial photography (AAP) scanned at four different dot-per-inch resolutions simulating ground sample distances (GSDs) of 0.33, 0.54, 1, and 2 m. Analysis of the impact of spatial resolution on land/water classifications was conducted by exploring various spatial aspects of the classifications including density of waterbodies and frequency distributions in waterbody sizes. This study found that a small-magnitude change (1–1.5 m) in spatial resolution had little to no impact on the amount of water classified (i.e. percentage mapped was less than 1.5%), but had a significant impact on the mapping of very small waterbodies (i.e. waterbodies ≤ 250 m²). These findings should interest those using temporal image classifications derived from very high-resolution aerial photography as a component of long-term monitoring programs.
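
The link between scan resolution and ground sample distance mentioned above follows from simple geometry: one scanned pixel covers 0.0254 / dpi metres of film, which corresponds to that length times the scale denominator on the ground. The abstract does not report the scan resolutions used, so the dpi values printed by this sketch are only what the relation implies for 1:12,000 photography; the function names are hypothetical.

```python
METRES_PER_INCH = 0.0254


def gsd_from_dpi(scale_denominator: float, dpi: float) -> float:
    """Ground sample distance (m) of a photo at 1:scale_denominator scanned at dpi."""
    return scale_denominator * METRES_PER_INCH / dpi


def dpi_for_gsd(scale_denominator: float, gsd_m: float) -> float:
    """Scan resolution (dots per inch) needed to simulate a target GSD in metres."""
    return scale_denominator * METRES_PER_INCH / gsd_m


if __name__ == "__main__":
    for gsd in (0.33, 0.54, 1.0, 2.0):
        print(f"GSD {gsd} m -> about {dpi_for_gsd(12000, gsd):.0f} dpi")
```

For example, under this relation a 1 m GSD at 1:12,000 corresponds to a scan of about 305 dpi.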

  6. Progress toward accurate high spatial resolution actinide analysis by EPMA

    NASA Astrophysics Data System (ADS)

    Jercinovic, M. J.; Allaz, J. M.; Williams, M. L.

    2010-12-01

    High precision, high spatial resolution EPMA of actinides is a significant issue for geochronology, resource geochemistry, and studies involving the nuclear fuel cycle. Particular interest focuses on understanding the behavior of Th and U in the growth and breakdown reactions relevant to actinide-bearing phases (monazite, zircon, thorite, allanite, etc.), and geochemical fractionation processes involving Th and U in fluid interactions. Unfortunately, the measurement of minor and trace concentrations of U in the presence of major concentrations of Th and/or REEs is particularly problematic, especially in complexly zoned phases with large compositional variation on the micro or nanoscale - spatial resolutions now accessible with modern instruments. Sub-micron, high precision compositional analysis of minor components is feasible in very high Z phases where scattering is limited at lower kV (15 kV or less) and where the beam diameter can be kept below 400 nm at high current (e.g. 200-500 nA). High collection efficiency spectrometers and high performance electron optics in EPMA now allow the use of lower overvoltage through an exceptional range in beam current, facilitating higher spatial resolution quantitative analysis. The U LIII edge at 17.2 kV precludes L-series analysis at low kV (high spatial resolution), requiring careful measurements of the actinide M series. Also, U Lα detection (wavelength = 0.9 Å) requires the use of LiF (220) or (420), not generally available on most instruments. Strong peak overlaps of Th on U make highly accurate interference correction mandatory, with problems compounded by the Th MIV and Th MV absorption edges affecting peak, background, and interference calibration measurements (especially the interference of the Th M line family on U Mβ). Complex REE-bearing phases such as monazite, zircon, and allanite have particularly complex interference issues due to multiple peak and background overlaps from elements present in the activation

  7. Genomic resolution of a cold subsurface aquifer community provides metabolic insights for novel microbes adapted to high CO2 concentrations.

    PubMed

    Probst, Alexander J; Castelle, Cindy J; Singh, Andrea; Brown, Christopher T; Anantharaman, Karthik; Sharon, Itai; Hug, Laura A; Burstein, David; Emerson, Joanne B; Thomas, Brian C; Banfield, Jillian F

    2017-02-01

    As in many deep underground environments, the microbial communities in subsurface high-CO2 ecosystems remain relatively unexplored. Recent investigations based on single-gene assays revealed a remarkable variety of organisms from little-studied phyla in Crystal Geyser (Utah, USA), a site where deeply sourced CO2-saturated fluids are erupted at the surface. To provide genomic resolution of the metabolisms of these organisms, we used a novel metagenomic approach to recover 227 high-quality genomes from 150 microbial species affiliated with 46 different phylum-level lineages. Bacteria from two novel phylum-level lineages have the capacity for CO2 fixation. Analyses of carbon fixation pathways in all studied organisms revealed that the Wood-Ljungdahl pathway and the Calvin-Benson-Bassham Cycle occurred with the highest frequency, whereas the reverse TCA cycle was little used. We infer that this, and selection for form II RuBisCOs, are adaptations to high CO2 concentrations. However, many autotrophs can also grow mixotrophically, a strategy that confers metabolic versatility. The assignment of 156 hydrogenases to 90 different organisms suggests that H2 is an important inter-species energy currency even under gaseous CO2 saturation. Overall, metabolic analyses at the organism level provided insight into the biochemical cycles that support subsurface life under the extreme condition of CO2 saturation.

  8. Transcription-coupled and global genome repair in the Saccharomyces cerevisiae RPB2 gene at nucleotide resolution.

    PubMed Central

    Tijsterman, M; Tasseron-de Jong, J G; van de Putte, P; Brouwer, J

    1996-01-01

    Repair of UV-induced cyclobutane pyrimidine dimers (CPDs) was examined at single nucleotide resolution in the yeast Saccharomyces cerevisiae, using an improved protocol for genomic end-labelling. To obtain the sensitivity required for adduct detection in yeast, an oligonucleotide-directed enrichment step was introduced into the current methodology developed for adduct detection in Escherichia coli. With this method, heterogeneous repair of CPDs within the RPB2 locus is observed. Individual CPDs positioned in the transcribed strand are removed very efficiently with identical kinetics. This fast repair starts within 23 bases downstream of the transcription initiation site. The non-transcribed strand of the active gene exhibits slow repair without detectable repair variations between individual lesions. In contrast, CPDs positioned in the promoter region show profound repair heterogeneity. Here, CPDs at specific sites are removed very quickly, with comparable rates to CPDs positioned in the transcribed strand, while at other positions lesions are not repaired at all during the period studied. Interestingly, the fast repair in the promoter region is dependent on the RAD7 and RAD16 genes, as are the slowly repaired CPDs in this region and in the non-transcribed strand. This indicates that the global genome repair pathway is not intrinsically slow and at specific positions can be as efficient as the transcription-coupled repair pathway. PMID:8836174

  9. A High Resolution Genome-Wide Scan for Significant Selective Sweeps: An Application to Pooled Sequence Data in Laying Chickens

    PubMed Central

    Qanbari, Saber; Strom, Tim M.; Haberer, Georg; Weigend, Steffen; Gheyas, Almas A.; Turner, Frances; Burt, David W.; Preisinger, Rudolf; Gianola, Daniel; Simianer, Henner

    2012-01-01

    In most studies aimed at localizing footprints of past selection, outliers in the tails of the empirical distribution of a given test statistic are assumed to reflect locus-specific selective forces. Significance cutoffs are subjectively determined, rather than being related to a clear set of hypotheses. Here, we define an empirical p-value for the summary statistic by means of a permutation method that uses the observed SNP structure in the real data. To illustrate the methodology, we applied our approach to a panel of 2.9 million autosomal SNPs identified from re-sequencing a pool of 15 individuals from a brown egg layer line. We scanned the genome for local reductions in heterozygosity, suggestive of selective sweeps. We also employed a modified sliding window approach that accounts for gaps in the sequence and increases scanning resolution by moving the overlapping windows by steps of one SNP only; we propose calling this a “creeping window” strategy. When used as positive controls, previously described candidate genes (TSHR, PRL, PRLHR, INSR, LEPR, IGF1, and NRAMP1) confirmed that the approach detects selective sweeps in their regions. The genome scan revealed 82 distinct regions with strong evidence of selection (genome-wide p-value < 0.001), including genes known to be associated with eggshell structure and the immune system, such as CALB1 and the GAL cluster, respectively. A substantial proportion of signals was found in gene-poor regions, including the most extreme signal on chromosome 1. The observation of multiple signals in a highly selected layer line of chicken is consistent with the hypothesis that egg production is a complex trait controlled by many genes. PMID:23209582
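
    A minimal sketch of a "creeping window" scan with a permutation-based empirical p-value follows. It makes several assumptions not stated in the abstract: heterozygosity is summarized with a pooled heterozygosity statistic computed from major and minor allele read counts, gaps are handled by skipping windows whose neighbouring SNPs lie further apart than a hypothetical `max_gap`, and the permutation simply shuffles per-SNP counts across the observed SNP positions. It illustrates the idea only and is not the authors' implementation.

```python
import random


def pooled_heterozygosity(counts):
    """Pooled heterozygosity of one window.

    counts: list of (major_reads, minor_reads) per SNP; the total must be > 0.
    """
    n_major = sum(c[0] for c in counts)
    n_minor = sum(c[1] for c in counts)
    total = n_major + n_minor
    return 2.0 * n_major * n_minor / (total * total)


def creeping_windows(positions, counts, n_snps=5, max_gap=None):
    """Yield (start, end, Hp) for windows that advance by one SNP at a time."""
    for i in range(len(positions) - n_snps + 1):
        window_pos = positions[i:i + n_snps]
        # Assumed gap rule: skip windows that straddle a large sequence gap.
        if max_gap is not None and any(
            b - a > max_gap for a, b in zip(window_pos, window_pos[1:])
        ):
            continue
        yield window_pos[0], window_pos[-1], pooled_heterozygosity(counts[i:i + n_snps])


def empirical_p(positions, counts, observed, n_perm=200, n_snps=5, max_gap=None, seed=0):
    """Share of permutations whose lowest window Hp is at or below `observed`."""
    rng = random.Random(seed)
    shuffled = list(counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # positions stay fixed, allele counts are permuted
        lowest = min(hp for _, _, hp in creeping_windows(positions, shuffled, n_snps, max_gap))
        if lowest <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: a run of low-heterozygosity SNPs in the middle.
    positions = list(range(100, 2100, 100))
    counts = [(10, 9)] * 8 + [(19, 1)] * 5 + [(10, 9)] * 7
    windows = list(creeping_windows(positions, counts, n_snps=5))
    start, end, hp = min(windows, key=lambda w: w[2])
    print(start, end, round(hp, 3), empirical_p(positions, counts, hp))
```

    Moving the window one SNP at a time yields one statistic per SNP offset, so neighbouring windows overlap heavily; this is the main reason a permutation-based, genome-wide threshold is preferable to treating windows as independent tests.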

  10. Refinement of the high-resolution physical and genetic map of Rhodobacter capsulatus and genome surveys using blots of the cosmid encyclopedia.

    PubMed Central

    Fonstein, M; Koshy, E G; Nikolskaya, T; Mourachov, P; Haselkorn, R

    1995-01-01

    Cosmids from a library containing Rhodobacter capsulatus DNA fragments were previously ordered in two contigs: one corresponding to the chromosome and one to a 134 kb plasmid. This map contained 40 regions connected only by colony hybridization. To confirm the linkage and correct the map, the actual sizes of the overlaps were determined by blot-hybridization with Rhodobacter chromosomal DNA and by mapping of additional cosmids. Several revisions of the earlier map include single cosmid shifts and inversions. One additional gap in a cosmid contig was also found, raising the possibility that the chromosome is not a contiguous circle. About 2500 additional EcoRI, BamHI and HindIII restriction sites were added to the 560 EcoRV sites previously mapped onto the Rhodobacter chromosome, increasing the resolution of the physical map to the size of individual genes. Twenty-five new markers were located on the genetic map. The 48 markers now mapped represent nearly 300 genes and ORFs cloned from different species of Rhodobacter. The orientation of transcription of the four rrn operons was established using 16S rRNA- and 23S rRNA-specific probes and digestion with the rare-cutting enzyme, CeuI. Gel blots of 192 cosmids of the miniset of R. capsulatus digested with EcoRV were prepared. Such a hybridization template represents the whole genome cut into 560 DNA fragments varying in size from 0.4 to 25 kb. This template was used for high-resolution mapping of single genes, analysis of total genomic DNAs from related Rhodobacter strains and differentially expressed RNAs. PMID:7737133

  11. High-resolution melt analysis without DNA extraction affords rapid genotype resolution and species identification.

    PubMed

    Rugman-Jones, Paul F; Stouthamer, Richard

    2016-09-22

    Extracting and sequencing DNA from specimens can impose major time and monetary costs to studies requiring genotyping, or identification to species, of large numbers of individuals. As such, so-called direct PCR methods have been developed enabling significant savings at the DNA extraction step. Similarly, real-time quantitative PCR techniques (qPCR) offer very cost-effective alternatives to sequencing. High-resolution melt analysis (HRM) is a qPCR method that incorporates an intercalating dye into a double-stranded PCR amplicon. The dye fluoresces brightly, but only when it is bound. Thus, after PCR, raising the temperature of the amplicon while measuring the fluorescence of the reaction results in the generation of a sequence-specific melt curve, allowing discrimination of genotypes. Methods combining HRM (or other qPCR methods) and direct PCR have not previously been reported, most likely due to concerns that any tissue in the reaction tube would interfere with detection of the fluorescent signal. Here, we couple direct PCR with HRM and, by way of three examples, demonstrate a very quick and cost-effective method for genotyping large numbers of specimens, using Rotor-Gene HRM instruments (QIAGEN). In contrast to the heated-block design of most qPCR/HRM instruments, the Rotor-Gene's centrifugal rotor and air-based temperature-regulation system facilitate our method by depositing tissues away from the pathway of the machine's fluorescence detection optics.
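
    The way a melt curve separates genotypes can be illustrated with a short sketch. It assumes the common convention of locating the melting temperature at the maximum of the negative first derivative of fluorescence with respect to temperature; the abstract does not describe the data processing, and this is not the Rotor-Gene software's algorithm. The synthetic curves and function names are hypothetical.

```python
import math


def melt_peak(temps, fluorescence):
    """Return (Tm estimate, peak height) from the maximum of -dF/dT.

    The derivative is approximated by finite differences between consecutive
    readings, and the Tm estimate is the midpoint of the steepest interval.
    """
    best_tm, best_slope = None, float("-inf")
    for i in range(len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_tm, best_slope = (temps[i] + temps[i + 1]) / 2.0, slope
    return best_tm, best_slope


def synthetic_curve(tm, width=0.8, start=70.0, stop=90.0, step=0.5):
    """Idealized sigmoidal loss of dye fluorescence as the amplicon melts."""
    n_points = int(round((stop - start) / step)) + 1
    temps = [start + step * i for i in range(n_points)]
    fluorescence = [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]
    return temps, fluorescence


if __name__ == "__main__":
    # Two hypothetical genotypes whose amplicons melt about 2 degrees apart.
    for label, tm in (("genotype A", 80.2), ("genotype B", 82.0)):
        temps, fluorescence = synthetic_curve(tm)
        estimate, _ = melt_peak(temps, fluorescence)
        print(f"{label}: estimated Tm near {estimate:.2f} C")
```

    Real curves are noisier than these idealized sigmoids, so smoothing and normalization would normally precede the derivative step.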

  12. A Functional Genomic Analysis of NF1-Associated Learning Disabilities

    DTIC Science & Technology

    2008-02-01

    family. For example, Rab1 and Rab2 are downregulated in the NF1 hippocampus, while Rab3A is upregulated; Synaptotagmin 1 is downregulated, while...syntaxin binding protein 1 DOWN 1421990_at Syt1 synaptotagmin I DOWN 1422589_at Rab3a RAB3A, member RAS oncogene family UP Vesicle recycling 1422809_at...AD_________________ Award Number: W81XWH-04-1-0261 TITLE: A Functional Genomic Analysis of NF1-Associated Learning Disabilities

  13. A Functional Genomic Analysis of NF1-Associated Learning Disabilities

    DTIC Science & Technology

    2007-02-01

    For example, Rab1 and Rab2 are downregulated in the NF1 hippocampus, while Rab3A is upregulated; Synaptotagmin 1 is downregulated, while...syntaxin binding protein 1 DOWN 1421990_at Syt1 synaptotagmin I DOWN 1422589_at Rab3a RAB3A, member RAS oncogene family UP Vesicle recycling 1422809_at...AD_________________ Award Number: W81XWH-04-1-0261 TITLE: A Functional Genomic Analysis of NF1

  14. Bordetella holmesii: initial genomic analysis of an emerging opportunist.

    PubMed

    Planet, Paul J; Narechania, Apurva; Hymes, Saul R; Gagliardo, Christina; Huard, Richard C; Whittier, Susan; Della-Latta, Phyllis; Ratner, Adam J

    2013-03-01

    Bordetella holmesii is an emerging opportunistic pathogen that causes respiratory disease in healthy individuals and invasive infections among patients lacking splenic function. We used 16S rRNA gene analysis to confirm B. holmesii as the cause of bacteremia in a child with sickle cell disease. Semiconductor-based draft genome sequencing provided insight into B. holmesii phylogeny and potential virulence mechanisms and also identified a toluene-4-monooxygenase locus unique among bordetellae.

  15. Bordetella holmesii: initial genomic analysis of an emerging opportunist

    PubMed Central

    Planet, Paul J.; Narechania, Apurva; Hymes, Saul R.; Gagliardo, Christina; Huard, Richard C.; Whittier, Susan; Della-Latta, Phyllis; Ratner, Adam J.

    2013-01-01

    Bordetella holmesii is an emerging opportunistic pathogen that causes respiratory disease in healthy individuals and invasive infections among patients lacking splenic function. We used 16S rRNA analysis to confirm B. holmesii as the cause of bacteremia in a child with sickle cell disease. Semiconductor-based draft genome sequencing provided insight into B. holmesii phylogeny and potential virulence mechanisms and also identified a toluene-4-monooxygenase locus unique among bordetellae. PMID:23620158

  16. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  17. Sequencing and comparative analysis of the gorilla MHC genomic sequence

    PubMed Central

    Wilming, Laurens G.; Hart, Elizabeth A.; Coggill, Penny C.; Horton, Roger; Gilbert, James G. R.; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  18. Privacy-preserving GWAS analysis on federated genomic datasets

    PubMed Central

    2015-01-01

    Background: The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, high-quality GWAS usually requires a large number of samples, which can grow beyond the capability of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution collaboration for effective GWAS, but it raises concerns about patient privacy and medical information confidentiality (as data are being exchanged across institutional boundaries), which becomes an inhibiting factor for practical use. Methods: We present a privacy-preserving GWAS framework on federated genomic datasets. Our method is to layer the GWAS computations on top of secure multi-party computation (MPC) systems. This approach allows two parties in a distributed system to mutually perform secure GWAS computations without exposing their private data to each other. Results: We demonstrate our technique by implementing a framework for minor allele frequency counting and χ² statistics calculation, which are typical computations used in GWAS. For efficient prototyping, we use a state-of-the-art MPC framework, i.e., Portable Circuit Format (PCF) [1]. Our experimental results show promise in realizing both efficient and secure cross-institution GWAS computations. PMID:26733045
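
    For orientation, the two statistics named above can be sketched in plain, non-secure Python. This is only the arithmetic that such a framework would evaluate; it does not use PCF or any MPC protocol, the pooling of counts happens in the clear, and the site names and counts are made up for illustration. The χ² formula is the standard one for a 2×2 allelic table of cases and controls.

```python
def minor_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Frequency of the rarer allele among all observed alleles."""
    total = ref_count + alt_count
    return min(ref_count, alt_count) / total


def allelic_chi_square(case_ref: int, case_alt: int, ctrl_ref: int, ctrl_alt: int) -> float:
    """Pearson chi-square statistic for a 2x2 allele-count table (1 d.f.)."""
    a, b, c, d = case_ref, case_alt, ctrl_ref, ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # Hypothetical allele counts (case_ref, case_alt, ctrl_ref, ctrl_alt)
    # held by two institutions for a single SNP.
    site_a = (120, 80, 150, 50)
    site_b = (100, 100, 140, 60)
    # In the clear, pooling is plain addition; a secure framework would
    # compute the same sums without revealing the per-site inputs.
    pooled = tuple(x + y for x, y in zip(site_a, site_b))
    ref_total = pooled[0] + pooled[2]
    alt_total = pooled[1] + pooled[3]
    print("MAF:", round(minor_allele_frequency(ref_total, alt_total), 3))
    print("chi-square:", round(allelic_chi_square(*pooled), 3))
```

    Because both statistics depend only on summed allele counts, the secure part of the computation can stay small: additions across sites, followed by a handful of multiplications and one division.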

  19. Pan-Genome Analysis of Brazilian Lineage A Amoebal Mimiviruses

    PubMed Central

    Assis, Felipe L.; Bajrai, Leena; Abrahao, Jonatas S.; Kroon, Erna G.; Dornas, Fabio P.; Andrade, Kétyllen R.; Boratto, Paulo V. M.; Pilotto, Mariana R.; Robert, Catherine; Benamar, Samia; La Scola, Bernard; Colson, Philippe

    2015-01-01

    Since the recent discovery of Samba virus, the first representative of the family Mimiviridae from Brazil, prospecting for mimiviruses has been conducted in different environmental conditions in Brazil. Recently, using Acanthamoeba sp., we isolated three new mimiviruses, all of lineage A of amoebal mimiviruses: Kroon virus from urban lake water; Amazonia virus from the Brazilian Amazon river; and Oyster virus from farmed oysters. The aims of this work were to sequence and analyze the genomes of these new Brazilian mimiviruses (mimi-BR) and update the analysis of the Samba virus genome. The genomes of Samba virus, Amazonia virus and Oyster virus were 97%–99% similar, whereas Kroon virus had a low similarity (90%–91%) with other mimi-BR. A total of 3877 proteins encoded by mimi-BR were grouped into 974 orthologous clusters. In addition, we identified three new ORFans in the Kroon virus genome. Additional work is needed to expand our knowledge of the diversity of mimiviruses from Brazil, including whether and why those of lineage A predominate among amoebal mimiviruses in the Brazilian environment. PMID:26131958

  20. Analysis of the core genome and pangenome of Pseudomonas putida.

    PubMed

    Udaondo, Zulema; Molina, Lázaro; Segura, Ana; Duque, Estrella; Ramos, Juan L

    2016-10-01

    Pseudomonas putida are strict aerobes that proliferate in a range of temperate niches and are of interest for environmental applications due to their capacity to degrade pollutants and ability to promote plant growth. Furthermore, solvent-tolerant strains are useful for biosynthesis of added-value chemicals. We present a comprehensive comparative analysis of nine strains and the first characterization of the Pseudomonas putida pangenome. The core genome of P. putida comprises approximately 3386 genes. The most abundant genes within the core genome are those that encode nutrient transporters. Other conserved genes include those for central carbon metabolism through the Entner-Doudoroff pathway, the pentose phosphate cycle, arginine and proline metabolism, and pathways for degradation of aromatic chemicals. Genes that encode transporters, enzymes and regulators for amino acid metabolism (synthesis and degradation) are all part of the core genome, as well as various electron transporters, which enable aerobic metabolism under different oxygen regimes. Within the core genome are 30 genes for flagella biosynthesis and 12 key genes for biofilm formation. Pseudomonas putida strains share 85% of the coding regions with Pseudomonas aeruginosa; however, in P. putida, virulence factors such as exotoxins and type III secretion systems are absent.
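
    The distinction between core genome and pangenome can be sketched with set operations, assuming that genes have already been grouped into ortholog families (the clustering step itself is not shown and is not described in the abstract). The strain labels and family identifiers below are hypothetical.

```python
def core_and_pangenome(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, pan) for a mapping of strain name -> set of gene families.

    The core genome is the set of families present in every strain;
    the pangenome is the set of families present in at least one strain.
    """
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


if __name__ == "__main__":
    # Toy presence data for three hypothetical strains.
    strains = {
        "strain_1": {"famA", "famB", "famC", "famD"},
        "strain_2": {"famA", "famB", "famC", "famE"},
        "strain_3": {"famA", "famB", "famF"},
    }
    core, pan = core_and_pangenome(strains)
    accessory = pan - core  # families missing from at least one strain
    print("core:", sorted(core))
    print("accessory:", sorted(accessory))
    print("pangenome size:", len(pan))
```

    As more strains are added, the core can only shrink or stay the same while the pangenome can only grow or stay the same, so the reported core size depends on which strains are compared.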