Science.gov

Sample records for resolution genomic analysis

  1. Regulatory analysis of the C. elegans genome with spatiotemporal resolution

    PubMed Central

    Araya, Carlos L.; Kawli, Trupti; Kundaje, Anshul; Jiang, Lixia; Wu, Beijing; Vafeados, Dionne; Terrell, Robert; Weissdepp, Peter; Gevirtzman, Louis; Mace, Daniel; Niu, Wei; Boyle, Alan P.; Xie, Dan; Ma, Lijia; Murray, John I.; Reinke, Valerie; Waterston, Robert H.; Snyder, Michael

    2015-01-01

    Summary: Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors (TFs) and regulatory proteins across multiple stages of C. elegans development by performing 241 ChIP-seq experiments. Integrating regulatory binding and cellular-resolution expression data yielded a spatiotemporally-resolved metazoan TF binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of TFs, characterizing (1) the genomic coverage and clustering of regulatory binding, (2) the binding preferences of and biological processes regulated by TFs, (3) the global TF co-associations and genomic subdomains that suggest shared patterns of regulation, and (4) key TFs and TF co-associations for fate specification of individual lineages and cell-types. PMID:25164749

  2. Bisulfite-free, base-resolution analysis of 5-formylcytosine at the genome scale.

    PubMed

    Xia, Bo; Han, Dali; Lu, Xingyu; Sun, Zhaozhu; Zhou, Ankun; Yin, Qiangzong; Zeng, Hu; Liu, Menghao; Jiang, Xiang; Xie, Wei; He, Chuan; Yi, Chengqi

    2015-11-01

    Active DNA demethylation in mammals involves oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). However, genome-wide detection of 5fC at single-base resolution remains challenging. Here we present fC-CET, a bisulfite-free method for whole-genome analysis of 5fC based on selective chemical labeling of 5fC and subsequent C-to-T transition during PCR. Base-resolution 5fC maps showed limited overlap with 5hmC, with 5fC-marked regions more active than 5hmC-marked ones. PMID:26344045

  3. Topological Data Analysis Generates High-Resolution, Genome-wide Maps of Human Recombination.

    PubMed

    Camara, Pablo G; Rosenbloom, Daniel I S; Emmett, Kevin J; Levine, Arnold J; Rabadan, Raul

    2016-07-01

    Meiotic recombination is a fundamental evolutionary process driving diversity in eukaryotes. In mammals, recombination is known to occur preferentially at specific genomic regions. Using topological data analysis (TDA), a branch of applied topology that extracts global features from large data sets, we developed an efficient method for mapping recombination at fine scales. When compared to standard linkage-based methods, TDA can deal with a larger number of SNPs and genomes without incurring prohibitive computational costs. We applied TDA to 1,000 Genomes Project data and constructed high-resolution whole-genome recombination maps of seven human populations. Our analysis shows that recombination is generally under-represented within transcription start sites. However, the binding sites of specific transcription factors are enriched for sites of recombination. These include transcription factors that regulate the expression of meiosis- and gametogenesis-specific genes, cell cycle progression, and differentiation blockage. Additionally, our analysis identifies an enrichment for sites of recombination at repeat-derived loci matched by piwi-interacting RNAs. PMID:27345159

  4. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution

    PubMed Central

    Liu, Chang; Wang, Congmao; Wang, George; Becker, Claude; Zaidem, Maricris; Weigel, Detlef

    2016-01-01

    The three-dimensional packing of the genome plays an important role in regulating gene expression. We have used Hi-C, a genome-wide chromatin conformation capture (3C) method, to analyze Arabidopsis thaliana chromosomes dissected into subkilobase segments, which is required for gene-level resolution in this species with a gene-dense genome. We found that the repressive H3K27me3 histone mark is overrepresented in the promoter regions of genes that are in conformational linkage over long distances. In line with the globally dispersed distribution of RNA polymerase II in A. thaliana nuclear space, actively transcribed genes do not show a strong tendency to associate with each other. In general, there are often contacts between 5′ and 3′ ends of genes, forming local chromatin loops. Such self-loop structures of genes are more likely to occur in more highly expressed genes, although they can also be found in silent genes. Silent genes with local chromatin loops are highly enriched for the histone variant H3.3 at their 5′ and 3′ ends but depleted of repressive marks such as heterochromatic histone modifications and DNA methylation in flanking regions. Our results suggest that, different from animals, a major theme of genome folding in A. thaliana is the formation of structural units that correspond to gene bodies. PMID:27225844

  5. Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution

    PubMed Central

    Willerslev, Eske; Gilbert, M Thomas P; Binladen, Jonas; Ho, Simon YW; Campos, Paula F; Ratan, Aakrosh; Tomsho, Lynn P; da Fonseca, Rute R; Sher, Andrei; Kuznetsova, Tatanya V; Nowak-Kemp, Malgosia; Roth, Terri L; Miller, Webb; Schuster, Stephan C

    2009-01-01

    Background: The scientific literature contains many examples where DNA sequence analyses have been used to provide definitive answers to phylogenetic problems that traditional (non-DNA based) approaches alone have failed to resolve. One notable example concerns the rhinoceroses, a group for which several contradictory phylogenies were proposed on the basis of morphology, then apparently resolved using mitochondrial DNA fragments. Results: In this study we report the first complete mitochondrial genome sequences of the extinct ice-age woolly rhinoceros (Coelodonta antiquitatis), and the threatened Javan (Rhinoceros sondaicus), Sumatran (Dicerorhinus sumatrensis), and black (Diceros bicornis) rhinoceroses. In combination with the previously published mitochondrial genomes of the white (Ceratotherium simum) and Indian (Rhinoceros unicornis) rhinoceroses, this data set putatively enables reconstruction of the rhinoceros phylogeny. While the six species cluster into three strongly supported sister-pairings, (i) the black/white, (ii) the woolly/Sumatran, and (iii) the Javan/Indian, resolution of the higher-level relationships has no statistical support. The phylogenetic signal from individual genes is highly diffuse, with mixed topological support from different genes. Furthermore, the choice of outgroup (horse vs tapir) has considerable effect on reconstruction of the phylogeny. The lack of resolution is suggestive of a hard polytomy at the base of crown-group Rhinocerotidae, and this is supported by an investigation of the relative branch lengths. Conclusion: Satisfactory resolution of the rhinoceros phylogeny may not be achievable without additional analyses of substantial amounts of nuclear DNA. This study provides a compelling demonstration that, in spite of substantial sequence length, there are significant limitations with single-locus phylogenetics. We expect further examples of this to appear as next-generation, large-scale sequencing of complete mitochondrial

  6. Genome-wide and fine-resolution association analysis of malaria in West Africa

    PubMed Central

    Jallow, Muminatou; Teo, Yik Ying; Small, Kerrin S; Rockett, Kirk A; Deloukas, Panos; Clark, Taane G; Kivinen, Katja; Bojang, Kalifa A; Conway, David J; Pinder, Margaret; Sirugo, Giorgio; Sisay-Joof, Fatou; Usen, Stanley; Auburn, Sarah; Bumpstead, Suzannah J; Campino, Susana; Coffey, Alison; Dunham, Andrew; Fry, Andrew E; Green, Angela; Gwilliam, Rhian; Hunt, Sarah E; Inouye, Michael; Jeffreys, Anna E; Mendy, Alieu; Palotie, Aarno; Potter, Simon; Ragoussis, Jiannis; Rogers, Jane; Rowlands, Kate; Somaskantharajah, Elilan; Whittaker, Pamela; Widden, Claire; Donnelly, Peter; Howie, Bryan; Marchini, Jonathan; Morris, Andrew; SanJoaquin, Miguel; Achidi, Eric Akum; Agbenyega, Tsiri; Allen, Angela; Amodu, Olukemi; Corran, Patrick; Djimde, Abdoulaye; Dolo, Amagana; Doumbo, Ogobara K; Drakeley, Chris; Dunstan, Sarah; Evans, Jennifer; Farrar, Jeremy; Fernando, Deepika; Hien, Tran Tinh; Horstmann, Rolf D; Ibrahim, Muntaser; Karunaweera, Nadira; Kokwaro, Gilbert; Koram, Kwadwo A; Lemnge, Martha; Makani, Julie; Marsh, Kevin; Michon, Pascal; Modiano, David; Molyneux, Malcolm E; Mueller, Ivo; Parker, Michael; Peshu, Norbert; Plowe, Christopher V; Puijalon, Odile; Reeder, John; Reyburn, Hugh; Riley, Eleanor M; Sakuntabhai, Anavaj; Singhasivanon, Pratap; Sirima, Sodiomon; Tall, Adama; Taylor, Terrie E; Thera, Mahamadou; Troye-Blomberg, Marita; Williams, Thomas N; Wilson, Michael; Kwiatkowski, Dominic P

    2009-01-01

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10^−7 to P = 4 × 10^−14, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations. PMID:19465909

  7. Genome-wide and fine-resolution association analysis of malaria in West Africa.

    PubMed

    Jallow, Muminatou; Teo, Yik Ying; Small, Kerrin S; Rockett, Kirk A; Deloukas, Panos; Clark, Taane G; Kivinen, Katja; Bojang, Kalifa A; Conway, David J; Pinder, Margaret; Sirugo, Giorgio; Sisay-Joof, Fatou; Usen, Stanley; Auburn, Sarah; Bumpstead, Suzannah J; Campino, Susana; Coffey, Alison; Dunham, Andrew; Fry, Andrew E; Green, Angela; Gwilliam, Rhian; Hunt, Sarah E; Inouye, Michael; Jeffreys, Anna E; Mendy, Alieu; Palotie, Aarno; Potter, Simon; Ragoussis, Jiannis; Rogers, Jane; Rowlands, Kate; Somaskantharajah, Elilan; Whittaker, Pamela; Widden, Claire; Donnelly, Peter; Howie, Bryan; Marchini, Jonathan; Morris, Andrew; SanJoaquin, Miguel; Achidi, Eric Akum; Agbenyega, Tsiri; Allen, Angela; Amodu, Olukemi; Corran, Patrick; Djimde, Abdoulaye; Dolo, Amagana; Doumbo, Ogobara K; Drakeley, Chris; Dunstan, Sarah; Evans, Jennifer; Farrar, Jeremy; Fernando, Deepika; Hien, Tran Tinh; Horstmann, Rolf D; Ibrahim, Muntaser; Karunaweera, Nadira; Kokwaro, Gilbert; Koram, Kwadwo A; Lemnge, Martha; Makani, Julie; Marsh, Kevin; Michon, Pascal; Modiano, David; Molyneux, Malcolm E; Mueller, Ivo; Parker, Michael; Peshu, Norbert; Plowe, Christopher V; Puijalon, Odile; Reeder, John; Reyburn, Hugh; Riley, Eleanor M; Sakuntabhai, Anavaj; Singhasivanon, Pratap; Sirima, Sodiomon; Tall, Adama; Taylor, Terrie E; Thera, Mahamadou; Troye-Blomberg, Marita; Williams, Thomas N; Wilson, Michael; Kwiatkowski, Dominic P

    2009-06-01

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10^−7 to P = 4 × 10^−14, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations. PMID:19465909

  8. Genome-Wide High-Resolution aCGH Analysis of Gestational Choriocarcinomas

    PubMed Central

    Poaty, Henriette; Coullin, Philippe; Peko, Jean Félix; Dessen, Philippe; Diatta, Ange Lucien; Valent, Alexander; Leguern, Eric; Prévot, Sophie; Gombé-Mbalawa, Charles; Candelier, Jean-Jacques; Picard, Jean-Yves; Bernheim, Alain

    2012-01-01

    Eleven samples of DNA from choriocarcinomas were studied by high resolution CGH-array 244 K. They were studied after histopathological confirmation of the diagnosis, of the androgenic etiology and after a microsatellite marker analysis confirming the absence of contamination of tumor DNA from maternal DNA. Three cell lines, BeWo, JAR, JEG were also studied by this high resolution pangenomic technique. According to aCGH analysis, the de novo choriocarcinomas exhibited simple chromosomal rearrangements or normal profiles. The cell lines showed various and complex chromosomal aberrations. 23 Minimal Critical Regions were defined that allowed us to list the genes that were potentially implicated. Among them, unusually high numbers of microRNA clusters and imprinted genes were observed. PMID:22253721

  9. The interaction of high-resolution electrophoresis and computational analysis in genome mapping

    SciTech Connect

    Carrano, A.V.; Branscomb, E.W.; de Jong, P.J.; Mohrenweiser, H.; Olsen, A.; Slezak, T.

    1990-07-26

    The construction of physical maps and the determination of the DNA sequence of chromosome-size segments of the human genome is a complex, multidisciplinary undertaking. The approach we have taken to construct a physical map and sequence of human chromosome 19 typifies these interactions. We exploit the power of both acrylamide and agarose gel electrophoresis to provide a simple and versatile method for DNA fingerprinting and the creation of contigs, or sets of overlapping genomic clones. Cosmid libraries are constructed from yeast artificial chromosome (YAC) clones or from flow-sorted chromosomes. Cosmid DNA isolated from the screened library array is cut with a combination of five restriction enzymes and the fragment ends are labeled with one of four different fluorochromes. Our approach to contig construction uses a robotic system to label restriction fragments from cosmids with fluorochromes; uses an automated DNA sequencer to capture fragment mobility data in a high-resolution multiplex mode; processes the mobility data to determine fragment length and provide a statistical measure of overlap among cosmids; and displays the contigs and underlying cosmids for operator interaction and access to a database. Data analyses and interactions are conducted over a network of SUN workstations using a set of software tools that we developed and coupled to a commercially available database. Applying these methods, we have analyzed 5154 cosmid clones and assembled 515 contigs for chromosome 19. Some of these contigs have been identified with known genes and many have been mapped to the chromosome by fluorescence in situ hybridization. Existing contigs are being extended by a combination of walking and fingerprinting. 21 refs., 2 figs.
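
    The fingerprint-comparison step in this record (scoring overlap among cosmids from their restriction-fragment lengths) can be illustrated with a minimal sketch. This is not the authors' software or their statistical measure: the function names, the 2% sizing tolerance, and the simple "fraction of shared fragments" score are all assumptions chosen for illustration.

```python
from typing import Sequence


def shared_fragments(a: Sequence[float], b: Sequence[float], tol: float = 0.02) -> int:
    """Count fragment lengths present in both fingerprints.

    Two fragments are treated as the same band when their lengths differ
    by at most `tol` (a relative tolerance, assumed here to be 2%).
    Matching is greedy over the sorted lengths, so each fragment is
    paired at most once.
    """
    xs, ys = sorted(a), sorted(b)
    i = j = shared = 0
    while i < len(xs) and j < len(ys):
        if abs(xs[i] - ys[j]) <= tol * max(xs[i], ys[j]):
            shared += 1
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1
    return shared


def overlap_score(a: Sequence[float], b: Sequence[float], tol: float = 0.02) -> float:
    """Fraction of the smaller fingerprint's fragments matched in the other.

    A hypothetical stand-in for a statistical overlap measure: 1.0 means
    every band of the smaller clone is matched, 0.0 means none are.
    """
    if not a or not b:
        return 0.0
    return shared_fragments(a, b, tol) / min(len(a), len(b))


# Hypothetical fragment lengths (in base pairs) for two cosmid clones.
cosmid_a = [500, 1200, 3000, 4500]
cosmid_b = [505, 1210, 2000, 4490, 7000]
score = overlap_score(cosmid_a, cosmid_b)
```

    Clone pairs whose score exceeds a chosen threshold would be candidates for joining into the same contig; a production method would also need to account for the chance of coincidental band matches.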

  10. A high-resolution genomic analysis of multidrug-resistant hospital outbreaks of Klebsiella pneumoniae

    PubMed Central

    Chung The, Hao; Karkey, Abhilasha; Pham Thanh, Duy; Boinett, Christine J; Cain, Amy K; Ellington, Matthew; Baker, Kate S; Dongol, Sabina; Thompson, Corinne; Harris, Simon R; Jombart, Thibaut; Le Thi Phuong, Tu; Tran Do Hoang, Nhu; Ha Thanh, Tuyen; Shretha, Shrijana; Joshi, Suchita; Basnyat, Buddha; Thwaites, Guy; Thomson, Nicholas R; Rabaa, Maia A; Baker, Stephen

    2015-01-01

    Multidrug-resistant (MDR) Klebsiella pneumoniae has become a leading cause of nosocomial infections worldwide. Despite its prominence, little is known about the genetic diversity of K. pneumoniae in resource-poor hospital settings. Through whole-genome sequencing (WGS), we reconstructed an outbreak of MDR K. pneumoniae occurring on high-dependency wards in a hospital in Kathmandu during 2012 with a case-fatality rate of 75%. The WGS analysis permitted the identification of two MDR K. pneumoniae lineages causing distinct outbreaks within the complex endemic K. pneumoniae. Using phylogenetic reconstruction and lineage-specific PCR, our data predicted a scenario in which K. pneumoniae, circulating for 6 months before the outbreak, underwent a series of ward-specific clonal expansions after the acquisition of genes facilitating virulence and MDR. We suggest that the early detection of a specific NDM-1 containing lineage in 2011 would have alerted the high-dependency ward staff to intervene. We argue that some form of real-time genetic characterisation, alongside clade-specific PCR during an outbreak, should be factored into future healthcare infection control practices in both high- and low-income settings. PMID:25712531

  11. High-resolution genomic analysis suggests the absence of recurrent genomic alterations other than SMARCB1 aberrations in atypical teratoid/rhabdoid tumors.

    PubMed

    Hasselblatt, Martin; Isken, Sarah; Linge, Anna; Eikmeier, Kristin; Jeibmann, Astrid; Oyen, Florian; Nagel, Inga; Richter, Julia; Bartelheim, Kerstin; Kordes, Uwe; Schneppenheim, Reinhard; Frühwald, Michael; Siebert, Reiner; Paulus, Werner

    2013-02-01

    Atypical teratoid/rhabdoid tumor (AT/RT) is a rare malignant pediatric brain tumor characterized by genetic alterations affecting the SMARCB1 (hSNF5/INI1) locus in chromosome band 22q11.2. To identify potential additional genetic alterations, high-resolution genome-wide analysis was performed using a molecular inversion probe single-nucleotide polymorphism (MIP SNP) assay (Affymetrix OncoScan formalin-fixed paraffin-embedded express) on DNA isolated from 18 formalin-fixed paraffin-embedded archival samples. Alterations affecting the SMARCB1 locus could be demonstrated by MIP SNP in 15 out of 16 evaluable cases (94%). These comprised five tumors with homozygous deletions, six tumors with heterozygous deletions, and four tumors with copy number neutral loss of heterozygosity (LOH) involving chromosome band 22q11.2. Remarkably, MIP SNP analysis did not yield any further recurrent chromosomal gains, losses, or copy neutral LOH. On MIP SNP screening for somatic mutations, the presence of a SMARCB1 mutation (c.472C>T p.R158X) was confirmed, but no recurrent mutations of other cancer relevant genes could be identified. Results of fluorescence in situ hybridization, multiplex ligation-dependent probe amplification, and SMARCB1 sequencing were highly congruent with those of the MIP SNP assay. In conclusion, these data further suggest the absence of recurrent genomic alterations other than SMARCB1 in AT/RT. PMID:23074045

  12. Analysis of complete mitochondrial genome sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that experienced rapid speciation

    PubMed Central

    Yu, Li; Li, Yi-Wei; Ryder, Oliver A; Zhang, Ya-Ping

    2007-01-01

    Background: Despite the small number of ursid species, bear phylogeny has long been a focus of study due to their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family represents a typical example of rapid evolutionary radiation. Previous analyses with a single mitochondrial (mt) gene or a small number of mt genes either provide weak support or a large unresolved polytomy for ursids. We revisit the contentious relationships within Ursidae by analyzing complete mt genome sequences and evaluating the performance of both entire mt genomes and constituent mtDNA genes in recovering a phylogeny of extremely recent speciation events. Results: This mitochondrial genome-based phylogeny provides strong evidence that the spectacled bear diverged first, while within the genus Ursus, the sloth bear is the sister taxon of all the other five ursines. The latter group is divided into the brown bear/polar bear and the two black bears/sun bear assemblages. These findings resolve the previous conflicts between trees using partial mt genes. The ability of different categories of mt protein coding genes to recover the correct phylogeny is concordant with previous analyses for taxa with deep divergence times. This study provides a robust Ursidae phylogenetic framework for future validation by additional independent evidence, and also has significant implications for assisting in the resolution of other similarly difficult phylogenetic investigations. Conclusion: Identification of base composition bias and utilization of the combined data of whole mitochondrial genome sequences has allowed recovery of a strongly supported phylogeny that is upheld when using multiple alternative outgroups for the Ursidae, a mammalian family that underwent a rapid radiation since the mid- to late Pliocene. It remains to be seen if the reliability of mt genome analysis will hold up in studies of other difficult phylogenetic

  13. High Resolution Genome-Wide Analysis of Chromosomal Alterations in Burkitt's Lymphoma

    PubMed Central

    Toujani, Saloua; Dessen, Philippe; Ithzar, Nathalie; Danglot, Gisèle; Richon, Catherine; Vassetzky, Yegor; Robert, Thomas; Lazar, Vladimir; Bosq, Jacques; Da Costa, Lydie; Pérot, Christine; Ribrag, Vincent; Patte, Catherine; Wiels, Jöelle; Bernheim, Alain

    2009-01-01

    Additional chromosomal abnormalities are currently detected in Burkitt's lymphoma. They play major roles in the progression of BL and in prognosis. The genes involved remain elusive. A whole-genome oligonucleotide array CGH analysis correlated with karyotype and FISH was performed in a set of 27 Burkitt's lymphoma-derived cell lines and primary tumors. More than half of the 145 CNAs <2 Mb were mapped to Mendelian CNVs, including GSTT1, glutathione S-transferase and BIRC6, an anti-apoptotic protein, possibly predisposing to some cancers. Somatic cell line-specific CNVs localized to the IG locus were consistently observed with the 244 K aCGH platform. Among 136 CNAs >2 Mb, gains were found in 1q (12/27), 13q (7/27), 7q (6/27), 8q (4/27), 2p (3/27), 11q (2/27) and 15q (2/27). Losses were found in 3p (5/27), 4p (4/27), 4q (4/27), 9p (4/27), 13q (4/27), 6p (3/27), 17p (3/27), 6q (2/27), 11pterp13 (2/27) and 14q12q21.3 (2/27). Twenty-one minimal critical regions (MCRs; range 0.04–71.36 Mb) were delineated in tumors and cell lines. Three MCRs were localized to 1q. The proximal one was mapped to 1q21.1q25.2 with a 6.3 Mb amplicon (1q21.1q21.3) harboring BCA2 and PIAS3. In the other 2 MCRs, 1q32.1 and 1q44, MDM4 and AKT3 appeared as possible drivers of these gains respectively. The 13q31.3q32.1 <89.58–96.81> MCR contained an amplicon and ABCC4 might be the driver of this amplicon. The 40 Kb 2p16.1 <60.96–61> MCR was the smallest gained MCR and specifically encompassed the REL oncogene which is already implicated in B cell lymphomas. The most frequently deleted MCR was 3p14.1 <60.43–60.53> that removed the fifth exon of FHIT. Further investigations which combined gene expression and functional studies are essential to understand the lymphomagenesis mechanism and for the development of more effective, targeted therapeutic strategies. PMID:19759907

  14. Genomic paradigms for food-borne enteric pathogen analysis at the USFDA: case studies highlighting method utility, integration and resolution.

    PubMed

    Elkins, C A; Kotewicz, M L; Jackson, S A; Lacher, D W; Abu-Ali, G S; Patel, I R

    2013-01-01

    Modern risk control and food safety practices involving food-borne bacterial pathogens are benefiting from new genomic technologies for rapid, yet highly specific, strain characterisations. Within the United States Food and Drug Administration (USFDA) Center for Food Safety and Applied Nutrition (CFSAN), optical genome mapping and DNA microarray genotyping have been used for several years to quickly assess genomic architecture and gene content, respectively, for outbreak strain subtyping and to enhance retrospective trace-back analyses. The application and relative utility of each method varies with outbreak scenario and the suspect pathogen, with comparative analytical power enhanced by database scale and depth. Integration of these two technologies allows high-resolution scrutiny of the genomic landscapes of enteric food-borne pathogens with notable examples including Shiga toxin-producing Escherichia coli (STEC) and Salmonella enterica serovars from a variety of food commodities. Moreover, the recent application of whole genome sequencing technologies to food-borne pathogen outbreaks and surveillance has enhanced resolution to the single nucleotide scale. This new wealth of sequence data will support more refined next-generation custom microarray designs, targeted re-sequencing and "genomic signature recognition" approaches involving a combination of genes and single nucleotide polymorphism detection to distil strain-specific fingerprinting to a minimised scale. This paper examines the utility of microarrays and optical mapping in analysing outbreaks, reviews best practices and the limits of these technologies for pathogen differentiation, and it considers future integration with whole genome sequencing efforts. PMID:23199033

  15. Xenopus tropicalis Genome Re-Scaffolding and Re-Annotation Reach the Resolution Required for In Vivo ChIA-PET Analysis

    PubMed Central

    Buisine, Nicolas; Ruan, Xiaoan; Bilesimo, Patrice; Grimaldi, Alexis; Alfama, Gladys; Ariyaratne, Pramila; Mulawadi, Fabianus; Chen, Jieqi; Sung, Wing-Kin; Liu, Edison T.; Demeneix, Barbara A.; Ruan, Yijun; Sachs, Laurent M.

    2015-01-01

    Genome-wide functional analyses require high-resolution genome assembly and annotation. We applied ChIA-PET to analyze gene regulatory networks, including 3D chromosome interactions, underlying thyroid hormone (TH) signaling in the frog Xenopus tropicalis. As the available versions of Xenopus tropicalis assembly and annotation lacked the resolution required for ChIA-PET, we improved the genome assembly version 4.1 and annotations using data derived from the paired end tag (PET) sequencing technologies and approaches (e.g., DNA-PET [gPET], RNA-PET etc.). The large insert (~10 Kb, ~17 Kb) paired end DNA-PET with high throughput NGS sequencing not only significantly improved genome assembly quality, but also strongly reduced genome “fragmentation”, reducing total scaffold numbers by ~60%. Next, RNA-PET technology, designed and developed for the detection of full-length transcripts and fusion mRNA in whole transcriptome studies (ENCODE consortia), was applied to capture the 5' and 3' ends of transcripts. These amendments in assembly and annotation were essential prerequisites for the ChIA-PET analysis of TH transcription regulation. Their application revealed complex regulatory configurations of target genes and the structures of the regulatory networks underlying physiological responses. Our work allowed us to improve the quality of Xenopus tropicalis genomic resources, reaching the standard required for ChIA-PET analysis of transcriptional networks. We consider that the workflow proposed offers useful conceptual and methodological guidance and can readily be applied to other non-conventional models that have low-resolution genome data. PMID:26348928

  16. Xenopus tropicalis Genome Re-Scaffolding and Re-Annotation Reach the Resolution Required for In Vivo ChIA-PET Analysis.

    PubMed

    Buisine, Nicolas; Ruan, Xiaoan; Bilesimo, Patrice; Grimaldi, Alexis; Alfama, Gladys; Ariyaratne, Pramila; Mulawadi, Fabianus; Chen, Jieqi; Sung, Wing-Kin; Liu, Edison T; Demeneix, Barbara A; Ruan, Yijun; Sachs, Laurent M

    2015-01-01

    Genome-wide functional analyses require high-resolution genome assembly and annotation. We applied ChIA-PET to analyze gene regulatory networks, including 3D chromosome interactions, underlying thyroid hormone (TH) signaling in the frog Xenopus tropicalis. As the available versions of Xenopus tropicalis assembly and annotation lacked the resolution required for ChIA-PET, we improved the genome assembly version 4.1 and annotations using data derived from the paired end tag (PET) sequencing technologies and approaches (e.g., DNA-PET [gPET], RNA-PET etc.). The large insert (~10 Kb, ~17 Kb) paired end DNA-PET with high throughput NGS sequencing not only significantly improved genome assembly quality, but also strongly reduced genome "fragmentation", reducing total scaffold numbers by ~60%. Next, RNA-PET technology, designed and developed for the detection of full-length transcripts and fusion mRNA in whole transcriptome studies (ENCODE consortia), was applied to capture the 5' and 3' ends of transcripts. These amendments in assembly and annotation were essential prerequisites for the ChIA-PET analysis of TH transcription regulation. Their application revealed complex regulatory configurations of target genes and the structures of the regulatory networks underlying physiological responses. Our work allowed us to improve the quality of Xenopus tropicalis genomic resources, reaching the standard required for ChIA-PET analysis of transcriptional networks. We consider that the workflow proposed offers useful conceptual and methodological guidance and can readily be applied to other non-conventional models that have low-resolution genome data. PMID:26348928

  17. Genome-wide SNP identification for the construction of a high-resolution genetic map of Japanese flounder (Paralichthys olivaceus): applications to QTL mapping of Vibrio anguillarum disease resistance and comparative genomic analysis.

    PubMed

    Shao, Changwei; Niu, Yongchao; Rastas, Pasi; Liu, Yang; Xie, Zhiyuan; Li, Hengde; Wang, Lei; Jiang, Yong; Tai, Shuaishuai; Tian, Yongsheng; Sakamoto, Takashi; Chen, Songlin

    2015-04-01

    High-resolution genetic maps are essential for fine mapping of complex traits, genome assembly, and comparative genomic analysis. Single-nucleotide polymorphisms (SNPs) are the primary molecular markers used for genetic map construction. In this study, we identified 13,362 SNPs evenly distributed across the Japanese flounder (Paralichthys olivaceus) genome. Of these SNPs, 12,712 high-confidence SNPs were subjected to high-throughput genotyping and assigned to 24 consensus linkage groups (LGs). The total length of the genetic linkage map was 3,497.29 cM with an average distance of 0.47 cM between loci, thereby representing the densest genetic map currently reported for Japanese flounder. Nine positive quantitative trait loci (QTLs) forming two main clusters for Vibrio anguillarum disease resistance were detected. All QTLs could explain 5.1-8.38% of the total phenotypic variation. Synteny analysis of the QTL regions on the genome assembly revealed 12 immune-related genes, among them 4 genes strongly associated with V. anguillarum disease resistance. In addition, 246 genome assembly scaffolds with an average size of 21.79 Mb were anchored onto the LGs; these scaffolds, comprising 522.99 Mb, represented 95.78% of assembled genomic sequences. The mapped assembly scaffolds in Japanese flounder were used for genome synteny analyses against zebrafish (Danio rerio) and medaka (Oryzias latipes). Flounder and medaka were found to possess almost one-to-one synteny, whereas flounder and zebrafish exhibited a multi-syntenic correspondence. The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species. PMID:25762582

  18. Combined Analysis of Variation in Core, Accessory and Regulatory Genome Regions Provides a Super-Resolution View into the Evolution of Bacterial Populations.

    PubMed

    McNally, Alan; Oren, Yaara; Kelly, Darren; Pascoe, Ben; Dunn, Steven; Sreecharan, Tristan; Vehkala, Minna; Välimäki, Niko; Prentice, Michael B; Ashour, Amgad; Avram, Oren; Pupko, Tal; Dobrindt, Ulrich; Literak, Ivan; Guenther, Sebastian; Schaufler, Katharina; Wieler, Lothar H; Zhiyong, Zong; Sheppard, Samuel K; McInerney, James O; Corander, Jukka

    2016-09-01

    The use of whole-genome phylogenetic analysis has revolutionized our understanding of the evolution and spread of many important bacterial pathogens due to the high resolution view it provides. However, the majority of such analyses do not consider the potential role of accessory genes when inferring evolutionary trajectories. Moreover, the recently discovered importance of the switching of gene regulatory elements suggests that an exhaustive analysis, combining information from core and accessory genes with regulatory elements could provide unparalleled detail of the evolution of a bacterial population. Here we demonstrate this principle by applying it to a worldwide multi-host sample of the important pathogenic E. coli lineage ST131. Our approach reveals the existence of multiple circulating subtypes of the major drug-resistant clade of ST131 and provides the first ever population level evidence of core genome substitutions in gene regulatory regions associated with the acquisition and maintenance of different accessory genome elements. PMID:27618184

  19. Joint GWAS Analysis: Comparing similar GWAS at different genomic resolutions identifies novel pathway associations with six complex diseases

    PubMed Central

    McGeachie, Michael J.; Clemmer, George L.; Lasky-Su, Jessica; Dahlin, Amber; Raby, Benjamin A.; Weiss, Scott T.

    2014-01-01

    We show here that combining two existing genome wide association studies (GWAS) yields additional biologically relevant information, beyond that obtained by either GWAS separately. We propose Joint GWAS Analysis, a method that compares a pair of GWAS for similarity among the top SNP associations, top genes identified, gene functional clusters, and top biological pathways. We show that Joint GWAS Analysis identifies additional enriched biological pathways that would be missed by traditional Single-GWAS analysis. Furthermore, we examine the similarities of six complex genetic disorders at the SNP-level, gene-level, gene-cluster-level, and pathway-level. We make concrete hypotheses regarding novel pathway associations for several complex disorders considered, based on the results of Joint GWAS Analysis. Together, these results demonstrate that common complex disorders share substantially more genomic architecture than has been previously realized and that the meta-analysis of GWAS need not be limited to GWAS of the same phenotype to be informative. PMID:25838990

  20. Whole-genome analysis of 5-hydroxymethylcytosine and 5-methylcytosine at base resolution in the human brain

    PubMed Central

    2014-01-01

    Background: 5-methylcytosine (mC) can be oxidized by the tet methylcytosine dioxygenase (Tet) family of enzymes to 5-hydroxymethylcytosine (hmC), which is an intermediate of mC demethylation and may also be a stable epigenetic modification that influences chromatin structure. hmC is particularly abundant in mammalian brains but its function is currently unknown. A high-resolution hydroxymethylome map is required to fully understand the function of hmC in the human brain. Results: We present genome-wide and single-base resolution maps of hmC and mC in the human brain by combined application of Tet-assisted bisulfite sequencing and bisulfite sequencing. We demonstrate that hmCs increase markedly from the fetal to the adult stage, and in the adult brain, 13% of all CpGs are highly hydroxymethylated with strong enrichment at genic regions and distal regulatory elements. Notably, hmC peaks are identified at the 5′ splicing sites at the exon-intron boundary, suggesting a mechanistic link between hmC and splicing. We report a surprising transcription-correlated hmC bias toward the sense strand and an mC bias toward the antisense strand of gene bodies. Furthermore, hmC is negatively correlated with H3K27me3-marked and H3K9me3-marked repressive genomic regions, and is more enriched at poised enhancers than active enhancers. Conclusions: We provide single-base resolution hmC and mC maps in the human brain and our data imply novel roles of hmC in regulating splicing and gene expression. Hydroxymethylation is the main modification status for a large portion of CpGs situated at poised enhancers and actively transcribed regions, suggesting its roles in epigenetic tuning at these regions. PMID:24594098
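
    Note: the combined use of the two assays in the record above can be illustrated with a small sketch. It relies on the usual reading of these methods: conventional bisulfite sequencing reports mC and hmC together, while Tet-assisted bisulfite sequencing reports hmC alone, so mC is estimated by subtraction. The function name, parameter names, and example values are hypothetical and are not taken from the paper.

```python
def methylation_levels(bs_level, tab_level):
    """Split a bisulfite signal into hmC and mC fractions for one cytosine.

    bs_level:  fraction of reads called modified by bisulfite sequencing
               (mC + hmC combined).
    tab_level: fraction of reads called modified by Tet-assisted bisulfite
               sequencing (hmC only).
    """
    for name, value in (("bs_level", bs_level), ("tab_level", tab_level)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    hmc = tab_level
    # Sampling noise can push the difference below zero, so clamp at 0.
    mc = max(bs_level - tab_level, 0.0)
    return {"hmC": hmc, "mC": mc}


# Illustrative values only: 80% modified by BS-seq, 25% by TAB-seq.
levels = methylation_levels(bs_level=0.80, tab_level=0.25)
```

    A real pipeline would work from read counts at each site and apply a statistical test rather than a plain subtraction; the sketch only shows how the two measurements relate.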

  1. High-resolution Whole-Genome Analysis of Skull Base Chordomas Implicates FHIT Loss in Chordoma Pathogenesis

    PubMed Central

    Diaz, Roberto Jose; Guduk, Mustafa; Romagnuolo, Rocco; Smith, Christian A; Northcott, Paul; Shih, David; Berisha, Fitim; Flanagan, Adrienne; Munoz, David G; Cusimano, Michael D; Pamir, M Necmettin; Rutka, James T

    2012-01-01

    Chordoma is a rare tumor arising in the sacrum, clivus, or vertebrae. It is often not completely resectable and shows a high incidence of recurrence and progression with shortened patient survival and impaired quality of life. Chemotherapeutic options are limited to investigational therapies at present. Therefore, adjuvant therapy for control of tumor recurrence and progression is of great interest, especially in skull base lesions where complete tumor resection is often not possible because of the proximity of cranial nerves. To understand the extent of genetic instability and associated chromosomal and gene losses or gains in skull base chordoma, we undertook whole-genome single-nucleotide polymorphism microarray analysis of flash frozen surgical chordoma specimens, 21 from the clivus and 1 from C1 to C2 vertebrae. We confirm the presence of a deletion at 9p involving CDKN2A, CDKN2B, and MTAP but at a much lower rate (22%) than previously reported for sacral chordoma. At a similar frequency (21%), we found aneuploidy of chromosome 3. Tissue microarray immunohistochemistry demonstrated absent or reduced fragile histidine triad (FHIT) protein expression in 98% of sacral chordomas and 67% of skull base chordomas. Our data suggest that chromosome 3 aneuploidy and epigenetic regulation of FHIT contribute to loss of the FHIT tumor suppressor in chordoma. The finding that FHIT is lost in a majority of chordomas provides new insight into chordoma pathogenesis and points to a potential new therapeutic target for this challenging neoplasm. PMID:23019410

  2. High-resolution whole-genome analysis of skull base chordomas implicates FHIT loss in chordoma pathogenesis.

    PubMed

    Diaz, Roberto Jose; Guduk, Mustafa; Romagnuolo, Rocco; Smith, Christian A; Northcott, Paul; Shih, David; Berisha, Fitim; Flanagan, Adrienne; Munoz, David G; Cusimano, Michael D; Pamir, M Necmettin; Rutka, James T

    2012-09-01

    Chordoma is a rare tumor arising in the sacrum, clivus, or vertebrae. It is often not completely resectable and shows a high incidence of recurrence and progression with shortened patient survival and impaired quality of life. Chemotherapeutic options are limited to investigational therapies at present. Therefore, adjuvant therapy for control of tumor recurrence and progression is of great interest, especially in skull base lesions where complete tumor resection is often not possible because of the proximity of cranial nerves. To understand the extent of genetic instability and associated chromosomal and gene losses or gains in skull base chordoma, we undertook whole-genome single-nucleotide polymorphism microarray analysis of flash frozen surgical chordoma specimens, 21 from the clivus and 1 from C1 to C2 vertebrae. We confirm the presence of a deletion at 9p involving CDKN2A, CDKN2B, and MTAP but at a much lower rate (22%) than previously reported for sacral chordoma. At a similar frequency (21%), we found aneuploidy of chromosome 3. Tissue microarray immunohistochemistry demonstrated absent or reduced fragile histidine triad (FHIT) protein expression in 98% of sacral chordomas and 67% of skull base chordomas. Our data suggest that chromosome 3 aneuploidy and epigenetic regulation of FHIT contribute to loss of the FHIT tumor suppressor in chordoma. The finding that FHIT is lost in a majority of chordomas provides new insight into chordoma pathogenesis and points to a potential new therapeutic target for this challenging neoplasm. PMID:23019410

  3. An object model for genome information at all levels of resolution

    SciTech Connect

    Honda, S.; Parrott, N.W.; Smith, R.; Lawrence, C.

    1993-12-31

    An object model for genome data at all levels of resolution is described. The model was derived by considering the requirements for representing genome related objects in three application domains: genome maps, large-scale DNA sequencing, and exploring functional information in gene and protein sequences. The methodology used for the object-oriented analysis is also described.

  4. High resolution analysis

    NASA Technical Reports Server (NTRS)

    Robinove, C. J.

    1982-01-01

    The possibilities for the use of high spectral resolution analysis in the field of hydrology and water resources are examined. Critical gaps in scientific knowledge that must be filled before technology can be evaluated involve the spectral response of water, substances dissolved and suspended in water, and substances floating on water. The most complete mapping of oil slicks can be done in the ultraviolet region. A means of measuring the ultraviolet reflection at the surface from satellite altitudes needs to be determined. The use of high spectral resolution sensors in a reasonable number of narrow bands may be able to sense the reflectance or emission characteristics of water and its contained materials that can be correlated with commonly used water quality variables. The technological alternatives available for experimenting with problems of sensing water quality are to use existing remote sensing instrumentation in an empirical mode and to develop instruments for either testing hypotheses or conducting empirical experiments.

  5. Genomic Southern blot analysis.

    PubMed

    Gebbie, Leigh

    2014-01-01

    This chapter describes a detailed protocol for genomic Southern blot analysis which can be used to detect transgene or endogenous gene sequences in cereal genomes. The protocol follows a standard approach that has been shown to generate high-quality results: size fractionation of genomic DNA; capillary transfer to a nylon membrane; hybridization with a digoxigenin-labelled probe; and detection using a chemiluminescent-based system. High sensitivity and limited background are key to successful Southern blots. The critical steps in this protocol are complete digestion of the right quantity of DNA, careful handling of the membrane to avoid unnecessary background, and optimization of probe concentration and temperatures during the hybridization step. Detailed instructions on how to successfully master these techniques are provided. PMID:24243203

  6. High-Resolution Analysis by Whole-Genome Sequencing of an International Lineage (Sequence Type 111) of Pseudomonas aeruginosa Associated with Metallo-Carbapenemases in the United Kingdom

    PubMed Central

    Wright, Laura; Underwood, Anthony; Witney, Adam A.; Chan, Yuen-Ting; Al-Shahib, Ali; Arnold, Catherine; Doumith, Michel; Patel, Bharat; Planche, Timothy D.; Green, Jonathan; Holliman, Richard; Woodford, Neil

    2015-01-01

    Whole-genome sequencing (WGS) was carried out on 87 isolates of sequence type 111 (ST-111) of Pseudomonas aeruginosa collected between 2005 and 2014 from 65 patients and 12 environmental isolates from 24 hospital laboratories across the United Kingdom on an Illumina HiSeq instrument. Most isolates (73) carried VIM-2, but others carried IMP-1 or IMP-13 (5) or NDM-1 (1); one isolate had VIM-2 and IMP-18, and 7 carried no metallo-beta-lactamase (MBL) gene. Single nucleotide polymorphism analysis divided the isolates into distinct clusters; the NDM-1 isolate was an outlier, and the IMP isolates and 6/7 MBL-negative isolates clustered separately from the main set of 73 VIM-2 isolates. Within the VIM-2 set, there were at least 3 distinct clusters, including a tightly clustered set of isolates from 3 hospital laboratories consistent with an outbreak from a single introduction that was quickly brought under control and a much broader set dominated by isolates from a long-running outbreak in a London hospital likely seeded from an environmental source, requiring different control measures; isolates from 7 other hospital laboratories in London and southeast England were also included. Bayesian evolutionary analysis indicated that all the isolates shared a common ancestor dating back ∼50 years (1960s), with the main VIM-2 set separating approximately 20 to 30 years ago. Accessory gene profiling revealed blocks of genes associated with particular clusters, with some having high similarity (≥95%) to bacteriophage genes. WGS of widely found international lineages such as ST-111 provides the necessary resolution to inform epidemiological investigations and intervention policies. PMID:26041902

  7. High Resolution QTL Map Of Body Conformation Traits From Genome-Wide Association Analysis In Contemporary U.S. Holstein Cows

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A QTL map of 1,005 SNP markers affecting 18 body conformation traits (top 100 effects per trait) was constructed based on a genome-wide association analysis of 1,654 contemporary U.S. Holstein cows genotyped with the BovineSNP50 (45,878 SNPs). The top 100 effects for each trait explained 38-56% of t...

  8. High Resolution QTL Map Of Net Merit Component Traits And Calving Traits From Genome-Wide Association Analysis In Contemporary U.S. Holstein Cows

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A QTL map of 725 SNPs affecting 13 dairy traits (top 100 effects per trait) was constructed based on a genome-wide association analysis of 1,654 contemporary U.S. Holstein cows genotyped with 45,878 SNPs. The 13 traits were net merit (NM$), its 8 component traits and 4 calving traits. The top 100 ef...

  9. High resolution comparative genomic hybridisation in clinical cytogenetics

    PubMed Central

    Kirchhoff, M.; Rose, H.; Lundsteen, C.

    2001-01-01

    High resolution comparative genomic hybridisation (HR-CGH) is a diagnostic tool in our clinical cytogenetics laboratory. The present survey reports the results of 253 clinical cases in which 47 abnormalities were detected. Among 144 dysmorphic and mentally retarded subjects with a normal conventional karyotype, 15 (10%) had small deletions or duplications, of which 11 were interstitial. In addition, a case of mosaic trisomy 9 was detected. Among 25 dysmorphic and mentally retarded subjects carrying apparently balanced de novo translocations, four had deletions at translocation breakpoints and two had deletions elsewhere in the genome. Seventeen of 19 complex rearrangements were clarified by HR-CGH. A small supernumerary marker chromosome occurring with low frequency and the breakpoint of a mosaic r(18) case could not be clarified. Three of 19 other abnormalities could not be confirmed by HR-CGH. One was a Williams syndrome deletion and two were DiGeorge syndrome deletions, which were apparently below the resolution of HR-CGH. However, we were able to confirm Angelman and Prader-Willi syndrome deletions, which are about 3-5 Mb. We conclude that HR-CGH should be used for the evaluation of (1) dysmorphic and mentally retarded subjects where normal karyotyping has failed to show abnormalities, (2) dysmorphic and mentally retarded subjects carrying apparently balanced de novo translocations, (3) apparently balanced de novo translocations detected prenatally, and (4) for clarification of complex structural rearrangements.


Keywords: comparative genomic hybridisation; chromosome analysis; chromosome aberrations; dysmorphism PMID:11694545

  10. High-resolution detection of recurrent aberrations in lung adenocarcinomas by array comparative genomic hybridization and expression analysis of selective genes by quantitative PCR.

    PubMed

    Zhu, Hong; Wong, Maria Pik; Tin, Vicky

    2014-06-01

    Genomic abnormalities are the hallmark of cancers and may harbor potential candidate genes important for cancer development and progression. We performed array comparative genomic hybridization (array CGH) on 36 cases of primary lung adenocarcinoma (AD) using an array containing 2621 BAC or PAC clones spanning the genome at an average interval of 1 Mb. Array CGH identified the commonest aberrations consisting of DNA gains within 1p, 1q, 5p, 5q, 7p, 7q, 8q, 11q, 12p, 13q, 16p, 17q, 20q, and losses within 6q, 9p, 10q and 18q. High-level copy gains involved mainly 7p21-p15 and 20q13.3. Dual color fluorescence in situ hybridization (FISH) was performed on a selective locus for validation of array CGH results. Genomic aberrations were compared with different clinicopathological features and a trend of higher number of aberrations in tumors with aggressive phenotypes and current tobacco exposure was identified. According to array CGH data, 23 candidate genes were selected for quantitative PCR (qPCR) analysis. The concordance observed between the genomic and expression changes in most of the genes suggested that they could be candidate cancer-related genes that contributed to the development of lung AD. PMID:24728343

  11. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution

    PubMed Central

    Hu, Jinchuan; Adar, Sheera; Selby, Christopher P.

    2015-01-01

    We developed a method for genome-wide mapping of DNA excision repair named XR-seq (excision repair sequencing). Human nucleotide excision repair generates two incisions surrounding the site of damage, creating an ∼30-mer. In XR-seq, this fragment is isolated and subjected to high-throughput sequencing. We used XR-seq to produce stranded, nucleotide-resolution maps of repair of two UV-induced DNA damages in human cells: cyclobutane pyrimidine dimers (CPDs) and (6-4) pyrimidine–pyrimidone photoproducts [(6-4)PPs]. In wild-type cells, CPD repair was highly associated with transcription, specifically with the template strand. Experiments in cells defective in either transcription-coupled excision repair or general excision repair isolated the contribution of each pathway to the overall repair pattern and showed that transcription-coupled repair of both photoproducts occurs exclusively on the template strand. XR-seq maps capture transcription-coupled repair at sites of divergent gene promoters and bidirectional enhancer RNA (eRNA) production at enhancers. XR-seq data also uncovered the repair characteristics and novel sequence preferences of CPDs and (6-4)PPs. XR-seq and the resulting repair maps will facilitate studies of the effects of genomic location, chromatin context, transcription, and replication on DNA repair in human cells. PMID:25934506

  12. Superfine resolution acoustooptic spectrum analysis

    NASA Technical Reports Server (NTRS)

    Ansari, Homayoon; Lesh, James R.

    1991-01-01

    High resolution spectrum analysis of RF signals is required in applications such as the search for extraterrestrial intelligence, RF interference monitoring, or general purpose decomposition of signals. Sub-Hertz resolution in three-dimensional acoustooptic spectrum analysis is theoretically and experimentally demonstrated. The operation of a two-dimensional acoustooptic spectrum analyzer is extended to include time integration over a sequence of CCD frames.

  13. Resolution of M. bovis phylogeny using genome-wide single nucleotide polymorphisms

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Piecemeal analysis of Mycobacterium bovis (MBO) genomes and conventional genotyping methods have not led to a comprehensive resolution of its genetic diversity to explain the wide range of disease phenotypes caused by this zoonotic pathogen. Conventional genotyping methods like spoligotyping and V...

  14. Resolution analysis by random probing

    NASA Astrophysics Data System (ADS)

    Simutė, S.; Fichtner, A.; van Leeuwen, T.

    2015-12-01

    We develop and apply methods for resolution analysis in tomography, based on stochastic probing of the Hessian or resolution operators. Key properties of our methods are (i) low algorithmic complexity and easy implementation, (ii) applicability to any tomographic technique, including full-waveform inversion and linearized ray tomography, (iii) applicability in any spatial dimension and to inversions with a large number of model parameters, (iv) low computational costs that are mostly a fraction of those required for synthetic recovery tests, and (v) the ability to quantify both spatial resolution and inter-parameter trade-offs. Using synthetic full-waveform inversions as benchmarks, we demonstrate that auto-correlations of random-model applications to the Hessian yield various resolution measures, including direction- and position-dependent resolution lengths, and the strength of inter-parameter mappings. We observe that the required number of random test models is around 5 in one, two and three dimensions. This means that the proposed resolution analyses are not only more meaningful than recovery tests but also computationally less expensive. We demonstrate the applicability of our method in 3D real-data full-waveform inversions for the western Mediterranean and Japan. In addition to tomographic problems, resolution analysis by random probing may be used in other inverse methods that constrain continuously distributed properties, including electromagnetic and potential-field inversions, as well as recently emerging geodynamic data assimilation.
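
    Note: the general idea of probing an operator with random vectors, as described in the record above, can be sketched on a toy problem. The example below is a Hutchinson-style estimate of an operator's diagonal, using only operator-vector products. It is a generic stochastic-probing technique chosen for illustration, not the authors' autocorrelation-based resolution measures, and the 5-by-5 matrix, function names, and probe count are all hypothetical. The large probe count suits this toy estimator and says nothing about the number of random models the paper reports.

```python
import random


def probe_diagonal(apply_operator, size, num_probes, seed=0):
    """Estimate the diagonal of a linear operator from random probes.

    apply_operator: function mapping a vector (list of floats) to the
                    operator applied to that vector.
    Each probe has independent entries of +1 or -1, so the average of
    probe[i] * response[i] converges to the i-th diagonal entry.
    """
    rng = random.Random(seed)
    totals = [0.0] * size
    for _ in range(num_probes):
        probe = [rng.choice((-1.0, 1.0)) for _ in range(size)]
        response = apply_operator(probe)
        for i in range(size):
            totals[i] += probe[i] * response[i]
    return [total / num_probes for total in totals]


# Hypothetical stand-in for a Hessian: symmetric and tridiagonal.
TOY_OPERATOR = [
    [2.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 2.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 2.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 2.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 2.0],
]


def apply_toy_operator(vector):
    """Matrix-vector product with TOY_OPERATOR."""
    return [sum(h * v for h, v in zip(row, vector)) for row in TOY_OPERATOR]


estimate = probe_diagonal(apply_toy_operator, size=5, num_probes=2000)
```

    The estimator never inspects the matrix entries directly, which is the property that makes probing attractive when the operator is only available through expensive forward and adjoint simulations.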

  15. Resolution analysis by random probing

    NASA Astrophysics Data System (ADS)

    Fichtner, Andreas; Leeuwen, Tristan van

    2015-08-01

    We develop and apply methods for resolution analysis in tomography, based on stochastic probing of the Hessian or resolution operators. Key properties of our methods are (i) low algorithmic complexity and easy implementation, (ii) applicability to any tomographic technique, including full-waveform inversion and linearized ray tomography, (iii) applicability in any spatial dimension and to inversions with a large number of model parameters, (iv) low computational costs that are mostly a fraction of those required for synthetic recovery tests, and (v) the ability to quantify both spatial resolution and interparameter trade-offs. Using synthetic full-waveform inversions as benchmarks, we demonstrate that autocorrelations of random-model applications to the Hessian yield various resolution measures, including direction- and position-dependent resolution lengths and the strength of interparameter mappings. We observe that the required number of random test models is around five in one, two, and three dimensions. This means that the proposed resolution analyses are not only more meaningful than recovery tests but also computationally less expensive. We demonstrate the applicability of our method in a 3-D real-data full-waveform inversion for the western Mediterranean. In addition to tomographic problems, resolution analysis by random probing may be used in other inverse methods that constrain continuously distributed properties, including electromagnetic and potential-field inversions, as well as recently emerging geodynamic data assimilation.

  16. Whole-Genome Sequencing for High-Resolution Investigation of Methicillin-Resistant Staphylococcus aureus Epidemiology and Genome Plasticity

    PubMed Central

    SenGupta, Dhruba J.; Cummings, Lisa A.; Hoogestraat, Daniel R.; Butler-Wu, Susan M.; Shendure, Jay; Cookson, Brad T.

    2014-01-01

    Methicillin-resistant Staphylococcus aureus (MRSA) infections pose a major challenge in health care, yet the limited heterogeneity within this group hinders molecular investigations of related outbreaks. Pulsed-field gel electrophoresis (PFGE) has been the gold standard approach but is impractical for many clinical laboratories and is often replaced with PCR-based methods. Regardless, both approaches can prove problematic for identifying subclonal outbreaks. Here, we explore the use of whole-genome sequencing for clinical laboratory investigations of MRSA molecular epidemiology. We examine the relationships of 44 MRSA isolates collected over a period of 3 years by using whole-genome sequencing and two PCR-based methods, multilocus variable-number tandem-repeat analysis (MLVA) and spa typing. We find that MLVA offers higher resolution than spa typing, as it resolved 17 versus 12 discrete isolate groups, respectively. In contrast, whole-genome sequencing reproducibly cataloged genomic variants (131,424 different single nucleotide polymorphisms and indels across the strain collection) that uniquely identified each MRSA clone, recapitulating those groups but enabling higher-resolution phylogenetic inferences of the epidemiological relationships. Importantly, whole-genome sequencing detected a significant number of variants, thereby distinguishing between groups that were considered identical by both spa typing (minimum, 1,124 polymorphisms) and MLVA (minimum, 193 polymorphisms); this suggests that these more conventional approaches can lead to false-positive identification of outbreaks due to inappropriate grouping of genetically distinct strains. An analysis of the distribution of variants across the MRSA genome reveals 47 mutational hot spots (comprising ∼2.5% of the genome) that account for 23.5% of the observed polymorphisms, and the use of this selected data set successfully recapitulates most epidemiological relationships in this pathogen group. PMID:24850346

  17. Understanding and utilizing crop genome diversity via high-resolution genotyping.

    PubMed

    Voss-Fels, Kai; Snowdon, Rod J

    2016-04-01

    High-resolution genome analysis technologies provide an unprecedented level of insight into structural diversity across crop genomes. Low-cost discovery of sequence variation has become accessible for all crops since the development of next-generation DNA sequencing technologies, using diverse methods ranging from genome-scale resequencing or skim sequencing, reduced-representation genotyping-by-sequencing, transcriptome sequencing or sequence capture approaches. High-density, high-throughput genotyping arrays generated using the resulting sequence data are today available for the assessment of genomewide single nucleotide polymorphisms in all major crop species. Besides their application in genetic mapping or genomewide association studies for dissection of complex agronomic traits, high-density genotyping arrays are highly suitable for genomic selection strategies. They also enable description of crop diversity at an unprecedented chromosome-scale resolution. Application of population genetics parameters to genomewide diversity data sets enables dissection of linkage disequilibrium to characterize loci underlying selective sweeps. High-throughput genotyping platforms simultaneously open the way for targeted diversity enrichment, allowing rejuvenation of low-diversity chromosome regions in strongly selected breeding pools to potentially reverse the influence of linkage drag. Numerous recent examples are presented which demonstrate the power of next-generation genomics for high-resolution analysis of crop diversity on a subgenomic and chromosomal scale. Such studies give deep insight into the history of crop evolution and selection, while simultaneously identifying novel diversity to improve yield and heterosis. PMID:27003869

  18. High-Resolution Linkage and Quantitative Trait Locus Mapping Aided by Genome Survey Sequencing: Building Up An Integrative Genomic Framework for a Bivalve Mollusc

    PubMed Central

    Jiao, Wenqian; Fu, Xiaoteng; Dou, Jinzhuang; Li, Hengde; Su, Hailin; Mao, Junxia; Yu, Qian; Zhang, Lingling; Hu, Xiaoli; Huang, Xiaoting; Wang, Yangfan; Wang, Shi; Bao, Zhenmin

    2014-01-01

    Genetic linkage maps are indispensable tools in genetic and genomic studies. Recent development of genotyping-by-sequencing (GBS) methods holds great promise for constructing high-resolution linkage maps in organisms lacking extensive genomic resources. In the present study, linkage mapping was conducted for a bivalve mollusc (Chlamys farreri) using a newly developed GBS method—2b-restriction site-associated DNA (2b-RAD). Genome survey sequencing was performed to generate a preliminary reference genome that was utilized to facilitate linkage and quantitative trait locus (QTL) mapping in C. farreri. A high-resolution linkage map was constructed with a marker density (3806) that has, to our knowledge, never been achieved in any other molluscs. The linkage map covered nearly the whole genome (99.5%) with a resolution of 0.41 cM. QTL mapping and association analysis congruously revealed two growth-related QTLs and one potential sex-determination region. An important candidate QTL gene named PROP1, which functions in the regulation of growth hormone production in vertebrates, was identified from the growth-related QTL region detected on the linkage group LG3. We demonstrate that this linkage map can serve as an important platform for improving genome assembly and unifying multiple genomic resources. Our study, therefore, exemplifies how to build up an integrative genomic framework in a non-model organism. PMID:24107803

  19. Integration of complete chloroplast genome sequences with small amplicon datasets improves phylogenetic resolution in Acacia.

    PubMed

    Williams, Anna V; Miller, Joseph T; Small, Ian; Nevill, Paul G; Boykin, Laura M

    2016-03-01

    Combining whole genome data with previously obtained amplicon sequences has the potential to increase the resolution of phylogenetic analyses, particularly at low taxonomic levels or where recent divergence, rapid speciation or slow genome evolution has resulted in limited sequence variation. However, the integration of these types of data for large scale phylogenetic studies has rarely been investigated. Here we conduct a phylogenetic analysis of the whole chloroplast genome and two nuclear ribosomal loci for 65 Acacia species from across the most recent Acacia phylogeny. We then combine this data with previously generated amplicon sequences (four chloroplast loci and two nuclear ribosomal loci) for 508 Acacia species. We use several phylogenetic methods, including maximum likelihood bootstrapping (with and without constraint) and ExaBayes, in order to determine the success of combining a dataset of 4,000 bp with one of 189,000 bp. The results of our study indicate that the inclusion of whole genome data gave a far better resolved and well supported representation of the phylogenetic relationships within Acacia than using only amplicon sequences, with the greatest support observed when using a whole genome phylogeny as a constraint on the amplicon sequences. Our study therefore provides methods for optimal integration of genomic and amplicon sequences. PMID:26702955

  20. Core Genome Multilocus Sequence Typing Scheme for High-Resolution Typing of Enterococcus faecium

    PubMed Central

    de Been, Mark; Pinholt, Mette; Top, Janetta; Bletz, Stefan; van Schaik, Willem; Brouwer, Ellen; Rogers, Malbert; Kraat, Yvette; Bonten, Marc; Corander, Jukka; Westh, Henrik; Harmsen, Dag

    2015-01-01

    Enterococcus faecium, a common inhabitant of the human gut, has emerged in the last 2 decades as an important multidrug-resistant nosocomial pathogen. Since the start of the 21st century, multilocus sequence typing (MLST) has been used to study the molecular epidemiology of E. faecium. However, due to the use of a small number of genes, the resolution of MLST is limited. Whole-genome sequencing (WGS) now allows for high-resolution tracing of outbreaks, but current WGS-based approaches lack standardization, rendering them less suitable for interlaboratory prospective surveillance. To overcome this limitation, we developed a core genome MLST (cgMLST) scheme for E. faecium. cgMLST transfers genome-wide single nucleotide polymorphism (SNP) diversity into a standardized and portable allele numbering system that is far less computationally intensive than SNP-based analysis of WGS data. The E. faecium cgMLST scheme was built using 40 genome sequences that represented the diversity of the species. The scheme consists of 1,423 cgMLST target genes. To test the performance of the scheme, we performed WGS analysis of 103 outbreak isolates from five different hospitals in the Netherlands, Denmark, and Germany. The cgMLST scheme performed well in distinguishing between epidemiologically related and unrelated isolates, even between those that had the same sequence type (ST), which denotes the higher discriminatory power of this cgMLST scheme over that of conventional MLST. We also show that in terms of resolution, the performance of the E. faecium cgMLST scheme is equivalent to that of an SNP-based approach. In conclusion, the cgMLST scheme developed in this study facilitates rapid, standardized, and high-resolution tracing of E. faecium outbreaks. PMID:26400782
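
    The allele numbering idea described above can be illustrated with a minimal sketch. It assumes each isolate is represented as a mapping from target locus to allele number, and that relatedness is summarized as the count of shared loci carrying different alleles. The locus names, allele numbers, and the `allele_distance` helper are hypothetical and are not taken from the published scheme.

```python
from typing import Dict, Optional

# A cgMLST-style profile: locus name -> allele number (None = locus not called).
Profile = Dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    shared = a.keys() & b.keys()
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


# Hypothetical profiles over four loci (a real scheme would have many more).
isolate_a: Profile = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2, "locus_0004": None}
isolate_b: Profile = {"locus_0001": 1, "locus_0002": 7, "locus_0003": 2, "locus_0004": 3}

# Only locus_0002 differs; locus_0004 is skipped because it is uncalled in isolate_a.
print(allele_distance(isolate_a, isolate_b))
```

    Comparing integer allele numbers rather than aligned sequences is what makes such profiles portable between laboratories; how missing loci should be treated is a design choice, and skipping them is only one option.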

  1. Genomic signatures of chromosomal instability and osteosarcoma progression detected by high resolution array CGH and interphase FISH.

    PubMed

    Selvarajah, S; Yoshimoto, M; Ludkovski, O; Park, P C; Bayani, J; Thorner, P; Maire, G; Squire, J A; Zielenska, M

    2008-01-01

    Osteosarcoma (OS) is characterized by an unstable karyotype which typically has a heterogeneous pattern of complex chromosomal abnormalities. High-resolution array comparative genomic hybridization (CGH) in combination with interphase fluorescence in situ hybridization (FISH) analyses provides a complete description of genomic imbalances together with an evaluation of the contribution of cell-to-cell variation to copy number changes. There have been no analyses to date documenting genomic signatures consistent with chromosomal instability mechanisms in OS tumors using array CGH. In this study, we utilized high-resolution array CGH to identify and characterize recurrent signatures of genomic imbalances using ten OS tumors. Comparison between the genomic profiles identified tumor groups with low, intermediate and high levels of genomic imbalance. Bands 6p22→p21, 8q24 and 17p12→p11.2 were consistently involved in high copy gain or amplification events. Since these three locations have been consistently associated with OS oncogenesis, FISH probes from each cytoband were used to derive an index of cellular heterogeneity for copy number within each region. OS with the highest degree of genomic imbalance also exhibited the most extreme cell-to-cell copy number variation. Significantly, the three OS with the most imbalance and genomic copy number heterogeneity also had the poorest response to preoperative chemotherapy. This genome wide analysis is the first utilizing oligonucleotide array CGH in combination with FISH analysis to derive genomic signatures of chromosomal instability in OS tumors by studying genomic imbalance and intercellular heterogeneity. This comprehensive genomic screening approach provides important insights concerning the mechanisms responsible for generating complex genomes. The resulting phenotypic diversity can generate tumors with a propensity for an aggressive disease course. A better understanding of the underlying mechanisms leading to OS

  2. Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry.

    PubMed

    Kelkar, Dhanashree S; Kumar, Dhirendra; Kumar, Praveen; Balakrishnan, Lavanya; Muthusamy, Babylakshmi; Yadav, Amit Kumar; Shrivastava, Priyanka; Marimuthu, Arivusudar; Anand, Sridhar; Sundaram, Hema; Kingsbury, Reena; Harsha, H C; Nair, Bipin; Prasad, T S Keshava; Chauhan, Devendra Singh; Katoch, Kiran; Katoch, Vishwa Mohan; Kumar, Prahlad; Chaerkady, Raghothama; Ramachandran, Srinivasan; Dash, Debasis; Pandey, Akhilesh

    2011-12-01

    The genome sequencing of H37Rv strain of Mycobacterium tuberculosis was completed in 1998 followed by the whole genome sequencing of a clinical isolate, CDC1551 in 2002. Since then, the genomic sequences of a number of other strains have become available making it one of the better studied pathogenic bacterial species at the genomic level. However, annotation of its genome remains challenging because of high GC content and dissimilarity to other model prokaryotes. To this end, we carried out an in-depth proteogenomic analysis of the M. tuberculosis H37Rv strain using Fourier transform mass spectrometry with high resolution at both MS and tandem MS levels. In all, we identified 3176 proteins from Mycobacterium tuberculosis representing ~80% of its total predicted gene count. In addition to protein database search, we carried out a genome database search, which led to identification of ~250 novel peptides. Based on these novel genome search-specific peptides, we discovered 41 novel protein coding genes in the H37Rv genome. Using peptide evidence and alternative gene prediction tools, we also corrected 79 gene models. Finally, mass spectrometric data from N terminus-derived peptides confirmed 727 existing annotations for translational start sites while correcting those for 33 proteins. We report creation of a high confidence set of protein coding regions in Mycobacterium tuberculosis genome obtained by high resolution tandem mass-spectrometry at both precursor and fragment detection steps for the first time. This proteogenomic approach should be generally applicable to other organisms whose genomes have already been sequenced for obtaining a more accurate catalogue of protein-coding genes. PMID:21969609

  3. BPGA- an ultra-fast pan-genome analysis pipeline.

    PubMed

    Chaudhari, Narendrakumar M; Gupta, Vinod Kumar; Dutta, Chitra

    2016-01-01

    Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics from few genome comparisons to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of the dataset, determining core (conserved), accessory (dispensable) and unique (strain-specific) gene pool of a species, tracing horizontal gene-flux across strains and providing insight into species evolution. The existing pan genome software tools suffer from various limitations like limited datasets, difficult installation/requirements, inadequate functional features etc. Here we present an ultra-fast computational pipeline BPGA (Bacterial Pan Genome Analysis tool) with seven functional modules. In addition to the routine pan genome analyses, BPGA introduces a number of novel features for downstream analyses like core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG & COG mapping of core, accessory and unique genes. Other notable features include minimum running prerequisites, freedom to select the gene clustering method, ultra-fast execution, user friendly command line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains. PMID:27071527

  4. BPGA- an ultra-fast pan-genome analysis pipeline

    PubMed Central

    Chaudhari, Narendrakumar M.; Gupta, Vinod Kumar; Dutta, Chitra

    2016-01-01

    Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics from few genome comparisons to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of the dataset, determining core (conserved), accessory (dispensable) and unique (strain-specific) gene pool of a species, tracing horizontal gene-flux across strains and providing insight into species evolution. The existing pan genome software tools suffer from various limitations like limited datasets, difficult installation/requirements, inadequate functional features etc. Here we present an ultra-fast computational pipeline BPGA (Bacterial Pan Genome Analysis tool) with seven functional modules. In addition to the routine pan genome analyses, BPGA introduces a number of novel features for downstream analyses like core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG & COG mapping of core, accessory and unique genes. Other notable features include minimum running prerequisites, freedom to select the gene clustering method, ultra-fast execution, user friendly command line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains. PMID:27071527

  5. High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly.

    PubMed

    Bartholomé, Jérôme; Mandrou, Eric; Mabiala, André; Jenkins, Jerry; Nabihoudine, Ibouniyamine; Klopp, Christophe; Schmutz, Jeremy; Plomion, Christophe; Gion, Jean-Marc

    2015-06-01

    Genetic maps are key tools in genetic research as they constitute the framework for many applications, such as quantitative trait locus analysis, and support the assembly of genome sequences. The resequencing of the two parents of a cross between Eucalyptus urophylla and Eucalyptus grandis was used to design a single nucleotide polymorphism (SNP) array of 6000 markers evenly distributed along the E. grandis genome. The genotyping of 1025 offspring enabled the construction of two high-resolution genetic maps containing 1832 and 1773 markers with an average marker interval of 0.45 and 0.5 cM for E. grandis and E. urophylla, respectively. The comparison between genetic maps and the reference genome highlighted 85% of collinear regions. A total of 43 noncollinear regions and 13 nonsyntenic regions were detected and corrected in the new genome assembly. This improved version contains 4943 scaffolds totalling 691.3 Mb of which 88.6% were captured by the 11 chromosomes. The mapping data were also used to investigate the effect of population size and number of markers on linkage mapping accuracy. This study provides the most reliable linkage maps for Eucalyptus and version 2.0 of the E. grandis genome. PMID:25385325

  6. A high-resolution cattle CNV map by population-scale genome sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. Prior studies in cattle have produced low-resolution CNV maps. We constructed a draft, high-resolution map of cattle CNVs based on whole genome sequencing data from 7...

  7. Utility of array comparative genomic hybridization in cytogenetic analysis.

    PubMed

    Singh, Rashmi R; Cheung, K-John J; Horsman, Douglas E

    2011-01-01

    Conventional comparative genomic hybridization (CGH), high-resolution oligonucleotide, and BAC array CGH have modernized the field of cytogenetics to enable access to unbalanced genomic aberrations such as whole or partial chromosomal gains and losses. The basic principle of array CGH involves hybridizing differentially labeled proband/test (e.g., tumor) and normal reference DNA on an array of oligonucleotide or BAC clones instead of normal metaphases as in conventional CGH. The sub-megabase resolution tiling BAC arrays are extremely useful for the analysis of acquired aberrations in cancer genomes. Array CGH can be extremely useful to identify the chromosomal makeup of marker and ring chromosomes, to define/delineate the precise location/bands involved in structural aberrations and the accurate localization of translocation breakpoints in both simple and complex karyotypes either alone or in combination with standard karyotype analysis. PMID:21431645

  8. ARTIST: High-Resolution Genome-Wide Assessment of Fitness Using Transposon-Insertion Sequencing

    PubMed Central

    Abel, Sören; Davis, Brigid M.; Baranowski, Catherine; Zhang, Yanjia J.; Rubin, Eric J.; Waldor, Matthew K.

    2014-01-01

    Transposon-insertion sequencing (TIS) is a powerful approach for deciphering genetic requirements for bacterial growth in different conditions, as it enables simultaneous genome-wide analysis of the fitness of thousands of mutants. However, current methods for comparative analysis of TIS data do not adjust for stochastic experimental variation between datasets and are limited to interrogation of annotated genomic elements. Here, we present ARTIST, an accessible TIS analysis pipeline for identifying essential regions that are required for growth under optimal conditions as well as conditionally essential loci that participate in survival only under specific conditions. ARTIST uses simulation-based normalization to model and compensate for experimental noise, and thereby enhances the statistical power in conditional TIS analyses. ARTIST also employs a novel adaptation of the hidden Markov model to generate statistically robust, high-resolution, annotation-independent maps of fitness-linked loci across the entire genome. Using ARTIST, we sensitively and comprehensively define Mycobacterium tuberculosis and Vibrio cholerae loci required for host infection while limiting inclusion of false positive loci. ARTIST is applicable to a broad range of organisms and will facilitate TIS-based dissection of pathways required for microbial growth and survival under a multitude of conditions. PMID:25375795

  9. The rise and fall of breakpoint reuse depending on genome resolution

    PubMed Central

    2011-01-01

    Background: During evolution, large-scale genome rearrangements of chromosomes shuffle the order of homologous genome sequences ("synteny blocks") across species. Some years ago, a controversy erupted in genome rearrangement studies over whether rearrangements recur, causing breakpoints to be reused. Methods: We investigate this controversial issue using the synteny blocks for human-mouse-rat reported by Bourque et al. and a series of synteny blocks we generated using Mauve at resolutions ranging from coarse to very fine-scale. We conducted analyses to test how resolution affects the traditional measure of the breakpoint reuse rate. Results: We found that the inversion-based breakpoint reuse rate is low at fine-scale synteny block resolution and that it rises and eventually falls as synteny block resolution decreases. By analyzing the cycle structure of the breakpoint graph of human-mouse-rat synteny blocks for human-mouse and comparing with theoretically derived distributions for random genome rearrangements, we showed that the implied genome rearrangements at each level of resolution become more “random” as synteny block resolution diminishes. At the highest synteny block resolutions the Hannenhalli-Pevzner inversion distance deviates from the Double Cut and Join distance, possibly due to small-scale transpositions or simply due to inclusion of erroneous synteny blocks. At synteny block resolutions as coarse as the Bourque et al. blocks, we show the breakpoint graph cycle structure has already converged to the pattern expected for a random distribution of synteny blocks. Conclusions: The inferred breakpoint reuse rate depends on synteny block resolution in human-mouse genome comparisons. At fine-scale resolution, the cycle structure for the transformation appears less random compared to that for coarse resolution. Small synteny blocks may contain critical information for accurate reconstruction of genome rearrangement history and parameters. PMID:22151330
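
    The abstract refers to the traditional measure of the breakpoint reuse rate without stating a formula. A minimal sketch follows, assuming the definition commonly used in the rearrangement literature, r = 2d / b, where d is the rearrangement (e.g. inversion) distance and b is the number of breakpoints. Whether the study uses exactly this form is an assumption, and the input values below are made up for illustration only.

```python
def breakpoint_reuse_rate(distance: int, breakpoints: int) -> float:
    """Return r = 2 * d / b (assumed definition, not quoted from the abstract).

    Each two-break rearrangement uses at most two breakpoints, so r is 1.0
    when no breakpoint is reused and approaches 2.0 under heavy reuse.
    """
    if breakpoints <= 0:
        raise ValueError("breakpoints must be positive")
    return 2 * distance / breakpoints


# Made-up illustrative values, not results from the study.
print(breakpoint_reuse_rate(distance=4, breakpoints=8))  # 1.0: no reuse
print(breakpoint_reuse_rate(distance=6, breakpoints=8))  # 1.5: some reuse
```

    Because both d and b change when small synteny blocks are merged or discarded, a ratio of this kind would be expected to vary with block resolution, which is the dependence the abstract reports.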

  10. Analysis of the bread wheat genome using whole-genome shotgun sequencing.

    PubMed

    Brenchley, Rachel; Spannagl, Manuel; Pfeifer, Matthias; Barker, Gary L A; D'Amore, Rosalinda; Allen, Alexandra M; McKenzie, Neil; Kramer, Melissa; Kerhornou, Arnaud; Bolser, Dan; Kay, Suzanne; Waite, Darren; Trick, Martin; Bancroft, Ian; Gu, Yong; Huo, Naxin; Luo, Ming-Cheng; Sehgal, Sunish; Gill, Bikram; Kianian, Sharyar; Anderson, Olin; Kersey, Paul; Dvorak, Jan; McCombie, W Richard; Hall, Anthony; Mayer, Klaus F X; Edwards, Keith J; Bevan, Michael W; Hall, Neil

    2012-11-29

    Bread wheat (Triticum aestivum) is a globally important crop, accounting for 20 per cent of the calories consumed by humans. Major efforts are underway worldwide to increase wheat production by extending genetic diversity and analysing key traits, and genomic resources can accelerate progress. But so far the very large size and polyploid complexity of the bread wheat genome have been substantial barriers to genome analysis. Here we report the sequencing of its large, 17-gigabase-pair, hexaploid genome using 454 pyrosequencing, and comparison of this with the sequences of diploid ancestral and progenitor genomes. We identified between 94,000 and 96,000 genes, and assigned two-thirds to the three component genomes (A, B and D) of hexaploid wheat. High-resolution synteny maps identified many small disruptions to conserved gene order. We show that the hexaploid genome is highly dynamic, with significant loss of gene family members on polyploidization and domestication, and an abundance of gene fragments. Several classes of genes involved in energy harvesting, metabolism and growth are among expanded gene families that could be associated with crop productivity. Our analyses, coupled with the identification of extensive genetic variation, provide a resource for accelerating gene discovery and improving this major crop. PMID:23192148

  11. Toward high-resolution population genomics using archaeological samples.

    PubMed

    Morozova, Irina; Flegontov, Pavel; Mikheyev, Alexander S; Bruskin, Sergey; Asgharian, Hosseinali; Ponomarenko, Petr; Klyuchnikov, Vladimir; ArunKumar, GaneshPrasad; Prokhortchouk, Egor; Gankin, Yuriy; Rogaev, Evgeny; Nikolsky, Yuri; Baranova, Ancha; Elhaik, Eran; Tatarinova, Tatiana V

    2016-08-01

    The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research. PMID:27436340

  12. Toward high-resolution population genomics using archaeological samples

    PubMed Central

    Morozova, Irina; Flegontov, Pavel; Mikheyev, Alexander S.; Bruskin, Sergey; Asgharian, Hosseinali; Ponomarenko, Petr; Klyuchnikov, Vladimir; ArunKumar, GaneshPrasad; Prokhortchouk, Egor; Gankin, Yuriy; Rogaev, Evgeny; Nikolsky, Yuri; Baranova, Ancha; Elhaik, Eran; Tatarinova, Tatiana V.

    2016-01-01

    The term ‘ancient DNA’ (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of ‘molecular paleontology’. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research. PMID:27436340

  13. High Resolution Genetic Mapping by Genome Sequencing Reveals Genome Duplication and Tetraploid Genetic Structure of the Diploid Miscanthus sinensis

    PubMed Central

    Ma, Xue-Feng; Jensen, Elaine; Alexandrov, Nickolai; Troukhan, Maxim; Zhang, Liping; Thomas-Jones, Sian; Farrar, Kerrie; Clifton-Brown, John; Donnison, Iain; Swaller, Timothy; Flavell, Richard

    2012-01-01

    We have created a high-resolution linkage map of Miscanthus sinensis, using genotyping-by-sequencing (GBS), identifying all 19 linkage groups for the first time. The result is technically significant since Miscanthus has a very large and highly heterozygous genome, but has no or limited genomics information to date. The composite linkage map containing markers from both parental linkage maps is composed of 3,745 SNP markers spanning 2,396 cM on 19 linkage groups with a 0.64 cM average resolution. Comparative genomics analyses of the M. sinensis composite linkage map to the genomes of sorghum, maize, rice, and Brachypodium distachyon indicate that sorghum has the closest syntenic relationship to Miscanthus compared to other species. The comparative results revealed that each pair of the 19 M. sinensis linkages aligned to one sorghum chromosome, except for LG8, which mapped to two sorghum chromosomes (4 and 7), presumably due to a chromosome fusion event after genome duplication. The data also revealed several other chromosome rearrangements relative to sorghum, including two telomere-centromere inversions of the sorghum syntenic chromosome 7 in LG8 of M. sinensis and two paracentric inversions of sorghum syntenic chromosome 4 in LG7 and LG8 of M. sinensis. The results clearly demonstrate, for the first time, that the diploid M. sinensis is of tetraploid origin consisting of two sub-genomes. This complete and high resolution composite linkage map will not only serve as a useful resource for novel QTL discoveries, but also enable informed deployment of the wealth of existing genomics resources of other species to the improvement of Miscanthus as a high biomass energy crop. In addition, it has utility as a reference for genome sequence assembly for the forthcoming whole genome sequencing of the Miscanthus genus. PMID:22439001
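
    The reported average resolution can be checked from the other figures in the abstract. The short sketch below assumes "average resolution" means total map length divided by the number of markers. An alternative convention, dividing by the number of marker intervals (markers minus linkage groups), is shown for comparison. Which convention the authors used is not stated, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
total_length_cm = 2396   # composite map length in cM
n_markers = 3745         # SNP markers on the composite map
n_linkage_groups = 19

# Convention 1 (assumed): map length per marker.
per_marker = total_length_cm / n_markers

# Convention 2 (assumed alternative): map length per marker interval,
# where each linkage group with k markers has k - 1 intervals.
per_interval = total_length_cm / (n_markers - n_linkage_groups)

print(round(per_marker, 2))
print(round(per_interval, 2))
```

    Both conventions round to the 0.64 cM given in the abstract, so the reported value is consistent with either reading.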

  14. High resolution genetic mapping by genome sequencing reveals genome duplication and tetraploid genetic structure of the diploid Miscanthus sinensis.

    PubMed

    Ma, Xue-Feng; Jensen, Elaine; Alexandrov, Nickolai; Troukhan, Maxim; Zhang, Liping; Thomas-Jones, Sian; Farrar, Kerrie; Clifton-Brown, John; Donnison, Iain; Swaller, Timothy; Flavell, Richard

    2012-01-01

    We have created a high-resolution linkage map of Miscanthus sinensis, using genotyping-by-sequencing (GBS), identifying all 19 linkage groups for the first time. The result is technically significant since Miscanthus has a very large and highly heterozygous genome, but has no or limited genomics information to date. The composite linkage map containing markers from both parental linkage maps is composed of 3,745 SNP markers spanning 2,396 cM on 19 linkage groups with a 0.64 cM average resolution. Comparative genomics analyses of the M. sinensis composite linkage map to the genomes of sorghum, maize, rice, and Brachypodium distachyon indicate that sorghum has the closest syntenic relationship to Miscanthus compared to other species. The comparative results revealed that each pair of the 19 M. sinensis linkages aligned to one sorghum chromosome, except for LG8, which mapped to two sorghum chromosomes (4 and 7), presumably due to a chromosome fusion event after genome duplication. The data also revealed several other chromosome rearrangements relative to sorghum, including two telomere-centromere inversions of the sorghum syntenic chromosome 7 in LG8 of M. sinensis and two paracentric inversions of sorghum syntenic chromosome 4 in LG7 and LG8 of M. sinensis. The results clearly demonstrate, for the first time, that the diploid M. sinensis is of tetraploid origin consisting of two sub-genomes. This complete and high resolution composite linkage map will not only serve as a useful resource for novel QTL discoveries, but also enable informed deployment of the wealth of existing genomics resources of other species to the improvement of Miscanthus as a high biomass energy crop. In addition, it has utility as a reference for genome sequence assembly for the forthcoming whole genome sequencing of the Miscanthus genus. PMID:22439001

  15. High-resolution genomic profiles of breast cancer cell lines assessed by tiling BAC array comparative genomic hybridization.

    PubMed

    Jönsson, Göran; Staaf, Johan; Olsson, Eleonor; Heidenblad, Markus; Vallon-Christersson, Johan; Osoegawa, Kazutoyo; de Jong, Pieter; Oredsson, Stina; Ringnér, Markus; Höglund, Mattias; Borg, Ake

    2007-06-01

    A BAC-array platform for comparative genomic hybridization was constructed from a library of 32,433 clones providing complete genome coverage, and evaluated by screening for DNA copy number changes in 10 breast cancer cell lines (BT474, MCF7, HCC1937, SK-BR-3, L56Br-C1, ZR-75-1, JIMT1, MDA-MB-231, MDA-MB-361, and HCC2218) and one cell line derived from fibrocystic disease of the breast (MCF10A). These were also characterized by gene expression analysis and found to represent all five recently described breast cancer subtypes using the "intrinsic gene set" and centroid correlation. Three cell lines, HCC1937 and L56Br-C1 derived from BRCA1 mutation carriers and MDA-MB-231, were of basal-like subtype and characterized by a high frequency of low-level gains and losses of typical pattern, including limited deletions on 5q. Four estrogen receptor positive cell lines were of luminal A subtype and characterized by a different pattern of aberrations and high-level amplifications, including ERBB2 and other 17q amplicons in BT474 and MDA-MB-361. SK-BR-3 cells, characterized by a complex genome including ERBB2 amplification, massive high-level amplifications on 8q and a homozygous deletion of CDH1 at 16q22, had an expression signature closest to luminal B subtype. The effects of gene amplifications were verified by gene expression analysis to distinguish targeted genes from silent amplicon passengers. JIMT1, derived from an ERBB2 amplified trastuzumab resistant tumor, was of the ERBB2 subtype. Homozygous deletions included other known targets such as PTEN (HCC1937) and CDKN2A (MDA-MB-231, MCF10A), but also new candidate suppressor genes such as FUSSEL18 (HCC1937) and WDR11 (L56Br-C1) as well as regions without known genes. The tiling BAC-arrays constitute a powerful tool for high-resolution genomic profiling suitable for cancer research and clinical diagnostics. PMID:17334996

  16. Genomic location analysis by ChIP-Seq

    PubMed Central

    Barski, Artem; Zhao, Keji

    2013-01-01

    The interaction of a multitude of transcription factors and other chromatin proteins with the genome can influence gene expression and subsequently cell differentiation and function. Thus systematic identification of binding targets of transcription factors is key to unraveling gene regulation networks. The recent development of ChIP-Seq has revolutionized mapping of DNA-protein interactions. Now protein binding can be mapped in a truly genome-wide manner with extremely high resolution. This review discusses ChIP-Seq technology, its possible pitfalls, data analysis and several early applications of the ChIP-Seq technology. PMID:19173299

  17. The Cancer Genome Atlas ovarian cancer analysis

    Cancer.gov

    An analysis of genomic changes in ovarian cancer has provided the most comprehensive and integrated view of cancer genes for any cancer type to date. Ovarian serous adenocarcinoma tumors from 500 patients were examined by The Cancer Genome Atlas (TCGA) Re

  18. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  19. Whole genome analysis of a Vietnamese trio.

    PubMed

    Hai, Dang Thanh; Thanh, Nguyen Dai; Trang, Pham Thi Minh; Quang, Le Si; Hang, Phan Thi Thu; Cuong, Dang Cao; Phuc, Hoang Kim; Duc, Nguyen Huu; Dong, Do Duc; Minh, Bui Quang; Son, Pham Bao; Vinh, Le Sy

    2015-03-01

    We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91 percent of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3 percent) SNPs and 59,119 (7.1 percent) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5 percent) were large indels. There were 6,681 large indels in the range 0.1-100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44 percent) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length greater than or equal to 300 bp. There were 235 contigs from the child genome of which 199 (84.7 percent) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam. PMID:25740146
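
    The percentages quoted in this abstract can be re-derived from the counts it reports. The short Python sketch below does only that arithmetic; the labels and the helper name `percent` are illustrative choices, and no data beyond the figures in the abstract is used.

```python
# Counts are copied from the abstract above; the labels are illustrative.
def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole

reported = [
    ("novel SNPs among Mendelian-consistent SNPs", 109_914, 4_719_412),
    ("novel short indels among Mendelian-consistent indels", 59_119, 827_385),
    ("large indels among structural variants", 27_604, 30_171),
    ("KHV-specific large indels (0.1-100 kbp, child)", 1_499, 6_681),
    ("child contigs matched in a parent", 199, 235),
]

for label, part, whole in reported:
    print(f"{label}: {percent(part, whole):.2f}%")
```

    Rounded to the precision used in the abstract, each ratio agrees with the stated percentage.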

  20. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  1. Pathway and network analysis of cancer genomes.

    PubMed

    2015-07-01

    Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been large interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations. PMID:26125594

  2. Pathway and Network Analysis of Cancer Genomes

    PubMed Central

    Haider, Syed; Wu, Guanming; Shibata, Tatsuhiro; Vazquez, Miguel; Mustonen, Ville; Gonzalez-Perez, Abel; Pearson, John; Sander, Chris; Raphael, Benjamin J.; Marks, Debora S.; Ouellette, B.F. Francis; Valencia, Alfonso; Bader, Gary D.; Boutros, Paul C.; Stuart, Joshua M.; Linding, Rune; Lopez-Bigas, Nuria; Stein, Lincoln D.

    2016-01-01

    Genomic information on tumors from 50 cancer types catalogued by The International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been large interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations. PMID:26125594

  3. Tracking viral genomes in host cells at single-molecule resolution.

    PubMed

    Wang, I-Hsuan; Suomalainen, Maarit; Andriasyan, Vardan; Kilcher, Samuel; Mercer, Jason; Neef, Anne; Luedtke, Nathan W; Greber, Urs F

    2013-10-16

    Viral DNA trafficking in cells has large impacts on physiology and disease development. Current methods lack the resolution and accuracy to visualize and quantify viral DNA trafficking at single-molecule resolution. We developed a noninvasive protocol for accurate quantification of viral DNA-genome (vDNA) trafficking in single cells. Ethynyl-modified nucleosides were used to metabolically label newly synthesized adenovirus, herpes virus, and vaccinia virus vDNA, without affecting infectivity. Superresolution microscopy and copper(I)-catalyzed azide-alkyne cycloaddition (click) reactions allowed visualization of infection at single vDNA resolution within mammalian cells. Analysis of adenovirus infection revealed that a large pool of capsid-free vDNA accumulated in the cytosol upon virus uncoating, indicating that nuclear import of incoming vDNA is a bottleneck. The method described here is applicable for the entire replication cycle of DNA viruses and offers opportunities to localize cellular and viral effector machineries on newly replicated viral DNA, or innate immune sensors on cytoplasmic viral DNA. PMID:24139403

  4. Proteogenomic Analysis of Mycobacterium smegmatis Using High Resolution Mass Spectrometry

    PubMed Central

    Potgieter, Matthys G.; Nakedi, Kehilwe C.; Ambler, Jon M.; Nel, Andrew J. M.; Garnett, Shaun; Soares, Nelson C.; Mulder, Nicola; Blackburn, Jonathan M.

    2016-01-01

    Biochemical evidence is vital for accurate genome annotation. The integration of experimental data collected at the proteome level using high resolution mass spectrometry allows for improvements in genome annotation by providing evidence for novel gene models, while validating or modifying others. Here, we report the results of a proteogenomic analysis of a reference strain of Mycobacterium smegmatis (mc2155), a fast growing model organism for the pathogenic Mycobacterium tuberculosis—the causative agent for Tuberculosis. By integrating high throughput LC/MS/MS proteomic data with genomic six frame translation and ab initio gene prediction databases, a total of 2887 ORFs were identified, including 2810 ORFs annotated to a Reference protein, and 63 ORFs not previously annotated to a Reference protein. Further, the translational start site (TSS) was validated for 558 Reference proteome gene models, while upstream translational evidence was identified for 81. In addition, N-terminus derived peptide identifications allowed for downstream TSS modification of a further 24 gene models. We validated the existence of six previously described interrupted coding sequences at the peptide level, and provide evidence for four novel frameshift positions. Analysis of peptide posterior error probability (PEP) scores indicates high-confidence novel peptide identifications and shows that the genome of M. smegmatis mc2155 is not yet fully annotated. Data are available via ProteomeXchange with identifier PXD003500. PMID:27092112

  5. High-resolution genomic profiling of chronic lymphocytic leukemia reveals new recurrent genomic alterations.

    PubMed

    Edelmann, Jennifer; Holzmann, Karlheinz; Miller, Florian; Winkler, Dirk; Bühler, Andreas; Zenz, Thorsten; Bullinger, Lars; Kühn, Michael W M; Gerhardinger, Andreas; Bloehdorn, Johannes; Radtke, Ina; Su, Xiaoping; Ma, Jing; Pounds, Stanley; Hallek, Michael; Lichter, Peter; Korbel, Jan; Busch, Raymonde; Mertens, Daniel; Downing, James R; Stilgenbauer, Stephan; Döhner, Hartmut

    2012-12-01

    To identify genomic alterations in chronic lymphocytic leukemia (CLL), we performed single-nucleotide polymorphism-array analysis using Affymetrix Version 6.0 on 353 samples from untreated patients entered in the CLL8 treatment trial. Based on paired-sample analysis (n = 144), a mean of 1.8 copy number alterations per patient were identified; approximately 60% of patients carried no copy number alterations other than those detected by fluorescence in situ hybridization analysis. Copy-neutral loss-of-heterozygosity was detected in 6% of CLL patients and was found most frequently on 13q, 17p, and 11q. Minimally deleted regions were refined on 13q14 (deleted in 61% of patients) to the DLEU1 and DLEU2 genes, on 11q22.3 (27% of patients) to ATM, on 2p16.1-2p15 (gained in 7% of patients) to a 1.9-Mb fragment containing 9 genes, and on 8q24.21 (5% of patients) to a segment 486 kb proximal to the MYC locus. 13q deletions exhibited proximal and distal breakpoint cluster regions. Among the most common novel lesions were deletions at 15q15.1 (4% of patients), with the smallest deletion (70.48 kb) found in the MGA locus. Sequence analysis of MGA in 59 samples revealed a truncating mutation in one CLL patient lacking a 15q deletion. MNT at 17p13.3, which in addition to MGA and MYC encodes for the network of MAX-interacting proteins, was also deleted recurrently. PMID:23047824

  6. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  7. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  8. Dating the age of admixture via wavelet transform analysis of genome-wide data.

    PubMed

    Pugach, Irina; Matveyev, Rostislav; Wollstein, Andreas; Kayser, Manfred; Stoneking, Mark

    2011-01-01

    We describe a PCA-based genome scan approach to analyze genome-wide admixture structure, and introduce wavelet transform analysis as a method for estimating the time of admixture. We test the wavelet transform method with simulations and apply it to genome-wide SNP data from eight admixed human populations. The wavelet transform method offers better resolution than existing methods for dating admixture, and can be applied to either SNP or sequence data from humans or other species. PMID:21352535

  9. Dating the age of admixture via wavelet transform analysis of genome-wide data

    PubMed Central

    2011-01-01

    We describe a PCA-based genome scan approach to analyze genome-wide admixture structure, and introduce wavelet transform analysis as a method for estimating the time of admixture. We test the wavelet transform method with simulations and apply it to genome-wide SNP data from eight admixed human populations. The wavelet transform method offers better resolution than existing methods for dating admixture, and can be applied to either SNP or sequence data from humans or other species. PMID:21352535

  10. Comparative analysis of methods for genome-wide nucleosome cartography.

    PubMed

    Quintales, Luis; Vázquez, Enrique; Antequera, Francisco

    2015-07-01

    Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding the genome regulation. In recent years, several experimental protocols have been developed for this purpose that include the enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use. PMID:25296770

  11. A Distance Measure for Genome Phylogenetic Analysis

    NASA Astrophysics Data System (ADS)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirement. Another class of tree construction methods, namely distance-based methods, require a measure of distances between any two genomes. Some measures such as evolutionary edit distance of gene order and gene content are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
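
    The general idea of a compression-based distance can be sketched in a few lines of Python. This is not the expert-model measure described in the abstract: it substitutes the generic normalized compression distance (NCD) with zlib as the compressor, and the helper names and the random test sequences are assumptions made purely for illustration.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x helps to compress y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def random_dna(length: int, seed: int) -> bytes:
    """Hypothetical stand-in for a genome: a seeded random ACGT string."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length)).encode()

a = random_dna(2000, seed=1)
b = random_dna(2000, seed=2)

print("distance of a sequence to itself:", ncd(a, a))
print("distance between unrelated sequences:", ncd(a, b))
```

    A matrix of such pairwise distances is the kind of input that distance-based tree construction methods require; a general-purpose compressor like zlib is only a crude proxy for a DNA-specific model.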

  12. Comparative Genome Analysis in the Integrated Microbial Genomes (IMG) System

    SciTech Connect

    Kyrpides, Nikos C.; Markowitz, Victor M.

    2006-03-01

    Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries and occurrence profile tools for comparing genes, pathways and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.

  13. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  14. Genomic signal analysis of pathogen variability

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan

    2006-02-01

    The paper presents results in the study of pathogen variability by using genomic signals. The conversion of symbolic nucleotide sequences into digital signals offers the possibility to apply signal processing methods to the analysis of genomic data. The method is particularly well suited to characterize small size genomic sequences, such as those found in viruses and bacteria, being a promising tool in tracking the variability of pathogens, especially in the context of developing drug resistance. The paper is based on data downloaded from GenBank [32], and comprises results on the variability of the eight segments of the influenza type A, subtype H5N1, virus genome, and of the Hemagglutinin (HA) gene, for the H1, H2, H3, H4, H5 and H16 types. Data from human and avian virus isolates are used.

  15. Genomic analysis of Fusarium verticillioides.

    PubMed

    Brown, D W; Butchko, R A E; Proctor, R H

    2008-09-01

    Fusarium verticillioides (teleomorph Gibberella moniliformis) can be either an endophyte of maize, causing no visible disease, or a pathogen causing disease of ears, stalks, roots and seedlings. At any stage, this fungus can synthesize fumonisins, a family of mycotoxins structurally similar to the sphingolipid sphinganine. Ingestion of fumonisin-contaminated maize has been associated with a number of animal diseases, including cancer in rodents, and exposure has been correlated with human oesophageal cancer in some regions of the world, and some evidence suggests that fumonisins are a risk factor for neural tube defects. A primary goal of the authors' laboratory is to eliminate fumonisin contamination of maize and maize products. Understanding how and why these toxins are made and the F. verticillioides-maize disease process will allow one to develop novel strategies to limit tissue destruction (rot) and fumonisin production. To meet this goal, genomic sequence data, expressed sequence tags (ESTs) and microarrays are being used to identify F. verticillioides genes involved in the biosynthesis of toxins and plant pathogenesis. This paper describes the current status of F. verticillioides genomic resources and three approaches being used to mine microarray data from a wild-type strain cultured in liquid fumonisin production medium for 12, 24, 48, 72, 96 and 120 h. Taken together, these approaches demonstrate the power of microarray technology to provide information on different biological processes. PMID:19238625

  16. A high-resolution radiation hybrid map of the bovine genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We are building high-resolution radiation hybrid maps of all 29 bovine autosomes and chromosome X, using a 58,000-marker genotyping assay, and a 12,000-rad whole-genome radiation hybrid (RH) panel. To accommodate the large number of markers, and to automate the map building procedure, a software pip...

  17. A high-resolution cattle CNV map by population-scale genome sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. CNVs represent an important type of genetic variation among cattle breeds and even individual animals; however, only low-resolution maps of cattle CNVs currently exis...

  18. Genomic analysis of RNA localization

    PubMed Central

    Taliaferro, J Matthew; Wang, Eric T; Burge, Christopher B

    2014-01-01

    The localization of mRNAs to specific subcellular sites is widespread, allowing cells to spatially restrict and regulate protein production, and playing important roles in development and cellular physiology. This process has been studied in mechanistic detail for several RNAs. However, the generality or specificity of RNA localization systems and mechanisms that impact the many thousands of localized mRNAs has been difficult to assess. In this review, we discuss the current state of the field in determining which RNAs localize, which RNA sequences mediate localization, the protein factors involved, and the biological implications of localization. For each question, we examine prominent systems and techniques that are used to study individual messages, highlight recent genome-wide studies of RNA localization, and discuss the potential for adapting other high-throughput approaches to the study of localization. PMID:25483039

  19. Resolution or Analysis Scale: What Matters Most?

    NASA Astrophysics Data System (ADS)

    Miller, Bradley

    2016-04-01

    Identifying the scale at which different covariates best explain the variation of soil properties reflects the geographic strategy of using map generalization (relative size of map delineations) to identify the scale at which phenomena occur. The size of map delineations corresponds to resolution in raster data models. Although not always considered in digital soil mapping studies, resolution is widely recognized as an important factor in identifying covariates in digital spatial analysis. However, many variables that are useful as predictors in digital soil mapping are dependent upon spatial context. For example, the slope gradient at a specific location can only be calculated by considering the surrounding area. In these cases, an analysis neighborhood is used when calculating such variables using a raster data model. The context or area considered is then dependent upon both the resolution and the number of cells (window size) used to define the neighborhood. This presentation explores the difference between resolution and analysis scale, then tests which concept is most important for identifying optimal scales of correlation for digital soil informatics.
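
    The distinction between resolution and analysis scale can be made concrete with a small Python sketch. It assumes a plain central-difference slope estimate on a square-celled raster; the function names, the `half_window` parameter, and the synthetic planar surface are hypothetical and are not taken from the presentation.

```python
import math

def slope_gradient(elevation, row, col, cell_size, half_window):
    """Central-difference slope (rise over run) at one cell.

    The neighborhood reaches `half_window` cells in each direction, so the
    ground distance spanned depends on both resolution and window size.
    """
    span = 2 * half_window * cell_size
    dz_dx = (elevation[row][col + half_window]
             - elevation[row][col - half_window]) / span
    dz_dy = (elevation[row + half_window][col]
             - elevation[row - half_window][col]) / span
    return math.hypot(dz_dx, dz_dy)

def analysis_scale(cell_size, half_window):
    """Ground width of the square neighborhood, in the units of cell_size."""
    return (2 * half_window + 1) * cell_size

# Synthetic planar surface: rises 0.5 per unit in x and 0.25 per unit in y.
cell = 10.0
plane = [[0.5 * c * cell + 0.25 * r * cell for c in range(7)]
         for r in range(7)]

for hw in (1, 2):
    print(hw, analysis_scale(cell, hw), slope_gradient(plane, 3, 3, cell, hw))
```

    On a perfect plane the estimate is the same for every window size; on real terrain it generally is not. Note also that a 10-unit cell with a 3x3 window and a 6-unit cell with a 5x5 window share the same analysis scale (30 units) despite different resolutions.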

  20. High-Resolution Genetic Map for Understanding the Effect of Genome-Wide Recombination Rate on Nucleotide Diversity in Watermelon

    PubMed Central

    Reddy, Umesh K.; Nimmakayala, Padma; Levi, Amnon; Abburi, Venkata Lakshmi; Saminathan, Thangasamy; Tomason, Yan. R.; Vajja, Gopinath; Reddy, Rishi; Abburi, Lavanya; Wehner, Todd C.; Ronin, Yefim; Karol, Abraham

    2014-01-01

    We used genotyping by sequencing to identify a set of 10,480 single nucleotide polymorphism (SNP) markers for constructing a high-resolution genetic map of 1096 cM for watermelon. We assessed the genome-wide variation in recombination rate (GWRR) across the map and found an association between GWRR and genome-wide nucleotide diversity. Collinearity between the map and the genome-wide reference sequence for watermelon was studied to identify inconsistency and chromosome rearrangements. We assessed genome-wide nucleotide diversity, linkage disequilibrium (LD), and selective sweep for wild, semi-wild, and domesticated accessions of Citrullus lanatus var. lanatus to track signals of domestication. Principal component analysis combined with chromosome-wide phylogenetic study based on 1563 SNPs obtained after LD pruning with minor allele frequency of 0.05 resolved the differences between semi-wild and wild accessions as well as relationships among worldwide sweet watermelon. Population structure analysis revealed predominant ancestries for wild, semi-wild, and domesticated watermelons as well as admixture of various ancestries that were important for domestication. Sliding window analysis of Tajima’s D across various chromosomes was used to resolve selective sweep. LD decay was estimated for various chromosomes. We identified a strong selective sweep on chromosome 3 consisting of important genes that might have had a role in sweet watermelon domestication. PMID:25227227

  1. Shape-based alignment of genomic landscapes in multi-scale resolution

    PubMed Central

    Ashida, Hiroki; Asai, Kiyoshi; Hamada, Michiaki

    2012-01-01

    Due to dramatic advances in DNA technology, quantitative measures of annotation data can now be obtained in continuous coordinates across the entire genome, allowing various heterogeneous ‘genomic landscapes’ to emerge. Although much effort has been devoted to comparing DNA sequences, not much attention has been given to comparing these large quantities of data comprehensively. In this article, we introduce a method for rapidly detecting local regions that show high correlations between genomic landscapes. We overcame the size problem for genome-wide data by converting the data into series of symbols and then carrying out sequence alignment. We also decomposed the oscillation of the landscape data into different frequency bands before analysis, since the real genomic landscape is a mixture of embedded and confounded biological processes working at different scales in the cell nucleus. To verify the usefulness and generality of our method, we applied our approach to well investigated landscapes from the human genome, including several histone modifications. Furthermore, by applying our method to over 20 genomic landscapes in human and 12 in mouse, we found that DNA replication timing and the density of Alu insertions are highly correlated genome-wide in both species, even though the Alu elements have amplified independently in the two genomes. To our knowledge, this is the first method to align genomic landscapes at multiple scales according to their shape. PMID:22561376

  2. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  3. Comprehensive prediction of chromosome dimer resolution sites in bacterial genomes

    PubMed Central

    2011-01-01

    Background: During the replication process of bacteria with circular chromosomes, an odd number of homologous recombination events results in concatenated dimer chromosomes that cannot be partitioned into daughter cells. However, many bacteria harbor a conserved dimer resolution machinery consisting of one or two tyrosine recombinases, XerC and XerD, and their 28-bp target site, dif. Results: To study the evolution of the dif/XerCD system and its relationship with replication termination, we report the comprehensive prediction of dif sequences in silico using a phylogenetic prediction approach based on iterated hidden Markov modeling. Using this method, dif sites were identified in 641 organisms among 16 phyla, with a 97.64% identification rate for single-chromosome strains. The dif sequence positions were shown to be strongly correlated with the GC skew shift-point that is induced by replicational mutation/selection pressures, but the difference in the positions of the predicted dif sites and the GC skew shift-points did not correlate with the degree of replicational mutation/selection pressures. Conclusions: The sequence of dif sites is widely conserved among many bacterial phyla, and they can be computationally identified using our method. The lack of correlation between dif position and the degree of GC skew suggests that replication termination does not occur strictly at dif sites. PMID:21223577
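
    The notion of a GC skew shift-point can be illustrated with a minimal Python sketch. It assumes the common cumulative formulation (+1 for each G, -1 for each C) and takes the extremes of the running total as candidate shift-points; this is not the hidden Markov approach the authors used to predict dif sites, and the function names and toy sequence are hypothetical.

```python
def cumulative_gc_skew(sequence):
    """Running total of G minus C counts after each base."""
    total = 0
    totals = []
    for base in sequence.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        totals.append(total)
    return totals

def skew_shift_points(sequence):
    """Return 0-based indices of the first minimum and first maximum."""
    totals = cumulative_gc_skew(sequence)
    return totals.index(min(totals)), totals.index(max(totals))

# Toy sequence: a G-rich stretch followed by a C-rich stretch.
toy = "ATGGGGATCCCCAT"
print(cumulative_gc_skew(toy))
print(skew_shift_points(toy))
```

    In many bacterial chromosomes the minimum of the cumulative skew lies near the replication origin and the maximum near the terminus, which is why predicted dif positions are compared against the shift-point.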

  4. Design and bioinformatics analysis of genome-wide CLIP experiments

    PubMed Central

    Wang, Tao; Xiao, Guanghua; Chu, Yongjun; Zhang, Michael Q.; Corey, David R.; Xie, Yang

    2015-01-01

    The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP–RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses. PMID:25958398

  5. AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae

    PubMed Central

    Song, Giltae; Dickins, Benjamin J. A.; Demeter, Janos; Engel, Stacia; Dunn, Barbara; Cherry, J. Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community. PMID:25781462

  6. Multiplexed chromosome conformation capture sequencing for rapid genome-scale high-resolution detection of long-range chromatin interactions.

    PubMed

    Stadhouders, Ralph; Kolovos, Petros; Brouwer, Rutger; Zuin, Jessica; van den Heuvel, Anita; Kockx, Christel; Palstra, Robert-Jan; Wendt, Kerstin S; Grosveld, Frank; van Ijcken, Wilfred; Soler, Eric

    2013-03-01

    Chromosome conformation capture (3C) technology is a powerful and increasingly popular tool for analyzing the spatial organization of genomes. Several 3C variants have been developed (e.g., 4C, 5C, ChIA-PET, Hi-C), allowing large-scale mapping of long-range genomic interactions. Here we describe multiplexed 3C sequencing (3C-seq), a 4C variant coupled to next-generation sequencing, allowing genome-scale detection of long-range interactions with candidate regions. Compared with several other available techniques, 3C-seq offers a superior resolution (typically single restriction fragment resolution; approximately 1-8 kb on average) and can be applied in a semi-high-throughput fashion. It allows the assessment of long-range interactions of up to 192 genes or regions of interest in parallel by multiplexing library sequencing. This renders multiplexed 3C-seq an inexpensive, quick (total hands-on time of 2 weeks) and efficient method that is ideal for the in-depth analysis of complex genetic loci. The preparation of multiplexed 3C-seq libraries can be performed by any investigator with basic skills in molecular biology techniques. Data analysis requires basic expertise in bioinformatics and in Linux and Python environments. The protocol describes all materials, critical steps and bioinformatics tools required for successful application of 3C-seq technology. PMID:23411633

  7. Microarray analysis at single molecule resolution

    PubMed Central

    Mureşan, Leila; Jacak, Jarosław; Klement, Erich Peter; Hesse, Jan; Schütz, Gerhard J.

    2010-01-01

    Bioanalytical chip-based assays have improved enormously in sensitivity in recent years; detection of trace amounts of substances down to the level of individual fluorescent molecules has become state-of-the-art technology. The impact of such detection methods, however, has not yet been fully exploited, mainly due to a lack of appropriate mathematical tools for robust data analysis. One particular example relates to the analysis of microarray data. While classical microarray analysis works at resolutions of two to 20 micrometers and quantifies the abundance of target molecules by determining average pixel intensities, a novel high resolution approach [1] directly visualizes individual bound molecules as diffraction limited peaks. The now possible quantification via counting is less susceptible to labeling artifacts and background noise. We have developed an approach for the analysis of high-resolution microarray images. It consists first of a single molecule detection step, based on undecimated wavelet transforms, and second, of a spot identification step via a spatial statistics approach (corresponding to the segmentation step in classical microarray analysis). The detection method was tested on simulated images with a concentration range of 0.001 to 0.5 molecules per square micron and signal-to-noise ratio (SNR) between 0.9 and 31.6. For SNR above 15 the false-negative relative error was below 15%. Separation of foreground/background proved reliable, provided foreground density exceeds background by a factor of 2. The method has also been applied to real data from high-resolution microarray measurements. PMID:20123580

  8. The Genomic HyperBrowser: an analysis web server for genome-scale data

    PubMed Central

    Sandve, Geir K.; Gundersen, Sveinung; Johansen, Morten; Glad, Ingrid K.; Gunathasan, Krishanthi; Holden, Lars; Holden, Marit; Liestøl, Knut; Nygård, Ståle; Nygaard, Vegard; Paulsen, Jonas; Rydbeck, Halfdan; Trengereid, Kai; Clancy, Trevor; Drabløs, Finn; Ferkingstad, Egil; Kalaš, Matúš; Lien, Tonje; Rye, Morten B.; Frigessi, Arnoldo; Hovig, Eivind

    2013-01-01

    The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens for a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome. PMID:23632163

  9. Genomic sequence analysis tools: a user's guide.

    PubMed

    Fortna, A; Gardiner, K

    2001-03-01

    The wealth of information from various genome sequencing projects provides the biologist with a new perspective from which to analyze, and design experiments with, mammalian systems. The complexity of the information, however, requires new software tools, and numerous such tools are now available. Which type and which specific system is most effective depends, in part, upon how much sequence is to be analyzed and with what level of experimental support. Here we survey a number of mammalian genomic sequence analysis systems with respect to the data they provide and the ease of their use. The hope is to aid the experimental biologist in choosing the most appropriate tool for their analyses. PMID:11226611

  10. High-Resolution DNA Melting Analysis in Plant Research.

    PubMed

    Simko, Ivan

    2016-06-01

    Genetic and genomic studies provide valuable insight into the inheritance, structure, organization, and function of genes. The knowledge gained from the analysis of plant genes is beneficial to all aspects of plant research, including crop improvement. New methods and tools are continually being developed to facilitate rapid and accurate mapping, sequencing, and analyzing of genes. Here, I review the recent progress in the application of high-resolution melting (HRM) analysis of DNA, a method that allows detecting polymorphism in double-stranded DNA by comparing profiles of melting curves. Use of HRM has expanded considerably in the past few years as the method was successfully applied for high-throughput genotyping, mapping genes, testing food products and seeds, and other areas of plant research. PMID:26827247

  11. Comparative genome analysis of Basidiomycete fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37% of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48% of basidiomycete proteins are unique to the phylum with nearly half of those (22%) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  12. Image analysis in comparative genomic hybridization

    SciTech Connect

    Lundsteen, C.; Maahr, J.; Christensen, B.

    1995-01-01

    Comparative genomic hybridization (CGH) is a new technique by which genomic imbalances can be detected by combining in situ suppression hybridization of whole genomic DNA and image analysis. We have developed software for rapid, quantitative CGH image analysis by a modification and extension of the standard software used for routine karyotyping of G-banded metaphase spreads in the Magiscan chromosome analysis system. The DAPI-counterstained metaphase spread is karyotyped interactively. Corrections for image shifts between the DAPI, FITC, and TRITC images are done manually by moving the three images relative to each other. The fluorescence background is subtracted. A mean filter is applied to smooth the FITC and TRITC images before the fluorescence ratio between the individual FITC and TRITC-stained chromosomes is computed pixel by pixel inside the area of the chromosomes determined by the DAPI boundaries. Fluorescence intensity ratio profiles are generated, and peaks and valleys indicating possible gains and losses of test DNA are marked if the ratio rises above 1.25 or falls below 0.75. By combining the analysis of several metaphase spreads, consistent findings of gains and losses in all or almost all spreads indicate chromosomal imbalance. Chromosomal imbalances are detected either by visual inspection of fluorescence ratio (FR) profiles or by a statistical approach that compares FR measurements of the individual case with measurements of normal chromosomes. The complete analysis of one metaphase can be carried out in approximately 10 minutes. 8 refs., 7 figs., 1 tab.
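
    The ratio-profile step described above (background subtraction, mean filtering, ratio computation, thresholding at 0.75 and 1.25) can be sketched in one dimension as follows. This is an illustrative sketch only, not the Magiscan software: the function names, the one-dimensional profile, and the assumption that FITC labels the test DNA and TRITC the reference DNA are all hypothetical choices made here.

```python
def mean_filter(values, radius=1):
    """Smooth a 1-D profile with a moving average, shrinking the window at the edges."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def classify_ratio_profile(fitc, tritc, background=0.0,
                           low=0.75, high=1.25, radius=1):
    """Return (ratio, label) per position along a chromosome profile.

    Assumes FITC is the test signal and TRITC the reference signal.
    """
    # Subtract the background first, then smooth, as in the abstract.
    test = mean_filter([max(v - background, 0.0) for v in fitc], radius)
    ref = mean_filter([max(v - background, 0.0) for v in tritc], radius)
    calls = []
    for t, r in zip(test, ref):
        if r == 0:
            calls.append((None, "undefined"))  # no reference signal here
            continue
        ratio = t / r
        if ratio > high:
            label = "gain"
        elif ratio < low:
            label = "loss"
        else:
            label = "balanced"
        calls.append((ratio, label))
    return calls
```

    Setting radius=0 disables smoothing, which makes the thresholds easy to check by hand on a short profile.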

  13. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  14. Multi-resolution analysis for ENO schemes

    NASA Technical Reports Server (NTRS)

    Harten, Ami

    1991-01-01

    Given a function u(x) which is represented by its cell-averages in cells which are formed by some unstructured grid, we show how to decompose the function into various scales of variation. This is done by considering a set of nested grids in which the given grid is the finest, and identifying in each locality the coarsest grid in the set from which u(x) can be recovered to a prescribed accuracy. This multi-resolution analysis was applied to essentially non-oscillatory (ENO) schemes in order to advance the solution by one time-step. This is accomplished by decomposing the numerical solution at the beginning of each time-step into levels of resolution, and performing the computation in each locality at the appropriate coarser grid. An efficient algorithm for implementing this program in the 1-D case is presented; this algorithm can be extended to the multi-dimensional case with Cartesian grids.

  15. Physiological genomics analysis for Alzheimer's disease.

    PubMed

    Wiwanitkit, Viroj

    2013-01-01

    Alzheimer's disease is a common form of dementia that occurs in all countries around the world. This neurological disorder affects millions of people and has become an important concern in modern neurology. There has been much research on the pathogenesis of Alzheimer's disease. Although the disease has been recognized for a long time, it remains unclear whether it is a genetic disorder. Physiological genomics is a new approach that is useful for tracing function to genes within the human genome and can be applied to questions about the underlying pathobiology of complex diseases. Physiogenomics can support a systemic approach to the study of pathophysiology, and genomics might provide useful information for better understanding the pathogenesis of Alzheimer's disease. Recent advances in genomics techniques make it possible to trace the underlying genomics of disease. In this work, a physiological genomics analysis for Alzheimer's disease was performed using the standard published technique. According to this work, there are 20 identified physiogenomics relationships on several chromosomes. The HADH2 gene on chromosome X, APBA1 gene on chromosome 9, AGER gene on chromosome 6, GSK3B gene on chromosome 3, CDKHR1 gene on chromosome 17, APPBP1 gene on chromosome 16, APBA2 gene on chromosome 15, GAL gene on chromosome 11, and APLP2 gene on chromosome 11 have the highest physiogenomics score (9.26), while the CASP3 gene on chromosome 4 and the SNCA gene on chromosome 4 have the lowest physiogenomics score (7.44). The results from this study confirm that Alzheimer's disease has a polygenomic origin. PMID:23661967

  16. High-resolution molecular genomic autopsy reveals complex sudden unexpected death in epilepsy risk profile.

    PubMed

    Klassen, Tara L; Bomben, Valerie C; Patel, Ankita; Drabek, Janice; Chen, Tim T; Gu, Wenli; Zhang, Feng; Chapman, Kevin; Lupski, James R; Noebels, Jeffrey L; Goldman, A M

    2014-02-01

    Advanced variant detection in genes underlying risk of sudden unexpected death in epilepsy (SUDEP) can uncover extensive epistatic complexity and improve diagnostic accuracy of epilepsy-related mortality. However, the sensitivity and clinical utility of diagnostic panels based solely on established cardiac arrhythmia genes in the molecular autopsy of SUDEP is unknown. We applied the established clinical diagnostic panels, followed by sequencing and a high density copy number variant (CNV) detection array of an additional 253 related ion channel subunit genes to analyze the overall genomic variation in a SUDEP of the 3-year-old proband with severe myoclonic epilepsy of infancy (SMEI). We uncovered complex combinations of single nucleotide polymorphisms and CNVs in genes expressed in both neurocardiac and respiratory control pathways, including SCN1A, KCNA1, RYR3, and HTR2C. Our findings demonstrate the importance of comprehensive high-resolution variant analysis in the assessment of personally relevant SUDEP risk. In this case, the combination of de novo single nucleotide polymorphisms (SNPs) and CNVs in the SCN1A and KCNA1 genes, respectively, is suspected to be the principal risk factor for both epilepsy and premature death. However, consideration of the overall biologically relevant variant complexity with its extensive functional epistatic interactions reveals potential personal risk more accurately. PMID:24372310

  17. MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing

    PubMed Central

    Urich, Mark A; Nery, Joseph R; Lister, Ryan; Schmitz, Robert J; Ecker, Joseph R

    2015-01-01

    Current high-throughput DNA sequencing technologies enable acquisition of billions of data points through which myriad biological processes can be interrogated, including genetic variation, chromatin structure, gene expression patterns, small RNAs and protein–DNA interactions. Here we describe the MethylC-sequencing (MethylC-seq) library preparation method, a 2-d protocol that enables the genome-wide identification of cytosine DNA methylation states at single-base resolution. The technique involves fragmentation of genomic DNA followed by adapter ligation, bisulfite conversion and limited amplification using adapter-specific PCR primers in preparation for sequencing. To date, this protocol has been successfully applied to genomic DNA isolated from primary cell culture, sorted cells and fresh tissue from over a thousand plant and animal samples. PMID:25692984

  18. Genomic signal analysis of Mycobacterium tuberculosis

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan; Banica, Dorina; Tuduce, Rodica

    2007-02-01

    As previously shown the conversion of nucleotide sequences into digital signals offers the possibility to apply signal processing methods for the analysis of genomic data. Genomic Signal Analysis (GSA) has been used to analyze large scale features of DNA sequences, at the scale of whole chromosomes, including both coding and non-coding regions. The striking regularities of genomic signals reveal restrictions in the way nucleotides and pairs of nucleotides are distributed along nucleotide sequences. Structurally, a chromosome appears to be less of a "plain text", corresponding to certain semantic and grammar rules, but more of a "poem", satisfying additional symmetry restrictions that evoke the "rhythm" and "rhyme". Recurrent patterns in nucleotide sequences are reflected in simple mathematical regularities observed in genomic signals. GSA has also been used to track pathogen variability, especially concerning their resistance to drugs. Previous work has been dedicated to the study of HIV-1, Clade F and Avian Flu. The present paper applies GSA methodology to study Mycobacterium tuberculosis (MT) rpoB gene variability, relevant to its resistance to antibiotics. Isolates from 50 Romanian patients have been studied both by rapid LightCycler PCR and by sequencing of a segment of 190-250 nucleotides covering the region of interest. The variability is caused by SNPs occurring at specific sites along the gene strand, as well as by inclusions. Because of the mentioned symmetry restrictions, the GS variations tend to compensate. An important result is that MT can act as a vector for HIV virus, which is able to retrotranscribe its specific genes both into human and MT genomes.
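
    The idea of converting a nucleotide sequence into a digital signal can be illustrated with a complex-valued mapping and its cumulated phase. The abstract does not state which representation was used, so the mapping below (A = 1+j, C = -1-j, G = -1+j, T = 1-j) and the function names are assumptions chosen purely for illustration.

```python
import cmath

# Hypothetical nucleotide-to-complex mapping used only for this sketch.
NUCLEOTIDE_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def to_signal(seq):
    """Map each nucleotide to a complex sample; unknown symbols become 0."""
    return [NUCLEOTIDE_TO_COMPLEX.get(base, 0j) for base in seq.upper()]


def cumulated_phase(seq):
    """Running sum of the phase (argument) of each complex sample, in radians."""
    total = 0.0
    phases = []
    for sample in to_signal(seq):
        if sample != 0:
            total += cmath.phase(sample)
        phases.append(total)
    return phases
```

    With this mapping each A contributes +π/4 and each T contributes -π/4, so a balanced sequence such as ACGT returns to a cumulated phase of approximately zero, while a skewed composition produces a steady drift.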

  19. [Computational genome analysis of three marine algoviruses].

    PubMed

    Stepanova, O A; Boĭko, A L; Shcherbatenko, I S

    2013-01-01

    Computational analysis of genomic sequences of three new marine algoviruses: Tetraselmis viridis virus (TvV-S20 and TvV-SI1 strains) and Dunaliella viridis virus (DvV-SI2 strain) was conducted. Both considerable similarity and essential distinctions between the studied strains and the most studied marine algoviruses of the Phycodnaviridae family were revealed. Our data show that the tested strains are new viruses with the following features: they are the only ones isolated from the marine eukaryotic microalgae T. viridis and D. viridis; coding sequences (CDSs) of their genomes are localized mainly on one of the DNA strands and form several clusters with short intergenic spaces; there are considerable variations in genome structure within viruses and their strains; viral genomic DNA has a high GC-content (55.5 - 67.4%); their genes contain no well-known optimal contexts of translation start codons or contexts for termination codon read-through; the vast majority of viral genes and proteins do not have any matches in gene banks. PMID:24479317

  20. Analysis of the allohexaploid bread wheat genome (Triticum aestivum) using comparative whole genome shotgun sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The large 17 Gb allopolyploid genome of bread wheat is a major challenge for genome analysis because it is composed of three closely- related and independently maintained genomes, with genes dispersed as small “islands” separated by vast tracts of repetitive DNA. We used a novel comparative genomi...

  1. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    PubMed

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes. PMID:26519405

  2. GATB: Genome Assembly & Analysis Tool Box

    PubMed Central

    Drezen, Erwan; Rizk, Guillaume; Chikhi, Rayan; Deltel, Charles; Lemaitre, Claire; Peterlongo, Pierre; Lavenier, Dominique

    2014-01-01

    Motivation: Efficient and fast next-generation sequencing (NGS) algorithms are essential to analyze the terabytes of data generated by the NGS machines. A serious bottleneck can be the design of such algorithms, as they require sophisticated data structures and advanced hardware implementation. Results: We propose an open-source library dedicated to genome assembly and analysis to speed up the process of developing efficient software. The library is based on a recent optimized de-Bruijn graph implementation allowing complex genomes to be processed on desktop computers using fast algorithms with low memory footprints. Availability and implementation: The GATB library is written in C++ and is available at the following Web site http://gatb.inria.fr under the A-GPL license. Contact: lavenier@irisa.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24990603

  3. Multi-resolution analysis for ENO schemes

    NASA Technical Reports Server (NTRS)

    Harten, Ami

    1993-01-01

    Given a function u(x) which is represented by its cell-averages in cells which are formed by some unstructured grid, we show how to decompose the function into various scales of variation. This is done by considering a set of nested grids in which the given grid is the finest, and identifying in each locality the coarsest grid in the set from which u(x) can be recovered to a prescribed accuracy. We apply this multi-resolution analysis to essentially non-oscillatory (ENO) schemes in order to reduce the number of numerical flux computations needed to advance the solution by one time-step. This is accomplished by decomposing the numerical solution at the beginning of each time-step into levels of resolution, and performing the computation in each locality at the appropriate coarser grid. We present an efficient algorithm for implementing this program in the one-dimensional case; this algorithm can be extended to the multi-dimensional case with Cartesian grids.
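
    The decomposition of cell-averages over nested grids can be illustrated in its simplest form. The sketch below assumes a uniform 1-D grid whose number of cells is a power of two, and it predicts fine-grid values from the coarse grid by piecewise-constant reconstruction (a Haar-like scheme); the paper's setting is more general, and the function names here are hypothetical. Details whose magnitude falls below a tolerance are dropped, which mimics recovering u(x) from a coarser grid "to a prescribed accuracy".

```python
def coarsen(cell_averages):
    """Average neighbouring pairs of cells to obtain the next coarser grid."""
    return [(cell_averages[i] + cell_averages[i + 1]) / 2
            for i in range(0, len(cell_averages), 2)]


def decompose(cell_averages):
    """Return (coarsest averages, detail lists ordered from finest to coarsest)."""
    levels = []
    current = list(cell_averages)
    while len(current) > 1:
        coarse = coarsen(current)
        # Detail = what the coarse average fails to predict in the left cell;
        # the right cell's error is the negative of the same value.
        details = [current[2 * j] - coarse[j] for j in range(len(coarse))]
        levels.append(details)
        current = coarse
    return current, levels


def reconstruct(coarsest, levels, tolerance=0.0):
    """Rebuild the finest grid, ignoring details with magnitude <= tolerance."""
    current = list(coarsest)
    for details in reversed(levels):
        fine = []
        for average, detail in zip(current, details):
            if abs(detail) <= tolerance:
                detail = 0.0
            fine.extend([average + detail, average - detail])
        current = fine
    return current
```

    With tolerance 0 the reconstruction is exact; with a larger tolerance, smooth regions are represented on coarser grids only, which is where a scheme could skip fine-grid flux computations.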

  4. High resolution analysis of satellite gradiometry

    NASA Technical Reports Server (NTRS)

    Colombo, O. L.

    1989-01-01

    Satellite gravity gradiometry is a technique now under development which, by the middle of the next decade, may be used for the high resolution charting from space of the gravity field of the earth and, afterwards, of other planets. Some data analysis schemes are reviewed for getting detailed gravity maps from gradiometry on both a global and a local basis. Estimates of the likely accuracies of such maps are also presented, in terms of normalized spherical harmonics expansions, both using gradiometry alone and in combination with data from a Global Positioning System (GPS) receiver carried on the same spacecraft. These accuracies are compared with those of current and future maps obtained from other data (conventional tracking, satellite-satellite tracking, etc.), and also with the spectra of various signals of geophysical interest.

  5. A High Resolution Genetic Map Anchoring Scaffolds of the Sequenced Watermelon Genome

    PubMed Central

    Kou, Qinghe; Jiang, Jiao; Guo, Shaogui; Zhang, Haiying; Hou, Wenju; Zou, Xiaohua; Sun, Honghe; Gong, Guoyi; Levi, Amnon; Xu, Yong

    2012-01-01

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of the assembled genomic sequences of the elite Chinese watermelon line 97103 (Citrullus lanatus var. lanatus). The genetic map was constructed using an F8 population of 103 recombinant inbred lines (RILs). The RILs are derived from a cross between the line 97103 and the United States Plant Introduction (PI) 296341-FR (C. lanatus var. citroides) that contains resistance to fusarium wilt (races 0, 1, and 2). The genetic map consists of eleven linkage groups that include 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel) and 36 structure variation (SV) markers and spans ∼800 cM with a mean marker interval of 0.8 cM. Using fluorescent in situ hybridization (FISH) with 11 BACs that produced chromosome-specific signals, we have depicted watermelon chromosomes that correspond to the eleven linkage groups constructed in this study. The high resolution genetic map developed here should be a useful platform for the assembly of the watermelon genome, for the development of sequence-based markers used in breeding programs, and for the identification of genes associated with important agricultural traits. PMID:22247776
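
    The summary figures in the abstract above can be checked with simple arithmetic. The short sketch below uses only the numbers quoted in the abstract (the map length is given there as approximately 800 cM); the function names are hypothetical.

```python
def mean_marker_interval(map_length_cm, marker_counts):
    """Average spacing in cM: total map length divided by the number of markers."""
    return map_length_cm / sum(marker_counts)


def anchored_fraction(anchored_mb, assembled_mb):
    """Fraction of the assembled sequence placed on the genetic map."""
    return anchored_mb / assembled_mb


# SSR, InDel and SV marker counts quoted in the abstract.
markers = [698, 219, 36]
interval = mean_marker_interval(800, markers)   # about 0.84 cM
fraction = anchored_fraction(330, 353)          # about 0.935
```

    Both values agree with the reported mean marker interval of 0.8 cM and the reported 93.5% of assembled sequence covered by anchored scaffolds.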

  6. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

    PubMed

    Rao, Suhas S P; Huntley, Miriam H; Durand, Neva C; Stamenova, Elena K; Bochkov, Ivan D; Robinson, James T; Sanborn, Adrian L; Machol, Ido; Omer, Arina D; Lander, Eric S; Aiden, Erez Lieberman

    2014-12-18

    We use in situ Hi-C to probe the 3D architecture of genomes, constructing haploid and diploid maps of nine cell types. The densest, in human lymphoblastoid cells, contains 4.9 billion contacts, achieving 1 kb resolution. We find that genomes are partitioned into contact domains (median length, 185 kb), which are associated with distinct patterns of histone marks and segregate into six subcompartments. We identify ∼10,000 loops. These loops frequently link promoters and enhancers, correlate with gene activation, and show conservation across cell types and species. Loop anchors typically occur at domain boundaries and bind CTCF. CTCF sites at loop anchors occur predominantly (>90%) in a convergent orientation, with the asymmetric motifs "facing" one another. The inactive X chromosome splits into two massive domains and contains large loops anchored at CTCF-binding repeats. PMID:25497547

  7. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  8. GRETINA commissioning and engineering run resolution analysis

    NASA Astrophysics Data System (ADS)

    Tarlow, Thomas; Beausang, Con; Ross, Tim; Hughes, Richard; Gell, Kristen; Good, Erin

    2012-10-01

    GRETINA, the first stage in the full Gamma Ray Energy Tracking Array (GRETA), consists of seven modules covering approximately 1π of solid angle. Each module is made up of four large, highly-segmented germanium detectors capable of measuring the interaction points of individual gamma-rays. GRETINA has recently been assembled and commissioned at LBNL via a series of engineering and commissioning runs. Here we report on an analysis of data from the first engineering run (ER01), which was intended to probe the response of the data acquisition system to high multiplicity gamma-ray cascades. For this experiment the 122Sn(40Ar, 4n) reaction at a beam energy of 210 MeV was utilized to populate high spin states in 158Er. A variety of beam currents, targets and trigger conditions were utilized to test the acquisition. Here we report on the measured energy resolution, both with calibration and in-beam sources, as well as a gamma-gamma coincidence analysis to confirm the known level scheme and the capability of the data acquisition system for high fold coincidence measurements. This work was partly supported by the US Department of Energy via grant numbers DE-FG52-09NA29454 and DE-FG02-05-ER41379.

  9. TBV-361 RESOLUTION ANALYSIS: EMPLACEMENT DRIFT ORIENTATION

    SciTech Connect

    M. Lin; D.C. Kicker; M.D. Sellers

    1999-07-17

    The purpose of this To Be Verified/To Be Determined (TBX) resolution analysis is to release "To Be Verified" (TBV)-361 related to the emplacement drift orientation. The system design criterion in "Subsurface Facility System Description Document" (CRWMS M&O 1998a, p.9) specifies that the emplacement drift orientation relative to the dominant joint orientations should be at least 30 degrees. The specific objectives for this analysis include the following: (1) Collect and evaluate key block data developed for the repository host horizon rock mass. (2) Assess the dominant joint orientations based on available fracture data. (3) Document the maximum block size as a function of drift orientation. (4) Assess the applicability of the drift orientation/joint orientation offset criterion in the "Subsurface Facility System Description Document" (CRWMS M&O 1998a, p.9). (5) Consider the effects of seepage on drift orientation. (6) Verify that the viability assessment (VA) drift orientation complies with the drift orientation/joint orientation offset criterion, or provide justifications and make recommendations for modifying the VA emplacement drift layout. In addition to providing direct support to the System Description Document (SDD), the release of TBV-361 will provide support to the Repository Subsurface Design Department. The results from this activity may also provide data and information needs to support the MGR Requirements Department, the MGR Safety Assurance Department, and the Performance Assessment Organization.
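
    The 30-degree offset criterion described above can be illustrated with a short sketch. It assumes, purely for illustration, that the drift and each dominant joint set are represented by a horizontal azimuth in degrees, so that the offset is the acute angle between two lines; the function names and this simplification are hypothetical and are not taken from the cited System Description Document.

```python
def orientation_offset(drift_azimuth_deg, joint_strike_deg):
    """Acute angle in degrees (0 to 90) between two horizontal lines given by azimuth.

    Assumption: orientations are axial, so 10 degrees and 190 degrees are the same line.
    """
    difference = abs(drift_azimuth_deg - joint_strike_deg) % 180.0
    return min(difference, 180.0 - difference)


def meets_offset_criterion(drift_azimuth_deg, joint_strikes_deg, minimum_offset_deg=30.0):
    """True if the drift is at least minimum_offset_deg away from every joint strike."""
    return all(
        orientation_offset(drift_azimuth_deg, strike) >= minimum_offset_deg
        for strike in joint_strikes_deg
    )
```

    Treating the orientations as lines rather than directed vectors is the main design choice here: a joint striking at 350 degrees and a drift at 20 degrees are 30 degrees apart, not 330.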

  10. Comparative Genomic Analysis of Meningitis- and Bacteremia-Causing Pneumococci Identifies a Common Core Genome.

    PubMed

    Kulohoma, Benard W; Cornick, Jennifer E; Chaguza, Chrispin; Yalcin, Feyruz; Harris, Simon R; Gray, Katherine J; Kiran, Anmol M; Molyneux, Elizabeth; French, Neil; Parkhill, Julian; Faragher, Brian E; Everett, Dean B; Bentley, Stephen D; Heyderman, Robert S

    2015-10-01

    Streptococcus pneumoniae is a nasopharyngeal commensal that occasionally invades normally sterile sites to cause bloodstream infection and meningitis. Although the pneumococcal population structure and evolutionary genetics are well defined, it is not clear whether pneumococci that cause meningitis are genetically distinct from those that do not. Here, we used whole-genome sequencing of 140 isolates of S. pneumoniae recovered from bloodstream infection (n = 70) and meningitis (n = 70) to compare their genetic contents. By fitting a double-exponential decaying-function model, we show that these isolates share a core of 1,427 genes (95% confidence interval [CI], 1,425 to 1,435 genes) and that there is no difference in the core genome or accessory gene content from these disease manifestations. Gene presence/absence alone therefore does not explain the virulence behavior of pneumococci that reach the meninges. Our analysis, however, supports the requirement of a range of previously described virulence factors and vaccine candidates for both meningitis- and bacteremia-causing pneumococci. This high-resolution view suggests that, despite considerable competency for genetic exchange, all pneumococci are under considerable pressure to retain key components advantageous for colonization and transmission and that these components are essential for access to and survival in sterile sites. PMID:26259813
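
    As a rough illustration of the core-genome estimate mentioned above, the sketch below tracks how many genes remain shared as isolates are added, and evaluates a double-exponential decay curve whose asymptote stands for the core size. The functional form (two decaying exponentials plus a constant) and all parameter names are assumptions made for this example; the abstract does not give the authors' exact equation, and the nonlinear fitting step itself is not shown.

```python
from math import exp


def core_genome_curve(gene_sets):
    """Number of genes shared by the first 1, 2, ..., N genomes, in the order given."""
    core = None
    sizes = []
    for genes in gene_sets:
        core = set(genes) if core is None else core & set(genes)
        sizes.append(len(core))
    return sizes


def double_exponential_core(n, k1, tau1, k2, tau2, omega):
    """Assumed model: shared genes after n genomes; omega is the asymptotic core size."""
    return k1 * exp(-n / tau1) + k2 * exp(-n / tau2) + omega
```

    In practice the observed curve would be averaged over many random orderings of the isolates before the model parameters are fitted to it.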

  11. Genome-wide analysis correlates Ayurveda Prakriti

    PubMed Central

    Govindaraj, Periyasamy; Nizamuddin, Sheikh; Sharath, Anugula; Jyothi, Vuskamalla; Rotti, Harish; Raval, Ritu; Nayak, Jayakrishna; Bhat, Balakrishna K.; Prasanna, B. V.; Shintre, Pooja; Sule, Mayura; Joshi, Kalpana S.; Dedge, Amrish P.; Bharadwaj, Ramachandra; Gangadharan, G. G.; Nair, Sreekumaran; Gopinath, Puthiya M.; Patwardhan, Bhushan; Kondaiah, Paturu; Satyamoorthy, Kapaettu; Valiathan, Marthanda Varma Sankaran; Thangaraj, Kumarasamy

    2015-01-01

    The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as “Prakriti”. To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found 52 SNPs (p ≤ 1 × 10^-5) were significantly different between Prakritis, without any confounding effect of stratification, after 10^6 permutations. Principal component analysis (PCA) of these SNPs classified 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which represent its power in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India’s traditional medicine has a genetic basis; and its Prakriti-based practice in vogue for many centuries resonates with personalized medicine. PMID:26511157

  12. Rif1 Is Required for Resolution of Ultrafine DNA Bridges in Anaphase to Ensure Genomic Stability.

    PubMed

    Hengeveld, Rutger C C; de Boer, H Rudolf; Schoonen, Pepijn M; de Vries, Elisabeth G E; Lens, Susanne M A; van Vugt, Marcel A T M

    2015-08-24

    Sister-chromatid disjunction in anaphase requires the resolution of DNA catenanes by topoisomerase II together with Plk1-interacting checkpoint helicase (PICH) and Bloom's helicase (BLM). We here identify Rif1 as a factor involved in the resolution of DNA catenanes that are visible as ultrafine DNA bridges (UFBs) in anaphase to which PICH and BLM localize. Rif1, which during interphase functions downstream of 53BP1 in DNA repair, is recruited to UFBs in a PICH-dependent fashion, but independently of 53BP1 or BLM. Similar to PICH and BLM, Rif1 promotes the resolution of UFBs: its depletion increases the frequency of nucleoplasmic bridges and RPA70-positive UFBs in late anaphase. Moreover, in the absence of Rif1, PICH, or BLM, more nuclear bodies with damaged DNA arise in ensuing G1 cells, when chromosome decatenation is impaired. Our data reveal a thus far unrecognized function for Rif1 in the resolution of UFBs during anaphase to protect genomic integrity. PMID:26256213

  13. Does complete plastid genome sequencing improve species discrimination and phylogenetic resolution in Araucaria?

    PubMed

    Ruhsam, Markus; Rai, Hardeep S; Mathews, Sarah; Ross, T Gregory; Graham, Sean W; Raubeson, Linda A; Mei, Wenbin; Thomas, Philip I; Gardner, Martin F; Ennos, Richard A; Hollingsworth, Peter M

    2015-09-01

    Obtaining accurate phylogenies and effective species discrimination using a small standardized set of plastid genes is challenging in evolutionarily young lineages. Complete plastid genome sequencing offers an increasingly easy-to-access source of characters that helps address this. The usefulness of this approach, however, depends on the extent to which plastid haplotypes track morphological species boundaries. We have tested the power of complete plastid genomes to discriminate among multiple accessions of 11 of 13 New Caledonian Araucaria species, an evolutionarily young lineage where the standard DNA barcoding approach has so far failed and phylogenetic relationships have remained elusive. Additionally, 11 nuclear gene regions were Sanger sequenced for all accessions to ascertain the success of species discrimination using a moderate number of nuclear genes. Overall, fewer than half of the New Caledonian Araucaria species with multiple accessions were monophyletic in the plastid or nuclear trees. However, the plastid data retrieved a phylogeny with a higher resolution compared to any previously published tree of this clade and supported the monophyly of about twice as many species and nodes compared to the nuclear data set. Modest gains in discrimination thus are possible, but using complete plastid genomes or a small number of nuclear genes in DNA barcoding may not substantially raise species discriminatory power in many evolutionarily young lineages. The big challenge therefore remains to develop techniques that allow routine access to large numbers of nuclear markers scaleable to thousands of individuals from phylogenetically disparate sample sets. PMID:25611173

  14. Genome Data Exploration Using Correspondence Analysis.

    PubMed

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discovering biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. 
    Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results.
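
    The core computation behind correspondence analysis, as summarized above, can be sketched in a few lines. The example below builds the matrix of standardized residuals for a two-way table of counts, reports the total inertia (the chi-square statistic divided by the grand total), and uses power iteration to obtain the leading singular value, whose square is the inertia carried by the first factor. This is a minimal pure-Python sketch under those textbook definitions, not the software used in the review; the function names are hypothetical, and a real analysis would use a full singular value decomposition from a numerical library.

```python
from math import sqrt


def ca_standardized_residuals(table):
    """Return (S, total_inertia) for a two-way table of non-negative counts.

    S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), where p is the table divided
    by its grand total and r, c are the row and column masses.
    Assumption: every row and every column has a non-zero total.
    """
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    S = [
        [
            (table[i][j] / n - row_mass[i] * col_mass[j]) / sqrt(row_mass[i] * col_mass[j])
            for j in range(len(col_mass))
        ]
        for i in range(len(row_mass))
    ]
    total_inertia = sum(value * value for row in S for value in row)
    return S, total_inertia


def leading_singular_value(S, iterations=200):
    """Largest singular value of S, found by power iteration on S^T S."""
    n_rows, n_cols = len(S), len(S[0])
    v = [1.0 / (j + 1) for j in range(n_cols)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        u = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # S v
        w = [sum(S[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # S^T u
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    u = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return sqrt(sum(x * x for x in u))
```

    Dividing the squared leading singular value by the total inertia gives the share of the table's association captured by the first factor, which is the quantity usually inspected when deciding how many factors to keep.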

  15. Genome Data Exploration Using Correspondence Analysis

    PubMed Central

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discovering biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. 
    Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results.

  16. High-resolution genome-wide linkage mapping identifies susceptibility loci for BMI in the Chinese population.

    PubMed

    Zhang, Dong Feng; Pang, Zengchang; Li, Shuxia; Thomassen, Mads; Wang, Shaojie; Jiang, Wengjie; Hjelmborg, Jacob v B; Kruse, Torben A; Kyvik, Kirsten O; Christensen, Kaare; Tan, Qihua

    2012-04-01

    The genetic loci affecting the commonly used BMI have been intensively investigated using linkage approaches in multiple populations. This study aims at performing the first genome-wide linkage scan on BMI in the Chinese population in mainland China, with the hypothesis that heterogeneity in genetic linkage could exist in different ethnic populations. BMI was measured from 126 dizygotic twins in Qingdao municipality who were genotyped using high-resolution Affymetrix Genome-Wide Human SNP arrays containing about 1 million single-nucleotide polymorphisms (SNPs). Nonparametric linkage analysis was performed with the Merlin software package for linkage analysis using a variance components approach for quantitative trait loci mapping. We identified a strong linkage peak at the end of chromosome 7 (7q36 at 186 cM) with a lod score of 4.06, which overlaps with that reported by a large multicenter study in western countries. Multiple loci showing suggestive linkage were found on chromosome 1 (lod score 2.38 at 242 cM), chromosome 8 (2.48 at 95 cM), and chromosome 14 (2.2 at 89.4 cM). The strong linkage identified in the Chinese subjects that is consistent with that found in populations of European origin could suggest the existence of evolutionarily preserved genetic mechanisms for BMI, whereas the multiple suggestive loci could represent genetic effect from gene-environment interaction as a result of population-specific environmental adaptation. PMID:21273998

  17. Computational Methods for the Analysis of Array Comparative Genomic Hybridization

    PubMed Central

    Chari, Raj; Lockwood, William W.; Lam, Wan L.

    2006-01-01

    Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has led to a rapid accumulation of high throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development. PMID:17992253

  18. Deciphering intratumor heterogeneity using cancer genome analysis.

    PubMed

    Ryu, Daeun; Joung, Je-Gun; Kim, Nayoung K D; Kim, Kyu-Tae; Park, Woong-Yang

    2016-06-01

    Intratumor heterogeneity within individual cancer tissues underlies the numerous phenotypes of cancer. Tumor subclones ultimately affect therapeutic outcomes due to their distinct molecular features. Drug-resistant subclones are present at a low frequency in tissues at the time of biopsy, but can also arise as a result of acquired somatic mutations. A number of different approaches have been utilized to understand the nature of intratumor heterogeneity. Clonal analysis using whole exome or genome sequencing data can help monitor subclones in the context of tumor progression. Multiregional biopsies permit the molecular characterization of subclones within tumors. Deep sequencing has also provided researchers with the ability to measure the low allele fraction variant within a small number of cells. Ultimately, single-cell sequencing will enable the identification of every minor population within a tumor microenvironment. In the clinical context, the ability to identify and monitor the subclonal architecture of a tumor is valuable for the development of precise cancer therapeutic methods. PMID:27126234

  19. Enhancing cancer clonality analysis with integrative genomics

    PubMed Central

    2015-01-01

    Introduction It is understood that cancer is a clonal disease initiated by a single cell, and that metastasis, which is the spread of cancer from the primary site, is also initiated by a single cell. The seemingly natural capability of cancer to adapt dynamically in a Darwinian manner is a primary reason for therapeutic failures. Survival advantages may be induced by cancer therapies and also occur as a result of inherent cell and microenvironmental factors. The selected "more fit" clones outmatch their competition and then become dominant in the tumor via propagation of progeny. This clonal expansion leads to relapse, therapeutic resistance and eventually death. The goal of this study is to develop and demonstrate a more detailed clonality approach by utilizing integrative genomics. Methods Patient tumor samples were profiled by Whole Exome Sequencing (WES) and RNA-seq on an Illumina HiSeq 2500 and methylation profiling was performed on the Illumina Infinium 450K array. STAR and the Haplotype Caller were used for RNA-seq processing. Custom approaches were used for the integration of the multi-omic datasets. Results Reported are major enhancements to CloneViz, which now provides capabilities enabling a formal tumor multi-dimensional clonality analysis by integrating: i) DNA mutations, ii) RNA expressed mutations, and iii) DNA methylation data. RNA and DNA methylation integration were not previously possible, by CloneViz (previous version) or any other clonality method to date. This new approach, named iCloneViz (integrated CloneViz), employs visualization and quantitative methods, revealing an integrative genomic mutational dissection and traceability (DNA, RNA, epigenetics) through the different layers of molecular structures. Conclusion The iCloneViz approach can be used for analysis of clonal evolution and mutational dynamics of multi-omic data sets. Revealing tumor clonal complexity in an integrative and quantitative manner facilitates improved mutational

  20. High range resolution micro-Doppler analysis

    NASA Astrophysics Data System (ADS)

    Cammenga, Zachary A.; Smith, Graeme E.; Baker, Christopher J.

    2015-05-01

    This paper addresses use of the micro-Doppler effect and the use of high range-resolution profiles to observe complex targets in complex target scenes. The combination of micro-Doppler and high range-resolution provides the ability to separate the motion of complex targets from one another. This ability leads to the differentiation of targets based on their micro-Doppler signatures. Without the high-range resolution, this would not be possible because the individual signatures would not be separable. This paper also addresses the use of the micro-Doppler information and high range-resolution profiles to generate an approximation of the scattering properties of a complex target. This approximation gives insight into the structure of the complex target and, critically, is created without using a pre-determined target model.

  1. Genome sequence and comparative genome analysis of Pseudomonas syringae pv. syringae type strain ATCC 19310.

    PubMed

    Park, Yong-Soon; Jeong, Haeyoung; Sim, Young Mi; Yi, Hwe-Su; Ryu, Choong-Min

    2014-04-01

    Pseudomonas syringae pv. syringae (Psy) is a major bacterial pathogen of many economically important plant species. Despite the severity of its impact, the genome sequence of the type strain has not been reported. Here, we present the draft genome sequence of Psy ATCC 19310. Comparative genomic analysis revealed that Psy ATCC 19310 is closely related to Psy B728a. However, only a few type III effectors, which are key virulence factors, are shared by the two strains, indicating the possibility of host-pathogen specificity and genome dynamics, even under the pathovar level. PMID:24444998

  2. Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor; Samanta, Manoj Pratim; Tongprasit, Waraporn; Sethi, Himanshu; Liang, Shoudan; Nelson, David C.; Hegeman, Adrian; Nelson, Clark; Rancour, David; Bednarek, Sebastian; Ulrich, Eldon L.; Zhao, Qin; Wrobel, Russell L.; Newman, Craig S.; Fox, Brian G.; Phillips, George N Jr; Markley, John L.; Sussman, Michael R.

    2005-01-01

    Using a maskless photolithography method, we produced DNA oligonucleotide microarrays with probe sequences tiled throughout the genome of the plant Arabidopsis thaliana. RNA expression was determined for the complete nuclear, mitochondrial, and chloroplast genomes by tiling 5 million 36-mer probes. These probes were hybridized to labeled mRNA isolated from liquid grown T87 cells, an undifferentiated Arabidopsis cell culture line. Transcripts were detected from at least 60% of the nearly 26,330 annotated genes, which included 151 predicted genes that were not identified previously by a similar genome-wide hybridization study on four different cell lines. In comparison with previously published results with 25-mer tiling arrays produced by chromium masking-based photolithography technique, 36-mer oligonucleotide probes were found to be more useful in identifying intron-exon boundaries. Using two-dimensional HPLC tandem mass spectrometry, a small-scale proteomic analysis was performed with the same cells. A large amount of strongly hybridizing RNA was found in regions "antisense" to known genes. Similarity of antisense activities between the 25-mer and 36-mer data sets suggests that it is a reproducible and inherent property of the experiments. Transcription activities were also detected for many of the intergenic regions and the small RNAs, including tRNA, small nuclear RNA, small nucleolar RNA, and microRNA. Expression of tRNAs correlates with genome-wide amino acid usage.

  3. High Resolution Typing by Whole Genome Mapping Enables Discrimination of LA-MRSA (CC398) Strains and Identification of Transmission Events.

    PubMed

    Bosch, Thijs; Verkade, Erwin; van Luit, Martijn; Pot, Bruno; Vauterin, Paul; Burggrave, Ronald; Savelkoul, Paul; Kluytmans, Jan; Schouls, Leo

    2013-01-01

    After its emergence in 2003, a livestock-associated (LA-)MRSA clade (CC398) has caused an impressive increase in the number of isolates submitted for the Dutch national MRSA surveillance and now comprises 40% of all isolates. The currently used molecular typing techniques have limited discriminatory power for this MRSA clade, which hampers studies on the origin and transmission routes. Recently, a new molecular analysis technique named whole genome mapping was introduced. This method creates high-resolution, ordered whole genome restriction maps that may have potential for strain typing. In this study, we assessed and validated the capability of whole genome mapping to differentiate LA-MRSA isolates. Multiple validation experiments showed that whole genome mapping produced highly reproducible results. Assessment of the technique on two well-documented MRSA outbreaks showed that whole genome mapping was able to confirm one outbreak, but revealed major differences between the maps of a second, indicating that not all isolates belonged to this outbreak. Whole genome mapping of LA-MRSA isolates that were epidemiologically unlinked provided a much higher discriminatory power than spa-typing or MLVA. In contrast, maps created from LA-MRSA isolates obtained during a proven LA-MRSA outbreak were nearly indistinguishable, showing that transmission of LA-MRSA can be detected by whole genome mapping. Finally, whole genome maps of LA-MRSA isolates originating from two unrelated veterinarians and their household members showed that veterinarians may carry and transmit different LA-MRSA strains at the same time. No such conclusions could be drawn based on spa-typing and MLVA. Although PFGE seems to be suitable for molecular typing of LA-MRSA, WGM provides a much higher discriminatory power. Furthermore, whole genome mapping can provide a comparison with other maps within 2 days after the bacterial culture is received, making it suitable to investigate transmission events and

  4. ASAP, a systematic annotation package for community analysis of genomes.

    PubMed

    Glasner, Jeremy D; Liss, Paul; Plunkett, Guy; Darling, Aaron; Prasad, Tejasvini; Rusch, Michael; Byrnes, Alexis; Gilson, Michael; Biehl, Bryan; Blattner, Frederick R; Perna, Nicole T

    2003-01-01

    ASAP (a systematic annotation package for community analysis of genomes) is a relational database and web interface developed to store, update and distribute genome sequence data and functional characterization (https://asap.ahabs.wisc.edu/annotation/php/ASAP1.htm). ASAP facilitates ongoing community annotation of genomes and tracking of information as genome projects move from preliminary data collection through post-sequencing functional analysis. The ASAP database includes multiple genome sequences at various stages of analysis, corresponding experimental data and access to collections of related genome resources. ASAP supports three levels of users: public viewers, annotators and curators. Public viewers can currently browse updated annotation information for Escherichia coli K-12 strain MG1655, genome-wide transcript profiles from more than 50 microarray experiments and an extensive collection of mutant strains and associated phenotypic data. Annotators worldwide are currently using ASAP to participate in a community annotation project for the Erwinia chrysanthemi strain 3937 genome. Curation of the E. chrysanthemi genome annotation as well as those of additional published enterobacterial genomes is underway and will be publicly accessible in the near future. PMID:12519969

  5. Barcode Server: A Visualization-Based Genome Analysis System

    PubMed Central

    Mao, Fenglou; Olman, Victor; Wang, Yan; Xu, Ying

    2013-01-01

    We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a) calculation of the k-mer based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome, (c) clustering of provided DNA sequences into groups having similar barcodes; and (d) homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode. PMID:23457606
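
    Capability (a) above, the k-mer based barcode, can be sketched as a matrix with one row per sequence fragment and one column per k-mer, each cell holding that k-mer's frequency within the fragment; rendering the matrix as grey levels gives the barcode image. The sketch below assumes non-overlapping fragments and counts k-mers on the given strand only. The fragment length, the value of k, and the function name are illustrative choices, not the server's actual parameters or API.

```python
from itertools import product


def kmer_barcode(sequence, k=2, window=8):
    """Return (kmers, rows): one row of k-mer frequencies per non-overlapping window.

    Assumptions for illustration: fixed-size windows, a trailing partial window is
    dropped, and k-mers containing symbols other than A, C, G, T are skipped.
    """
    kmers = ["".join(letters) for letters in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    sequence = sequence.upper()
    rows = []
    for start in range(0, len(sequence) - window + 1, window):
        fragment = sequence[start:start + window]
        counts = [0] * len(kmers)
        for i in range(len(fragment) - k + 1):
            column = index.get(fragment[i:i + k])
            if column is not None:
                counts[column] += 1
        total = sum(counts)
        rows.append([count / total if total else 0.0 for count in counts])
    return kmers, rows
```

    Fragments whose rows differ markedly from the bulk of the matrix are the kind of candidates that capability (b) is described as detecting.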

  6. Barcode server: a visualization-based genome analysis system.

    PubMed

    Mao, Fenglou; Olman, Victor; Wang, Yan; Xu, Ying

    2013-01-01

    We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a) calculation of the k-mer based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome, (c) clustering of provided DNA sequences into groups having similar barcodes; and (d) homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode. PMID:23457606

  7. Application of Metabolomics for High Resolution Phenotype Analysis

    PubMed Central

    Fukusaki, Eiichiro

    2014-01-01

    The metabolome, the total profile of all metabolites, lies downstream of the proteome and is thought to result from the implementation of genomic information. In other words, the metabolome can be regarded as a high-resolution phenotype. The simplest use of metabolomics is integration with upstream omics information such as the transcriptome and/or proteome, and such attempts have already been reported. Metabolomics can also be operated in stand-alone mode, without any other omics information. Among metabolomics tactics, the author’s group focuses particularly on metabolic fingerprinting, in which metabolome information is used as the explanatory variable to evaluate a response variable. Metabolic fingerprinting is expected to be useful not only for analyzing slight differences that depend on genotype but also for expressing the dynamic variation of living organisms. The author introduces several examples from his own work that illustrate the power of metabolomics. In addition, the author describes the latest technology for analyzing metabolic dynamism: the author’s group developed a facile analytical method for semi-quantitative analysis of metabolic dynamism, which uses the time-dependent variation of isotope distribution based on stable isotope dilution. PMID:26819889

  8. High-Resolution Whole-Genome Sequencing Reveals That Specific Chromatin Domains from Most Human Chromosomes Associate with Nucleoli

    PubMed Central

    van Koningsbruggen, Silvana; Gierliński, Marek; Schofield, Pietá; Martin, David; Barton, Geoffey J.; Ariyurek, Yavuz; den Dunnen, Johan T.

    2010-01-01

    The nuclear space is mostly occupied by chromosome territories and nuclear bodies. Although this organization of chromosomes affects gene function, relatively little is known about the role of nuclear bodies in the organization of chromosomal regions. The nucleolus is the best-studied subnuclear structure and forms around the rRNA repeat gene clusters on the acrocentric chromosomes. In addition to rDNA, other chromatin sequences also surround the nucleolar surface and may even loop into the nucleolus. These additional nucleolar-associated domains (NADs) have not been well characterized. We present here a whole-genome, high-resolution analysis of chromatin endogenously associated with nucleoli. We have used a combination of three complementary approaches, namely fluorescence comparative genome hybridization, high-throughput deep DNA sequencing and photoactivation combined with time-lapse fluorescence microscopy. The data show that specific sequences from most human chromosomes, in addition to the rDNA repeat units, associate with nucleoli in a reproducible and heritable manner. NADs have in common a high density of AT-rich sequence elements, low gene density and a statistically significant enrichment in transcriptionally repressed genes. Unexpectedly, both the direct DNA sequencing and fluorescence photoactivation data show that certain chromatin loci can specifically associate with either the nucleolus, or the nuclear envelope. PMID:20826608

  9. Bacillus anthracis comparative genome analysis in support of the Amerithrax investigation

    PubMed Central

    Rasko, David A.; Worsham, Patricia L.; Abshire, Terry G.; Stanley, Scott T.; Bannan, Jason D.; Wilson, Mark R.; Langham, Richard J.; Decker, R. Scott; Jiang, Lingxia; Read, Timothy D.; Phillippy, Adam M.; Salzberg, Steven L.; Pop, Mihai; Van Ert, Matthew N.; Kenefic, Leo J.; Keim, Paul S.; Fraser-Liggett, Claire M.; Ravel, Jacques

    2011-01-01

    Before the anthrax letter attacks of 2001, the developing field of microbial forensics relied on microbial genotyping schemes based on a small portion of a genome sequence. Amerithrax, the investigation into the anthrax letter attacks, applied high-resolution whole-genome sequencing and comparative genomics to identify key genetic features of the letters’ Bacillus anthracis Ames strain. During systematic microbiological analysis of the spore material from the letters, we identified a number of morphological variants based on phenotypic characteristics and the ability to sporulate. The genomes of these morphological variants were sequenced and compared with that of the B. anthracis Ames ancestor, the progenitor of all B. anthracis Ames strains. Through comparative genomics, we identified four distinct loci with verifiable genetic mutations. Three of the four mutations could be directly linked to sporulation pathways in B. anthracis and more specifically to the regulation of the phosphorylation state of Spo0F, a key regulatory protein in the initiation of the sporulation cascade, thus linking phenotype to genotype. None of these variant genotypes were identified in single-colony environmental B. anthracis Ames isolates associated with the investigation. These genotypes were identified only in B. anthracis morphotypes isolated from the letters, indicating that the variants were not prevalent in the environment, not even the environments associated with the investigation. This study demonstrates the forensic value of systematic microbiological analysis combined with whole-genome sequencing and comparative genomics. PMID:21383169

  10. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    PubMed Central

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To help non-programming users leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high utility analysis tasks. PMID:26780094

  11. High Resolution Genomic Scans Reveal Genetic Architecture Controlling Alcohol Preference in Bidirectionally Selected Rat Model.

    PubMed

    Lo, Chiao-Ling; Lossie, Amy C; Liang, Tiebing; Liu, Yunlong; Xuei, Xiaoling; Lumeng, Lawrence; Zhou, Feng C; Muir, William M

    2016-08-01

    Investigations on the influence of nature vs. nurture on Alcoholism (Alcohol Use Disorder) in humans have yet to provide a clear view on potential genomic etiologies. To address this issue, we sequenced a replicated animal model system bidirectionally-selected for alcohol preference (AP). This model is uniquely suited to map genetic effects with high reproducibility and resolution. The origin of the rat lines (an 8-way cross) resulted in small haplotype blocks (HB) with a corresponding high level of resolution. We sequenced DNAs from 40 samples (10 per line of each replicate) to determine allele frequencies and HB. We achieved ~46X coverage per line and replicate. Excessive differentiation in the genomic architecture between lines, across replicates, termed signatures of selection (SS), was classified according to gene and region. We identified SS in 930 genes associated with AP. The majority (50%) of the SS were confined to single gene regions, the greatest numbers of which were in promoters (284) and intronic regions (169) with the least in exons (4), suggesting that differences in AP were primarily due to alterations in regulatory regions. We confirmed previously identified genes and found many new genes associated with AP. Of those newly identified genes, several demonstrated neuronal function involved in synaptic memory and reward behavior, e.g. ion channels (Kcnf1, Kcnn3, Scn5a), excitatory receptors (Grin2a, Gria3, Grip1), neurotransmitters (Pomc), and synapses (Snap29). This study not only reveals the polygenic architecture of AP, but also emphasizes the importance of regulatory elements, consistent with other complex traits. PMID:27490364

  12. High Resolution Genomic Scans Reveal Genetic Architecture Controlling Alcohol Preference in Bidirectionally Selected Rat Model

    PubMed Central

    Lo, Chiao-Ling; Liang, Tiebing; Liu, Yunlong; Lumeng, Lawrence; Zhou, Feng C.; Muir, William M.

    2016-01-01

    Investigations on the influence of nature vs. nurture on Alcoholism (Alcohol Use Disorder) in humans have yet to provide a clear view on potential genomic etiologies. To address this issue, we sequenced a replicated animal model system bidirectionally-selected for alcohol preference (AP). This model is uniquely suited to map genetic effects with high reproducibility and resolution. The origin of the rat lines (an 8-way cross) resulted in small haplotype blocks (HB) with a corresponding high level of resolution. We sequenced DNAs from 40 samples (10 per line of each replicate) to determine allele frequencies and HB. We achieved ~46X coverage per line and replicate. Excessive differentiation in the genomic architecture between lines, across replicates, termed signatures of selection (SS), was classified according to gene and region. We identified SS in 930 genes associated with AP. The majority (50%) of the SS were confined to single gene regions, the greatest numbers of which were in promoters (284) and intronic regions (169) with the least in exons (4), suggesting that differences in AP were primarily due to alterations in regulatory regions. We confirmed previously identified genes and found many new genes associated with AP. Of those newly identified genes, several demonstrated neuronal function involved in synaptic memory and reward behavior, e.g. ion channels (Kcnf1, Kcnn3, Scn5a), excitatory receptors (Grin2a, Gria3, Grip1), neurotransmitters (Pomc), and synapses (Snap29). This study not only reveals the polygenic architecture of AP, but also emphasizes the importance of regulatory elements, consistent with other complex traits. PMID:27490364

  13. Genome-Wide Organization of GATA1 and TAL1 Determined at High Resolution

    PubMed Central

    Han, G. Celine; Vinayachandran, Vinesh; Bataille, Alain R.; Park, Bongsoo; Chan-Salis, Ka Yim; Keller, Cheryl A.; Long, Maria; Mahony, Shaun; Hardison, Ross C.

    2015-01-01

    Erythroid development and differentiation from multiprogenitor cells into red blood cells requires precise transcriptional regulation. Key erythroid transcription factors, GATA1 and TAL1, cooperate, along with other proteins, to regulate many aspects of this process. How GATA1 and TAL1 are juxtaposed along the DNA and their cognate DNA binding site across the mouse genome remains unclear. We applied high-resolution ChIP-exo (chromatin immunoprecipitation followed by 5′-to-3′ exonuclease treatment and then massively parallel DNA sequencing) to GATA1 and TAL1 to study their positional organization across the mouse genome during GATA1-dependent maturation. Two complementary methods, MultiGPS and peak pairing, were used to determine high-confidence binding locations by ChIP-exo. We identified ∼10,000 GATA1 and ∼15,000 TAL1 locations, which were essentially confirmed by ChIP-seq (chromatin immunoprecipitation followed by massively parallel DNA sequencing). Of these, ∼4,000 locations were bound by both GATA1 and TAL1. About three-quarters of them were tightly linked to a partial E-box located 7 or 8 bp upstream of a WGATAA motif. Both TAL1 and GATA1 generated distinct characteristic ChIP-exo peaks around WGATAA motifs that reflect their positional arrangement within a complex. We show that TAL1 and GATA1 form a precisely organized complex at a compound motif consisting of a TG 7 or 8 bp upstream of a WGATAA motif across thousands of genomic locations. PMID:26503782

  14. Genome-Wide Organization of GATA1 and TAL1 Determined at High Resolution.

    PubMed

    Han, G Celine; Vinayachandran, Vinesh; Bataille, Alain R; Park, Bongsoo; Chan-Salis, Ka Yim; Keller, Cheryl A; Long, Maria; Mahony, Shaun; Hardison, Ross C; Pugh, B Franklin

    2016-01-01

    Erythroid development and differentiation from multiprogenitor cells into red blood cells requires precise transcriptional regulation. Key erythroid transcription factors, GATA1 and TAL1, cooperate, along with other proteins, to regulate many aspects of this process. How GATA1 and TAL1 are juxtaposed along the DNA and their cognate DNA binding site across the mouse genome remains unclear. We applied high-resolution ChIP-exo (chromatin immunoprecipitation followed by 5'-to-3' exonuclease treatment and then massively parallel DNA sequencing) to GATA1 and TAL1 to study their positional organization across the mouse genome during GATA1-dependent maturation. Two complementary methods, MultiGPS and peak pairing, were used to determine high-confidence binding locations by ChIP-exo. We identified ∼10,000 GATA1 and ∼15,000 TAL1 locations, which were essentially confirmed by ChIP-seq (chromatin immunoprecipitation followed by massively parallel DNA sequencing). Of these, ∼4,000 locations were bound by both GATA1 and TAL1. About three-quarters of them were tightly linked to a partial E-box located 7 or 8 bp upstream of a WGATAA motif. Both TAL1 and GATA1 generated distinct characteristic ChIP-exo peaks around WGATAA motifs that reflect their positional arrangement within a complex. We show that TAL1 and GATA1 form a precisely organized complex at a compound motif consisting of a TG 7 or 8 bp upstream of a WGATAA motif across thousands of genomic locations. PMID:26503782
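
    The compound motif described above (a TG positioned 7 or 8 bp upstream of a WGATAA motif) can be searched for with a short sketch. It rests on assumptions: the spacing is read here as 7 or 8 arbitrary bases between the TG and the WGATAA, which is only one interpretation of the abstract, and only the forward strand is scanned. The names `COMPOUND_MOTIF` and `find_compound_motifs` are hypothetical. W is the IUPAC code for A or T.

```python
import re

# TG, then a 7-8 base spacer, then WGATAA (W = A or T).
# The spacer length is an assumed reading of "7 or 8 bp upstream".
# The lookahead lets overlapping occurrences be reported.
COMPOUND_MOTIF = re.compile(r"(?=(TG[ACGT]{7,8}[AT]GATAA))")


def find_compound_motifs(sequence):
    """Return (start, matched_text) for forward-strand occurrences."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in COMPOUND_MOTIF.finditer(sequence)]
```

    Scanning the reverse complement as well would be needed for a strand-independent search; that step is omitted to keep the sketch short.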

  15. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute

    2013-02-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  16. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  17. High-resolution mapping of open chromatin in the rice genome

    PubMed Central

    Zhang, Wenli; Wu, Yufeng; Schnable, James C.; Zeng, Zixian; Freeling, Michael; Crawford, Gregory E.; Jiang, Jiming

    2012-01-01

    Gene expression is controlled by the complex interaction of transcription factors binding to promoters and other regulatory DNA elements. One common characteristic of the genomic regions associated with regulatory proteins is a pronounced sensitivity to DNase I digestion. We generated genome-wide high-resolution maps of DNase I hypersensitive (DH) sites from both seedling and callus tissues of rice (Oryza sativa). Approximately 25% of the DH sites from both tissues were found in putative promoters, indicating that the vast majority of the gene regulatory elements in rice are not located in promoter regions. We found 58% more DH sites in the callus than in the seedling. For DH sites detected in both the seedling and callus, 31% displayed significantly different levels of DNase I sensitivity within the two tissues. Genes that are differentially expressed in the seedling and callus were frequently associated with DH sites in both tissues. The DNA sequences contained within the DH sites were hypomethylated, consistent with what is known about active gene regulatory elements. Interestingly, tissue-specific DH sites located in the promoters showed a higher level of DNA methylation than the average DNA methylation level of all the DH sites located in the promoters. A distinct elevation of H3K27me3 was associated with intergenic DH sites. These results suggest that epigenetic modifications play a role in the dynamic changes of the numbers and DNase I sensitivity of DH sites during development. PMID:22110044

  18. Whole-Genome Mapping as a Novel High-Resolution Typing Tool for Legionella pneumophila

    PubMed Central

    Euser, Sjoerd M.; Landman, Fabian; Bruin, Jacob P.; IJzerman, Ed P.; den Boer, Jeroen W.; Schouls, Leo M.

    2015-01-01

    Legionella is the causative agent for Legionnaires' disease (LD) and is responsible for several large outbreaks in the world. More than 90% of LD cases are caused by Legionella pneumophila, and studies on the origin and transmission routes of this pathogen rely on adequate molecular characterization of isolates. Current typing of L. pneumophila mainly depends on sequence-based typing (SBT). However, studies have shown that in some outbreak situations, SBT does not have sufficient discriminatory power to distinguish between related and nonrelated L. pneumophila isolates. In this study, we used a novel high-resolution typing technique, called whole-genome mapping (WGM), to differentiate between epidemiologically related and nonrelated L. pneumophila isolates. Assessment of the method by various validation experiments showed highly reproducible results, and WGM was able to confirm two well-documented Dutch L. pneumophila outbreaks. Comparison of whole-genome maps of the two outbreaks together with WGMs of epidemiologically nonrelated L. pneumophila isolates showed major differences between the maps, and WGM yielded a higher discriminatory power than SBT. In conclusion, WGM can be a valuable alternative to perform outbreak investigations of L. pneumophila in real time since the turnaround time from culture to comparison of the L. pneumophila maps is less than 24 h. PMID:26202110

  19. Genome wide association study of spontaneous resolution of hepatitis C virus infection

    PubMed Central

    Duggal, Priya; Thio, Chloe L.; Wojcik, Genevieve L.; Goedert, James J.; Mangia, Alessandra; Latanich, Rachel; Kim, Arthur Y.; Lauer, Georg M.; Chung, Raymond T.; Peters, Marion G.; Kirk, Greg D.; Mehta, Shruti H.; Cox, Andrea L.; Khakoo, Salim I.; Alric, Laurent; Cramp, Matthew E.; Donfield, Sharyne M.; Edlin, Brian R.; Tobler, Leslie H; Busch, Michael P.; Alexander, Graeme; Rosen, Hugo R.; Gao, Xiaojiang; Abdel-Hamid, Mohamed; Apps, Richard; Carrington, Mary; Thomas, David L.

    2013-01-01

    Background Hepatitis C virus (HCV) infections occur worldwide and either spontaneously resolve or persist and markedly increase the person’s lifetime risk of cirrhosis and hepatocellular carcinoma. Although HCV persistence occurs more often in persons of African ancestry and in persons with a genetic variant near IL28B, the genetic basis is not well understood. Objective To evaluate the host genetic basis for spontaneous resolution of HCV infection. Design Two-stage genome wide association study (GWAS). Setting 13 international multicenter study sites. Patients 919 individuals with serum HCV antibodies but no HCV RNA (spontaneous resolution) and 1482 individuals with serum HCV antibodies and RNA (persistence). Measurements Frequencies of 792,721 SNPs. Results Differences in allele frequencies between persons with spontaneous resolution and persistence were identified on chromosomes 19q13.13 and 6p21.32. On chromosome 19, allele frequency differences localized near IL28B and included rs12979860 (overall per-allele OR = 0.45, P = 2.17 × 10^−30) and 10 additional SNPs spanning 55,000 bases. On chromosome 6, allele frequency differences localized near genes for class II human leukocyte antigens (HLA) and included rs4273729 (overall per-allele OR = 0.59, P = 1.71 × 10^−16) near DQB1*03:01 and an additional 116 SNPs spanning 1,090,000 base pairs. The associations in chromosomes 19 and 6 were independent, additive, and explain an estimated 14.9% (95% CI: 8.5–22.6%) of the variation in HCV resolution in those of European-Ancestry, and 15.8% (95% CI: 4.4–31.0%) in individuals of African-Ancestry. Replication of the chromosome 6 SNP, rs4272729 in an additional 746 individuals confirmed the findings (p=0.015). Limitations Epigenetic effects were not studied. Conclusions IL28B and HLA class II are independently associated with spontaneous resolution of HCV infection and SNPs marking IL28B and DQB1*03:01 may explain ~15% of spontaneous resolution of HCV infection. PMID

  20. GENOME ANALYSIS OF BURKHOLDERIA CEPACIA AC1100

    EPA Science Inventory

    Burkholderia cepacia is an important organism in bioremediation of environmental pollutants and it is also of increasing interest as a human pathogen. The genomic organization of B. cepacia is being studied in order to better understand its unusual adaptive capacity and genome pl...

  1. The human genome: a multifractal analysis

    PubMed Central

    2011-01-01

    Background Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a similar behavior to that observed in the nematode. Results We report here multifractality in the human genome sequence. This behavior correlates strongly with the presence of Alu elements and to a lesser extent with CpG islands and (G+C) content. In contrast, little or no relationship was found for LINE, MIR, MER, LTRs elements and DNA regions poor in genetic information. Gene function, cluster of orthologous genes, metabolic pathways, and exons tended to increase their frequencies with ranges of multifractality and large gene families were located in genomic regions with varied multifractality. Additionally, a multifractal map and classification for human chromosomes are proposed. Conclusions Based on these findings, we propose a descriptive non-linear model for the structure of the human genome, with some biological implications. This model reveals 1) a multifractal regionalization where many regions coexist that are far from equilibrium and 2) this non-linear organization has significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and structure of the human genome. Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation and phylogenetic analyses, these quantifications are especially useful. PMID:21999602

  2. SMASH, a fragmentation and sequencing method for genomic copy number analysis.

    PubMed

    Wang, Zihua; Andrews, Peter; Kendall, Jude; Ma, Beicong; Hakker, Inessa; Rodgers, Linda; Ronemus, Michael; Wigler, Michael; Levy, Dan

    2016-06-01

    Copy number variants (CNVs) underlie a significant amount of genetic diversity and disease. CNVs can be detected by a number of means, including chromosomal microarray analysis (CMA) and whole-genome sequencing (WGS), but these approaches suffer from either limited resolution (CMA) or are highly expensive for routine screening (both CMA and WGS). As an alternative, we have developed a next-generation sequencing-based method for CNV analysis termed SMASH, for short multiply aggregated sequence homologies. SMASH utilizes random fragmentation of input genomic DNA to create chimeric sequence reads, from which multiple mappable tags can be parsed using maximal almost-unique matches (MAMs). The SMASH tags are then binned and segmented, generating a profile of genomic copy number at the desired resolution. Because fewer reads are necessary relative to WGS to give accurate CNV data, SMASH libraries can be highly multiplexed, allowing large numbers of individuals to be analyzed at low cost. Increased genomic resolution can be achieved by sequencing to higher depth. PMID:27197213

  3. SMASH, a fragmentation and sequencing method for genomic copy number analysis

    PubMed Central

    Wang, Zihua; Andrews, Peter; Kendall, Jude; Ma, Beicong; Hakker, Inessa; Rodgers, Linda; Ronemus, Michael; Wigler, Michael; Levy, Dan

    2016-01-01

    Copy number variants (CNVs) underlie a significant amount of genetic diversity and disease. CNVs can be detected by a number of means, including chromosomal microarray analysis (CMA) and whole-genome sequencing (WGS), but these approaches suffer from either limited resolution (CMA) or are highly expensive for routine screening (both CMA and WGS). As an alternative, we have developed a next-generation sequencing-based method for CNV analysis termed SMASH, for short multiply aggregated sequence homologies. SMASH utilizes random fragmentation of input genomic DNA to create chimeric sequence reads, from which multiple mappable tags can be parsed using maximal almost-unique matches (MAMs). The SMASH tags are then binned and segmented, generating a profile of genomic copy number at the desired resolution. Because fewer reads are necessary relative to WGS to give accurate CNV data, SMASH libraries can be highly multiplexed, allowing large numbers of individuals to be analyzed at low cost. Increased genomic resolution can be achieved by sequencing to higher depth. PMID:27197213
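
    The binning step mentioned above (tags are binned to give a copy-number profile at a chosen resolution) can be sketched as follows. This is not the SMASH pipeline: parsing tags from chimeric reads via maximal almost-unique matches and the segmentation step are both omitted, and the function name `bin_copy_number`, the fixed-width bins, the normalisation against a reference tag set, and the default ploidy of 2 are assumptions made for illustration.

```python
from collections import Counter


def bin_copy_number(sample_positions, reference_positions, bin_size, ploidy=2):
    """Estimate copy number per bin from tag start positions on one chromosome.

    Hypothetical helper: fixed-width bins and reference normalisation
    are illustrative choices, not details taken from SMASH.
    """
    if not sample_positions or not reference_positions:
        raise ValueError("both tag lists must be non-empty")
    sample = Counter(pos // bin_size for pos in sample_positions)
    reference = Counter(pos // bin_size for pos in reference_positions)
    # Rescale so that equal total depth corresponds to a ratio of 1.
    scale = len(reference_positions) / len(sample_positions)
    profile = {}
    # Only bins with reference tags are reported (avoids division by zero).
    for b in sorted(reference):
        ratio = sample[b] * scale / reference[b]
        profile[b] = ploidy * ratio
    return profile
```

    A smaller `bin_size` gives finer resolution but fewer tags per bin, which is consistent with the abstract's remark that higher resolution requires sequencing to greater depth.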

  4. Genomic resolution of an aggressive, widespread, diverse and expanding meningococcal serogroup B, C and W lineage

    PubMed Central

    Lucidarme, Jay; Hill, Dorothea M.C.; Bratcher, Holly B.; Gray, Steve J.; du Plessis, Mignon; Tsang, Raymond S.W.; Vazquez, Julio A.; Taha, Muhamed-Kheir; Ceyhan, Mehmet; Efron, Adriana M.; Gorla, Maria C.; Findlow, Jamie; Jolley, Keith A.; Maiden, Martin C.J.; Borrow, Ray

    2015-01-01

    Summary Objectives Neisseria meningitidis is a leading cause of meningitis and septicaemia. The hyperinvasive ST-11 clonal complex (cc11) caused serogroup C (MenC) outbreaks in the US military in the 1960s and UK universities in the 1990s, a global Hajj-associated serogroup W (MenW) outbreak in 2000–2001, and subsequent MenW epidemics in sub-Saharan Africa. More recently, endemic MenW disease has expanded in South Africa, South America and the UK, and MenC cases have been reported among European and North American men who have sex with men (MSM). Routine typing schemes poorly resolve cc11 so we established the population structure at genomic resolution. Methods Representatives of these episodes and other geo-temporally diverse cc11 meningococci (n = 750) were compared across 1546 core genes and visualised on phylogenetic networks. Results MenW isolates were confined to a distal portion of one of two main lineages with MenB and MenC isolates interspersed elsewhere. An expanding South American/UK MenW strain was distinct from the ‘Hajj outbreak’ strain and a closely related endemic South African strain. Recent MenC isolates from MSM in France and the UK were closely related but distinct. Conclusions High resolution ‘genomic’ multilocus sequence typing is necessary to resolve and monitor the spread of diverse cc11 lineages globally. PMID:26226598

  5. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

  6. Analysis of recent segmental duplications in the bovine genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Duplicated sequences are an important source of gene innovation and structural variation within mammalian genomes. We describe the first systematic and genome-wide analysis of segmental duplications in the modern domesticated cattle (Bos taurus). Using two distinct computational analyses, we estimat...

  7. Complete genome sequencing and comparative genomic analysis of functionally diverse Lysinibacillus sphaericus III(3)7.

    PubMed

    Rey, Andrés; Silva-Quintero, Laura; Dussán, Jenny

    2016-09-01

    Lysinibacillus sphaericus III(3)7 is a native Colombian strain, the first one isolated from soil samples. This strain has shown high levels of pathogenic activity against Culex quinquefasciatus larvae in laboratory assays compared to other members of the same species. Using Pacific Biosciences sequencing technology we sequenced, annotated (de novo) and described the genome of strain III(3)7, achieving a complete genome sequence status. We then performed a comparative analysis between the newly sequenced genome and the ones previously reported for Colombian isolates L. sphaericus OT4b.31, CBAM5 and OT4b.25, with the inclusion of L. sphaericus C3-41, which has been used as a reference genome for most previous genome sequencing projects. We concluded that L. sphaericus III(3)7 is highly similar to strain OT4b.25 and shares high levels of synteny with isolates CBAM5 and C3-41. PMID:27419068

  8. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    DOE PAGESBeta

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  9. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    SciTech Connect

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations; such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  10. Genomic islands of divergence and their consequences for the resolution of spatial structure in an exploited marine fish

    PubMed Central

    Bradbury, Ian R; Hubert, Sophie; Higgins, Brent; Bowman, Sharen; Borza, Tudor; Paterson, Ian G; Snelgrove, Paul V R; Morris, Corey J; Gregory, Robert S; Hardie, David; Hutchings, Jeffrey A; Ruzzante, Daniel E; Taggart, Christopher T; Bentzen, Paul

    2013-01-01

    As populations diverge, genomic regions associated with adaptation display elevated differentiation. These genomic islands of adaptive divergence can inform conservation efforts in exploited species, by refining the delineation of management units, and providing genomic tools for more precise and effective population monitoring and the successful assignment of individuals and products. We explored heterogeneity in genomic divergence and its impact on the resolution of spatial population structure in exploited populations of Atlantic cod, Gadus morhua, using genome wide expressed sequence derived single nucleotide polymorphisms in 466 individuals sampled across the range. Outlier tests identified elevated divergence at 5.2% of SNPs, consistent with directional selection in one-third of linkage groups. Genomic regions of elevated divergence ranged in size from a single position to several cM. Structuring at neutral loci was associated with geographic features, whereas outlier SNPs revealed genetic discontinuities in both the eastern and western Atlantic. This fine-scale geographic differentiation enhanced assignment to region of origin, and through the identification of adaptive diversity, fundamentally changes how these populations should be conserved. This work demonstrates the utility of genome scans for adaptive divergence in the delineation of stock structure, the traceability of individuals and products, and ultimately a role for population genomics in fisheries conservation. PMID:23745137

  11. Genomic islands of divergence and their consequences for the resolution of spatial structure in an exploited marine fish.

    PubMed

    Bradbury, Ian R; Hubert, Sophie; Higgins, Brent; Bowman, Sharen; Borza, Tudor; Paterson, Ian G; Snelgrove, Paul V R; Morris, Corey J; Gregory, Robert S; Hardie, David; Hutchings, Jeffrey A; Ruzzante, Daniel E; Taggart, Christopher T; Bentzen, Paul

    2013-04-01

    As populations diverge, genomic regions associated with adaptation display elevated differentiation. These genomic islands of adaptive divergence can inform conservation efforts in exploited species, by refining the delineation of management units, and providing genomic tools for more precise and effective population monitoring and the successful assignment of individuals and products. We explored heterogeneity in genomic divergence and its impact on the resolution of spatial population structure in exploited populations of Atlantic cod, Gadus morhua, using genome wide expressed sequence derived single nucleotide polymorphisms in 466 individuals sampled across the range. Outlier tests identified elevated divergence at 5.2% of SNPs, consistent with directional selection in one-third of linkage groups. Genomic regions of elevated divergence ranged in size from a single position to several cM. Structuring at neutral loci was associated with geographic features, whereas outlier SNPs revealed genetic discontinuities in both the eastern and western Atlantic. This fine-scale geographic differentiation enhanced assignment to region of origin, and through the identification of adaptive diversity, fundamentally changes how these populations should be conserved. This work demonstrates the utility of genome scans for adaptive divergence in the delineation of stock structure, the traceability of individuals and products, and ultimately a role for population genomics in fisheries conservation. PMID:23745137

  12. Exploratory analysis of genomic segmentations with Segtools

    PubMed Central

    2011-01-01

    Background As genome-wide experiments and annotations become more prevalent, researchers increasingly require tools to help interpret data at this scale. Many functional genomics experiments involve partitioning the genome into labeled segments, such that segments sharing the same label exhibit one or more biochemical or functional traits. For example, a collection of ChIP-seq experiments yields a compendium of peaks, each labeled with one or more associated DNA-binding proteins. Similarly, manually or automatically generated annotations of functional genomic elements, including cis-regulatory modules and protein-coding or RNA genes, can also be summarized as genomic segmentations. Results We present a software toolkit called Segtools that simplifies and automates the exploration of genomic segmentations. The software operates as a series of interacting tools, each of which provides one mode of summarization. These various tools can be pipelined and summarized in a single HTML page. We describe the Segtools toolkit and demonstrate its use in interpreting a collection of human histone modification data sets and Plasmodium falciparum local chromatin structure data sets. Conclusions Segtools provides a convenient, powerful means of interpreting a genomic segmentation. PMID:22029426

  13. Analysis of DOA estimation spatial resolution using MUSIC algorithm

    NASA Astrophysics Data System (ADS)

    Guo, Yue; Wang, Hongyuan; Luo, Bin

    2005-11-01

    This paper presents a performance analysis of the spatial resolution of the direction of arrival (DOA) estimates attained by the multiple signal classification (MUSIC) algorithm for uncorrelated sources. The confidence interval of the estimated angle, which is much more intuitive, is adopted as the new evaluation standard for spatial resolution. Then, based on a statistical method, a qualitative analysis reveals the factors influencing the performance of the MUSIC algorithm. Finally, quantitative simulations confirm the results of the theoretical analysis.

  14. A high-resolution map of the Nile tilapia genome: a resource for studying cichlids and other percomorphs

    PubMed Central

    2012-01-01

    Background The Nile tilapia (Oreochromis niloticus) is the second most farmed fish species worldwide. It is also an important model for studies of fish physiology, particularly because of its broad tolerance to an array of environments. It is a good model to study evolutionary mechanisms in vertebrates, because of its close relationship to haplochromine cichlids, which have undergone rapid speciation in East Africa. The existing genomic resources for Nile tilapia include a genetic map, BAC end sequences and ESTs, but comparative genome analysis and maps of quantitative trait loci (QTL) are still limited. Results We have constructed a high-resolution radiation hybrid (RH) panel for the Nile tilapia and genotyped 1358 markers consisting of 850 genes, 82 markers corresponding to BAC end sequences, 154 microsatellites and 272 single nucleotide polymorphisms (SNPs). From these, 1296 markers could be associated in 81 RH groups, while 62 were not linked. The total size of the RH map is 34,084 cR3500 and 937,310 kb. It covers 88% of the entire genome with an estimated inter-marker distance of 742 Kb. Mapping of microsatellites enabled integration to the genetic map. We have merged LG8 and LG24 into a single linkage group, and confirmed that LG16-LG21 are also merged. The orientation and association of RH groups to each chromosome and LG was confirmed by chromosomal in situ hybridizations (FISH) of 55 BACs. Fifty RH groups were localized on the 22 chromosomes while 31 remained small orphan groups. Synteny relationships were determined between Nile tilapia, stickleback, medaka and pufferfish. Conclusion The RH map and associated FISH map provide a valuable gene-ordered resource for gene mapping and QTL studies. All genetic linkage groups with their corresponding RH groups now have a corresponding chromosome which can be identified in the karyotype. Placement of conserved segments indicated that multiple inter-chromosomal rearrangements have occurred between Nile tilapia

  15. High resolution radiation hybrid maps of bovine chromosomes 19 and 29: comparison with the bovine genome sequence assembly

    PubMed Central

    Prasad, Aparna; Schiex, Thomas; McKay, Stephanie; Murdoch, Brenda; Wang, Zhiquan; Womack, James E; Stothard, Paul; Moore, Stephen S

    2007-01-01

    Background High resolution radiation hybrid (RH) maps can facilitate genome sequence assembly by correctly ordering genes and genetic markers along chromosomes. The objective of the present study was to generate high resolution RH maps of bovine chromosomes 19 (BTA19) and 29 (BTA29), and compare them with the current 7.1X bovine genome sequence assembly (bovine build 3.1). We have chosen BTA19 and 29 as candidate chromosomes for mapping, since many Quantitative Trait Loci (QTL) for the traits of carcass merit and residual feed intake have been identified on these chromosomes. Results We have constructed high resolution maps of BTA19 and BTA29 consisting of 555 and 253 Single Nucleotide Polymorphism (SNP) markers respectively using a 12,000 rad whole genome RH panel. With these markers, the RH map of BTA19 and BTA29 extended to 4591.4 cR and 2884.1 cR in length respectively. When aligned with the current bovine build 3.1, the order of markers on the RH map for BTA19 and 29 showed inconsistencies with respect to the genome assembly. Maps of both the chromosomes show that there is a significant internal rearrangement of the markers involving displacement, inversion and flips within the scaffolds with some scaffolds being misplaced in the genome assembly. We also constructed cattle-human comparative maps of these chromosomes which showed an overall agreement with the comparative maps published previously. However, minor discrepancies in the orientation of few homologous synteny blocks were observed. Conclusion The high resolution maps of BTA19 (average 1 locus/139 kb) and BTA29 (average 1 locus/208 kb) presented in this study suggest that by the incorporation of RH mapping information, the current bovine genome sequence assembly can be significantly improved. Furthermore, these maps can serve as a potential resource for fine mapping QTL and identification of causative mutations underlying QTL for economically important traits. PMID:17784962

  16. Genome-wide comparative analysis of the Brassica rapa gene space reveals genome shrinkage and differential loss of duplicated genes after whole genome triplication

    PubMed Central

    2009-01-01

    Background Brassica rapa is one of the most economically important vegetable crops worldwide. Owing to its agronomic importance and phylogenetic position, B. rapa provides a crucial reference to understand polyploidy-related crop genome evolution. The high degree of sequence identity and remarkably conserved genome structure between Arabidopsis and Brassica genomes enables comparative tiling sequencing using Arabidopsis sequences as references to select the counterpart regions in B. rapa, which is a strong challenge of structural and comparative crop genomics. Results We assembled 65.8 megabase-pairs of non-redundant euchromatic sequence of B. rapa and compared this sequence to the Arabidopsis genome to investigate chromosomal relationships, macrosynteny blocks, and microsynteny within blocks. The triplicated B. rapa genome contains only approximately twice the number of genes as in Arabidopsis because of genome shrinkage. Genome comparisons suggest that B. rapa has a distinct organization of ancestral genome blocks as a result of recent whole genome triplication followed by a unique diploidization process. A lack of the most recent whole genome duplication (3R) event in the B. rapa genome, atypical of other Brassica genomes, may account for the emergence of B. rapa from the Brassica progenitor around 8 million years ago. Conclusions This work demonstrates the potential of using comparative tiling sequencing for genome analysis of crop species. Based on a comparative analysis of the B. rapa sequences and the Arabidopsis genome, it appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution. PMID:19821981

  17. Software tool for the analysis and visualization of whole genome alignments

    Energy Science and Technology Software Center (ESTSC)

    2011-08-01

    GenomeVISTA is a tool that performs and displays pairwise and multiple whole genome DNA alignments. The tool provides a graphical user interface by which users can navigate alignments at multiple levels of resolution and get information about individual aligned regions. Users can load their own sequences into GenomeVISTA or view pre-computed alignments for genomes in the VISTA database.

  18. Detection of Indel Mutations in Drosophila by High-Resolution Melt Analysis (HRMA).

    PubMed

    Housden, Benjamin E; Perrimon, Norbert

    2016-01-01

    Although CRISPR technology allows specific genome alterations to be created with relative ease, detection of these events can be problematic. For example, CRISPR-induced double-strand breaks are often repaired imprecisely to generate unpredictable short indel mutations. Detection of these events requires the use of molecular screening techniques such as endonuclease assays, restriction profiling, or high-resolution melt analysis (HRMA). Here, we provide detailed protocols for HRMA-based mutation screening in Drosophila and analysis of the resulting data using the online tool HRMAnalyzer. PMID:27587781

  19. Genomic Analysis of Broad-Host-Range Enterobacteriophage Av-05

    PubMed Central

    Amarillas, Luis; López-Cuevas, Osvaldo; León-Félix, Josefina; Castro-del Campo, Nohelia; Gerba, Charles P.

    2015-01-01

    Lytic bacteriophages have reemerged as an alternative for the control of pathogenic bacteria. However, the effective use of phage relies on appropriate genomic characterization. In this study, we report the genome and sequence analysis of bacteriophage Av-05, which has strong lytic activity against Escherichia coli O157:H7 strains and several Salmonella serotypes. The analysis revealed that the phage Av-05 genome consists of 120,938 bp, containing 209 putative open reading frames (ORFs) and 9 tRNAs. PMID:26067947

  20. Structural and functional genome analysis using extended chromatin

    SciTech Connect

    Heaf, T.; Ward, D.C.

    1994-09-01

    Highly extended linear chromatin fibers (ECFs) produced by detergent and high-salt lysis and stretching of nuclear chromatin across the surface of a glass slide can be hybridized over physical distances of at least several Mb. This allows long-range FISH analysis of the human genome with excellent DNA resolution (<10 kb/μm). The insertion of Alu elements which are more than 50-fold underrepresented in centromeres can be seen within and near long tandem arrays of alpha-satellite DNA. Long tracts of trinucleotide repeats, i.e. (CCA)n, can be localized within larger genomic regions. The combined application of BrdU incorporation and ECFs allows one to study the spatio-temporal distribution of DNA replication sites in finer detail. DNA synthesis occurs at multiple discrete sites within Mb arrays of alpha-satellite. Replicating DNA is tightly associated with the nuclear matrix and highly resistant to stretching out, while ECFs containing newly replicated DNA are easily released. Asynchrony in replication timing is accompanied by differences in condensation of homologous DNA segments. Extended chromatin reveals differential packaging of active and inactive DNA. Upon transcriptional inactivation by AMD, the normally compact rRNA genes become much more susceptible to decondensation procedures. By extending the chromatin from pachytene spermatocytes, meiotic pairing and genetic exchange between homologs can be visualized directly. Histone depletion by high salt and detergent produces loop chromatin surrounding the nuclear matrix in a halo-like fashion. DNA halos can be used to map nuclear matrix attachment sites in somatic cells and in mature sperm. Alpha-satellite containing DNA loops appear to be attached to the sperm-cell matrix by CENP-B boxes, short 17 bp sequences found in a subset of alpha satellite monomers. Sperm telomeres almost always appear as hybridization doublets, suggesting the presence of already replicated chromosome ends.

  1. A Comparative Genomic Analysis of Diverse Clonal Types of Enterotoxigenic Escherichia coli Reveals Pathovar-Specific Conservation

    PubMed Central

    Sahl, Jason W.; Steinsland, Hans; Redman, Julia C.; Angiuoli, Samuel V.; Nataro, James P.; Sommerfelt, Halvor; Rasko, David A.

    2011-01-01

    Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrheal illness in children less than 5 years of age in low- and middle-income nations, whereas it is an emerging enteric pathogen in industrialized nations. Despite being an important cause of diarrhea, little is known about the genomic composition of ETEC. To address this, we sequenced the genomes of five ETEC isolates obtained from children in Guinea-Bissau with diarrhea. These five isolates represent distinct and globally dominant ETEC clonal groups. Comparative genomic analyses utilizing a gene-independent whole-genome alignment method demonstrated that sequenced ETEC strains share approximately 2.7 million bases of genomic sequence. Phylogenetic analysis of this “core genome” confirmed the diverse history of the ETEC pathovar and provides a finer resolution of the E. coli relationships than multilocus sequence typing. No identified genomic regions were conserved exclusively in all ETEC genomes; however, we identified more genomic content conserved among ETEC genomes than among non-ETEC E. coli genomes, suggesting that ETEC isolates share a genomic core. Comparisons of known virulence and of surface-exposed and colonization factor genes across all sequenced ETEC genomes not only identified variability but also indicated that some antigens are restricted to the ETEC pathovar. Overall, the generation of these five genome sequences, in addition to the two previously generated ETEC genomes, highlights the genomic diversity of ETEC. These studies increase our understanding of ETEC evolution, as well as provide insight into virulence factors and conserved proteins, which may be targets for vaccine development. PMID:21078854

  2. Mycobacterial species as case-study of comparative genome analysis.

    PubMed

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-01-01

    The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems. Further, with more than 100 genome sequences available from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and for improving drugs, vaccines, and diagnostic tools for controlling mycobacterial diseases. In the present study we outline a comparative genome analysis of fourteen mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose, the genomes were compared by genome length, GC content, and number of genes in different databases (GenBank, RefSeq, and Prodigal). A BLAST matrix of these genomes was computed to summarize the similarity between species in a simple scheme. As a result of the multiple genome analysis, the pan and core genomes were defined for twelve mycobacterial species. We also present the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree was constructed from the 16S rRNA gene of tuberculous and non-tuberculous mycobacteria to understand the evolutionary events of these species. PMID:21396338
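
    Illustrative note (not part of the indexed record): GC content, one of the comparison metrics named in this abstract, is the fraction of G and C bases among the unambiguous bases of a sequence. The sketch below is a minimal illustration; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions, not details taken from the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T) in a DNA string.

    Ambiguous symbols such as N are ignored; this is an assumption made
    for the sketch, not a convention stated in the abstract.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    # Hypothetical toy sequence, not real mycobacterial data.
    print(f"{gc_content('ATGCGCNNTA'):.3f}")
```

    Applied to a whole genome, the same function would be called on the concatenated chromosome sequence.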

  3. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  4. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
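
    Illustrative note (not part of the indexed record): the N50 mentioned in this abstract is commonly defined as the length L of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below assumes that common definition; the function name `n50` and the contig lengths are hypothetical and are not taken from the marmoset assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    contigs = [100, 80, 60, 40, 20]  # hypothetical contig lengths
    print(n50(contigs))
```

    Merging contigs lengthens the longest sequences, which is why the N50 rises as an assembly becomes more contiguous.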

  5. Recurrent chromosomal aberrations in intravenous leiomyomatosis of the uterus: high-resolution array comparative genomic hybridization study.

    PubMed

    Buza, Natalia; Xu, Fang; Wu, Weiqing; Carr, Ryan J; Li, Peining; Hui, Pei

    2014-09-01

    Uterine intravenous leiomyomatosis (IVL) is a distinct smooth muscle neoplasm with a potential of clinical aggressiveness due to its ability to extend into intrauterine and extrauterine vasculature. In this study, chromosomal alterations analyzed by oligonucleotide array comparative genomic hybridization were performed in 9 cases of IVL. The analysis was informative in all cases with multiple copy number losses and/or gains observed in each tumor. The most frequent recurrent loss of 22q12.3-q13.1 was observed in 6 tumors (66.7%), followed by losses of 22q11.23-q13.31, 1p36.13-p33, 2p25.3-p23.3, and 2q24.2-q32.2 and gains of 6p22.2, 2q37.3 and 10q22.2-q22.3, in decreasing order of frequency. Copy number variants were identified at 14q11.2, 15q11.1-q11.2, and 15q26.2. Genes mapping to the regions of loss include CHEK2, EWS, NF2, PDGFB, and MAP3K7IP1 on chromosome 22q, HEI10 on chromosome 14q, and succinate dehydrogenase subunit B, E2F2, ARID1A, KPNA6, EIF3S2, PTCH2, and PIK3R3 on chromosome 1p. Regional losses on chromosomes 22q and 1p and gains on chromosomes 12q showed overlaps with those previously observed in uterine leiomyosarcomas. In addition, presence of multiple chromosomal aberrations implies a higher level of genetic instability. Follow-up polymerase chain reaction (PCR) sequencing analysis of MED12 gene revealed absence of G>A transition at nucleotides c.130 or c.131 in all 9 cases, a frequent mutation found in uterine leiomyoma and its variants. In conclusion, this is the first report of high-resolution, genome-wide investigation of IVL by oligonucleotide array comparative genomic hybridization. The presence of high frequencies of recurrent regional loss involving several chromosomes is an important finding and likely related to the pathogenesis of the disease. PMID:25033729

  6. Bioinformatic tools for using whole genome sequencing as a rapid high resolution diagnostic typing tool when tracing bioterror organisms in the food and feed chain.

    PubMed

    Segerman, Bo; De Medici, Dario; Ehling Schulz, Monika; Fach, Patrick; Fenicia, Lucia; Fricker, Martina; Wielinga, Peter; Van Rotterdam, Bart; Knutsson, Rickard

    2011-03-01

    The rapid technological development in the field of parallel sequencing offers new opportunities when tracing and tracking microorganisms in the food and feed chain. If a bioterror organism is deliberately spread it is of crucial importance to get as much information as possible regarding the strain as fast as possible to aid the decision process and select suitable controls, tracing and tracking tools. A lot of efforts have been made to sequence multiple strains of potential bioterror organisms so there is a relatively large set of reference genomes available. This study is focused on how to use parallel sequencing for rapid phylogenomic analysis and screen for genetic modifications. A bioinformatic methodology has been developed to rapidly analyze sequence data with minimal post-processing. Instead of assembling the genome, defining genes, defining orthologous relations and calculating distances, the present method can achieve a similar high resolution directly from the raw sequence data. The method defines orthologous sequence reads instead of orthologous genes and the average similarity of the core genome (ASC) is calculated. The sequence reads from the core and from the non-conserved genomic regions can also be separated for further analysis. Finally, the comparison algorithm is used to visualize the phylogenomic diversity of the bacterial bioterror organisms Bacillus anthracis and Clostridium botulinum using heat plot diagrams. PMID:20826036

  7. DNA sequence copy number analysis by Comparative Genomic Hybridization (CGH)

    SciTech Connect

    Pinkel, D.; Kallioniemi, A.; Kallioniemi, O.; Waldman, F.; Sudar, D.; Gray, I.; Rutovitz, D.; Piper, I.

    1993-01-01

    Comparative Genomic Hybridization (CGH) uses the kinetics of in situ hybridization to compare the copy numbers of different DNA sequences within the same genome and the copy numbers of the same sequences among different genomes. In a typical application genomic DNA from a tumor and from normal cells are differentially labeled and simultaneously hybridized to normal metaphase chromosomes, and detected with different fluorochromes. Properly registered images of each fluorochrome are obtained using a microscope equipped with multi-band filters and a CCD camera. Digital image analysis permits measurement of intensity ratio profiles along each of the target chromosomes. Studies of cells with known aberrations indicate that the intensity ratio at each position is proportional to the ratio of the copy numbers of the sequences that bind there in the tumor and normal genomes. Analytical challenges posed by the need to efficiently obtain copy number karyotypes are discussed.
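
    Illustrative note (not part of the indexed record): the abstract states that the tumor/normal intensity ratio at each chromosomal position is proportional to the ratio of copy numbers. The sketch below shows one way such a ratio profile might be computed. The function name `log2_ratio_profile`, the median normalization, the log2 scale, and the intensity values are all assumptions made for illustration; the record does not describe the authors' image-analysis procedure.

```python
import math
from statistics import median


def log2_ratio_profile(tumor, normal):
    """Median-centred log2(tumor/normal) intensity ratios along a chromosome.

    Dividing by the median ratio (an assumed normalization) removes the
    unknown proportionality constant, so positions with the most common
    copy-number state sit near 0, gains are positive, losses negative.
    """
    if len(tumor) != len(normal):
        raise ValueError("intensity profiles must have the same length")
    ratios = [t / n for t, n in zip(tumor, normal)]
    centre = median(ratios)
    return [math.log2(r / centre) for r in ratios]


if __name__ == "__main__":
    # Hypothetical fluorescence intensities at five positions.
    tumor = [10.0, 10.0, 20.0, 10.0, 5.0]
    normal = [10.0, 10.0, 10.0, 10.0, 10.0]
    print(log2_ratio_profile(tumor, normal))
```

    In this toy profile the third position would read as a relative gain and the fifth as a relative loss.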

  8. Genome sequencing and analysis of the model grass Brachypodium distachyon.

    PubMed

    2010-02-11

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops. PMID:20148030

  9. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  10. Functional genomic analysis of chromosomal aberrations in a compendium of 8000 cancer genomes

    PubMed Central

    Kim, Tae-Min; Xi, Ruibin; Luquette, Lovelace J.; Park, Richard W.; Johnson, Mark D.; Park, Peter J.

    2013-01-01

    A large database of copy number profiles from cancer genomes can facilitate the identification of recurrent chromosomal alterations that often contain key cancer-related genes. It can also be used to explore low-prevalence genomic events such as chromothripsis. In this study, we report an analysis of 8227 human cancer copy number profiles obtained from 107 array comparative genomic hybridization (CGH) studies. Our analysis reveals similarity of chromosomal arm-level alterations among developmentally related tumor types as well as a number of co-occurring pairs of arm-level alterations. Recurrent (“pan-lineage”) focal alterations identified across diverse tumor types show an enrichment of known cancer-related genes and genes with relevant functions in cancer-associated phenotypes (e.g., kinase and cell cycle). Tumor type-specific (“lineage-restricted”) alterations and their enriched functional categories were also identified. Furthermore, we developed an algorithm for detecting regions in which the copy number oscillates rapidly between fixed levels, indicative of chromothripsis. We observed these massive genomic rearrangements in 1%–2% of the samples with variable tumor type-specific incidence rates. Taken together, our comprehensive view of copy number alterations provides a framework for understanding the functional significance of various genomic alterations in cancer genomes. PMID:23132910

  11. A Comprehensive, High-Resolution Genomic Transcript Map of Human Skeletal Muscle

    PubMed Central

    Bortoluzzi, Stefania; Rampoldi, Luca; Simionati, Barbara; Zimbello, Rosanna; Barbon, Alessandro; d’Alessi, Fabio; Tiso, Natascia; Pallavicini, Alberto; Toppo, Stefano; Cannata, Nicola; Valle, Giorgio; Lanfranchi, Gerolamo; Danieli, Gian Antonio

    1998-01-01

    We present the Human Muscle Gene Map (HMGM), the first comprehensive and updated high-resolution expression map of human skeletal muscle. The 1078 entries of the map were obtained by merging data retrieved from UniGene with the RH mapping information on 46 novel muscle transcripts, which showed no similarity to any known sequence. In the map, distances are expressed in megabase pairs. About one-quarter of the map entries represents putative novel genes. Genes known to be specifically expressed in muscle account for <4% of the total. The genomic distribution of the map entries confirmed the previous finding that muscle genes are selectively concentrated in chromosomes 17, 19, and X. Five chromosomal regions are suspected to have a significant excess of muscle genes. Present data support the hypothesis that the biochemical and functional properties of differentiated muscle cells may result from the transcription of a very limited number of muscle-specific genes along with the activity of a large number of genes, shared with other tissues, but showing different levels of expression in muscle. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. F23198–F23242.] PMID:9724327

  12. A high-resolution map of copy number variation in the cattle genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We conducted a systematic study of the cattle copy number variation (CNV) using array comparative genomic hybridization (array CGH). Oligonucleotide CGH arrays were designed and fabricated to provide a genome-wide coverage with an average interval of 6 kb using the Bta3.1 genome assembly. Dual-lab...

  13. Genome-wide and fine resolution association studies of 14 agronomic traits in rice land races

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Here we report genome sequences of 517 diverse rice land races and the identification of ~3.6 million single nucleotide polymorphisms. A high-density haplotype map of rice genome was constructed using a highly accurate imputation method developed for next-generation sequencing data. Initial genome-w...

  14. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  15. Comparative genome analysis of Solanum lycopersicum and Solanum tuberosum

    PubMed Central

    Lall, Rohit; Thomas, George; Singh, Satendra; Singh, Archana; Wadhwa, Gulshan

    2013-01-01

    Solanum lycopersicum and Solanum tuberosum are agriculturally important crop species as they are rich sources of starch, protein, antioxidants, lycopene, beta-carotene, vitamin C, and fiber. The genomes of S. lycopersicum and S. tuberosum are currently available. However, the linear strings of nucleotides that together comprise a genome sequence are of limited significance by themselves. Computational and bioinformatics approaches can be used to exploit the genomes for fundamental research aimed at improving their varieties. Comparative genome analysis and Pfam analysis of predicted, reviewed paralogous proteins were performed. It was found that S. lycopersicum proteins belong to more families, domains and clans in comparison with S. tuberosum. It was also found that intergenic regions are the most conserved between the two genomes, followed by exons, introns and UTRs. This can be exploited to predict regions between genomes that are similar to each other and to study the evolutionary relationship between two genomes, leading towards the development of disease resistance, stress tolerance and improved varieties of tomato. PMID:24307771

  16. Integrated Genomic Analysis of Pancreatic Ductal Adenocarcinomas Reveals Genomic Rearrangement Events as Significant Drivers of Disease.

    PubMed

    Murphy, Stephen J; Hart, Steven N; Halling, Geoffrey C; Johnson, Sarah H; Smadbeck, James B; Drucker, Travis; Lima, Joema Felipe; Rohakhtar, Fariborz Rakhshan; Harris, Faye R; Kosari, Farhad; Subramanian, Subbaya; Petersen, Gloria M; Wiltshire, Timothy D; Kipp, Benjamin R; Truty, Mark J; McWilliams, Robert R; Couch, Fergus J; Vasmatzis, George

    2016-02-01

    Many somatic mutations have been detected in pancreatic ductal adenocarcinoma (PDAC), leading to the identification of some key drivers of disease progression, but the involvement of large genomic rearrangements has often been overlooked. In this study, we performed mate pair sequencing (MPseq) on genomic DNA from 24 PDAC tumors, including 15 laser-captured microdissected PDAC and 9 patient-derived xenografts, to identify genome-wide rearrangements. Large genomic rearrangements with intragenic breakpoints altering key regulatory genes involved in PDAC progression were detected in all tumors. SMAD4, ZNF521, and FHIT were among the most frequently hit genes. Conversely, commonly reported genes with copy number gains, including MYC and GATA6, were frequently observed in the absence of direct intragenic breakpoints, suggesting a requirement for sustaining oncogenic function during PDAC progression. Integration of data from MPseq, exome sequencing, and transcriptome analysis of primary PDAC cases identified limited overlap in genes affected by both rearrangements and point mutations. However, significant overlap was observed in major PDAC-associated signaling pathways, with all PDAC exhibiting reduced SMAD4 expression, reduced SMAD-dependent TGFβ signaling, and increased WNT and Hedgehog signaling. The frequent loss of SMAD4 and FHIT due to genomic rearrangements strongly implicates these genes as key drivers of PDAC, thus highlighting the strengths of an integrated genomic and transcriptomic approach for identifying mechanisms underlying disease initiation and progression. PMID:26676757

  17. Toward a Comprehensive Genomic Analysis of Cancer - TCGA

    Cancer.gov

    The National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI) convened a "Toward a Comprehensive Genomic Analysis of Cancer" workshop in Washington, D.C. This workshop brought together physicians, basic scientists and other members of the U.S. and international cancer communities to assist in outlining the most effective strategies for the development of a successful project. Information about this workshop is reported in the Executive Summary.

  18. Meta-analysis of genome-wide linkage scans for renal function traits

    PubMed Central

    Rao, Madhumathi; Mottl, Amy K.; Cole, Shelley A.; Umans, Jason G.; Freedman, Barry I.; Bowden, Donald W.; Langefeld, Carl D.; Fox, Caroline S.; Yang, Qiong; Cupples, Adrienne; Iyengar, Sudha K.; Hunt, Steven C.

    2012-01-01

    Background. Several genome scans have explored the linkage of chronic kidney disease phenotypes to chromosomic regions with disparate results. Genome scan meta-analysis (GSMA) is a quantitative method to synthesize linkage results from independent studies and assess their concordance. Methods. We searched PubMed to identify genome linkage analyses of renal function traits in humans, such as estimated glomerular filtration rate (GFR), albuminuria, serum creatinine concentration and creatinine clearance. We contacted authors for numerical data and extracted information from individual studies. We applied the GSMA nonparametric approach to combine results across 14 linkage studies for GFR, 11 linkage studies for albumin creatinine ratio, 11 linkage studies for serum creatinine and 4 linkage studies for creatinine clearance. Results. No chromosomal region reached genome-wide statistical significance in the main analysis which included all scans under each phenotype; however, regions on Chromosomes 7, 10 and 16 reached suggestive significance for linkage to two or more phenotypes. Subgroup analyses by disease status or ethnicity did not yield additional information. Conclusions. While heterogeneity across populations, methodologies and study designs likely explain this lack of agreement, it is possible that linkage scan methodologies lack the resolution for investigating complex traits. Combining family-based linkage studies with genome-wide association studies may be a powerful approach to detect private mutations contributing to complex renal phenotypes. PMID:21622988

  19. Genomic analysis of cichlid fish 'natural mutants'.

    PubMed

    Kuraku, Shigehiro; Meyer, Axel

    2008-12-01

    In the lakes of East Africa, cichlid fishes have formed adaptive radiations that are each composed of hundreds of endemic, morphologically stunningly diverse, but genetically extremely similar species. In the past 20 years, it became clear that their extreme phenotypic diversity arose within very short time spans, and that phenotypically radically different species are exceptionally similar genetically; hence, they could be considered to be 'natural mutants'. Many species can be hybridized and, therefore, provide a unique opportunity to study the genetic underpinnings of phenotypic diversification. Comparative large-scale genomic analyses are beginning to unravel the patterns and processes that led to the formation of the cichlid species flocks. Cichlids are an emerging evolutionary genomic model system for fundamental questions on the origin of phenotypic diversity. PMID:19095433

  20. ChopSticks: High-resolution analysis of homozygous deletions by exploiting concordant read pairs

    PubMed Central

    2012-01-01

    Background Structural variations (SVs) in genomes are commonly observed even in healthy individuals and play key roles in biological functions. To understand their functional impact or to infer molecular mechanisms of SVs, they have to be characterized with the maximum resolution. However, high-resolution analysis is a difficult task because it requires investigation of the complex structures involved in an enormous number of alignments of next-generation sequencing (NGS) reads and genome sequences that contain errors. Results We propose a new method called ChopSticks that improves the resolution of SV detection for homozygous deletions even when the depth of coverage is low. Conventional methods based on read pairs use only discordant pairs to localize the positions of deletions, where a discordant pair is a read pair whose alignment has an aberrant strand or distance. In contrast, our method exploits concordant reads as well. We theoretically proved that when the depth of coverage approaches zero or infinity, the expected resolution of our method is asymptotically equal to that of methods based only on discordant pairs under double coverage. To confirm the effectiveness of ChopSticks, we conducted computational experiments against both simulated NGS reads and real NGS sequences. The resolution of deletion calls by other methods was significantly improved, thus demonstrating the usefulness of ChopSticks. Conclusions ChopSticks can generate high-resolution deletion calls of homozygous deletions using information independent of other methods, and it is therefore useful to examine the functional impact of SVs or to infer SV generation mechanisms. PMID:23110596
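
    The abstract's definition of a discordant pair (a read pair whose alignment has an aberrant strand or distance) can be sketched as below. This is not the ChopSticks algorithm itself; the ReadPair fields, the forward-reverse library orientation, and the three-standard-deviation cutoff are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record of one aligned read pair."""
    left_pos: int        # leftmost aligned base of the upstream mate
    right_pos: int       # leftmost aligned base of the downstream mate
    left_forward: bool   # True if the upstream mate aligns to the + strand
    right_forward: bool  # True if the downstream mate aligns to the + strand
    read_len: int        # read length in bases


def is_discordant(pair, mean_insert, sd_insert, n_sd=3.0):
    """Return True if strand orientation or mapped distance is aberrant.

    Assumes a forward-reverse paired-end library and treats a span more
    than n_sd standard deviations from the mean insert size as aberrant.
    """
    # Aberrant strand: anything other than forward followed by reverse.
    if not (pair.left_forward and not pair.right_forward):
        return True
    # Aberrant distance: outer span of the pair versus expected insert size.
    span = pair.right_pos + pair.read_len - pair.left_pos
    return abs(span - mean_insert) > n_sd * sd_insert


normal_pair = ReadPair(100, 400, True, False, 100)
stretched_pair = ReadPair(100, 1400, True, False, 100)
same_strand_pair = ReadPair(100, 400, True, True, 100)
```

    With a mean insert size of 400 and a standard deviation of 30, the first pair would be treated as concordant and the other two as discordant; the method described above additionally draws information from the concordant pairs, which this sketch does not attempt.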

  1. Private genome analysis through homomorphic encryption

    PubMed Central

    2015-01-01

    Background The rapid development of genome sequencing technology allows researchers to access large genome datasets. However, outsourcing the data processing to the cloud poses high risks for personal privacy. The aim of this paper is to give a practical solution for this problem using homomorphic encryption. In our approach, all the computations can be performed in an untrusted cloud without requiring the decryption key or any interaction with the data owner, which preserves the privacy of genome data. Methods We present evaluation algorithms for secure computation of the minor allele frequencies and χ2 statistic in a genome-wide association studies setting. We also describe how to privately compute the Hamming distance and approximate Edit distance between encrypted DNA sequences. Finally, we compare performance details of using two practical homomorphic encryption schemes - the BGV scheme by Gentry, Halevi and Smart and the YASHE scheme by Bos, Lauter, Loftus and Naehrig. Results The approach with the YASHE scheme analyzes data from 400 people within about 2 seconds and picks a variant associated with disease from 311 spots. For another task, using the BGV scheme, it took about 65 seconds to securely compute the approximate Edit distance for DNA sequences of size 5K and figure out the differences between them. Conclusions The performance numbers for BGV are better than YASHE when homomorphically evaluating deep circuits (like the Hamming distance algorithm or approximate Edit distance algorithm). On the other hand, it is more efficient to use the YASHE scheme for a low-degree computation, such as minor allele frequencies or χ2 test statistic in a case-control study. PMID:26733152
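
    For orientation, the three quantities named in the abstract (minor allele frequency, a χ2 statistic for a case-control study, and Hamming distance) can be sketched in plaintext. This sketch performs no encryption and is not the paper's BGV or YASHE evaluation; the function names, the 0/1/2 genotype coding, and the uncorrected 2x2 allelic test are assumptions made for illustration.

```python
def minor_allele_frequency(genotypes):
    """genotypes: per-individual counts (0, 1 or 2) of the alternate allele."""
    alt = sum(genotypes)
    total = 2 * len(genotypes)  # two alleles per diploid individual
    freq = alt / total
    return min(freq, 1 - freq)


def allelic_chi2(case_genotypes, control_genotypes):
    """Pearson chi-square on the 2x2 table of allele counts.

    Assumption: allelic test without continuity correction.
    """
    a = sum(case_genotypes)                # alternate alleles in cases
    b = 2 * len(case_genotypes) - a        # reference alleles in cases
    c = sum(control_genotypes)             # alternate alleles in controls
    d = 2 * len(control_genotypes) - c     # reference alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no variation to test
    return n * (a * d - b * c) ** 2 / denom


def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(s, t))
```

    A homomorphic version would evaluate the same sums and products on ciphertexts; the counting and squaring steps here are low-degree arithmetic, which matches the abstract's remark that such statistics suit a low-degree scheme.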

  2. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    SciTech Connect

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; Hauser, Loren John; Wanchai, Visanu; Land, Miriam L.; Timm, Collin M.; Lu, Tse-Yuan S.; Schadt, Christopher Warren; Doktycz, Mitchel John; Pelletier, Dale A; Ussery, David W

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this

  3. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGESBeta

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; Hauser, Loren John; Wanchai, Visanu; Land, Miriam L.; Timm, Collin M.; Lu, Tse-Yuan S.; Schadt, Christopher Warren; Doktycz, Mitchel John; et al

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but

  4. The Arabidopsis TAC Position Viewer: a high-resolution map of transformation-competent artificial chromosome (TAC) clones aligned with the Arabidopsis thaliana Columbia-0 genome.

    PubMed

    Hirose, Yoshitsugu; Suda, Kunihiro; Liu, Yao-Guang; Sato, Shusei; Nakamura, Yukino; Yokoyama, Koji; Yamamoto, Naoki; Hanano, Shigeru; Takita, Eiji; Sakurai, Nozomu; Suzuki, Hideyuki; Nakamura, Yasukazu; Kaneko, Takakazu; Yano, Kentaro; Tabata, Satoshi; Shibata, Daisuke

    2015-09-01

    We present a high-resolution map of genomic transformation-competent artificial chromosome (TAC) clones extending over all Arabidopsis thaliana (Arabidopsis) chromosomes. The Arabidopsis genomic TAC clones have been valuable genetic tools. Previously, we constructed an Arabidopsis genomic TAC library consisting of more than 10,000 TAC clones harboring large genomic DNA fragments extending over the whole Arabidopsis genome. Here, we determined 13,577 end sequences from 6987 Arabidopsis TAC clones and mapped 5937 TAC clones to precise locations, covering approximately 90% of the Arabidopsis chromosomes. We present the large-scale data set of TAC clones with high-resolution mapping information as a Java application tool, the Arabidopsis TAC Position Viewer, which provides ready-to-go transformable genomic DNA clones corresponding to certain loci on Arabidopsis chromosomes. The TAC clone resources will accelerate genomic DNA cloning, positional walking, complementation of mutants and DNA transformation for heterologous gene expression. PMID:26227242

  5. Mechanisms of assembly and genome packaging in an RNA virus revealed by high-resolution cryo-EM

    PubMed Central

    Hesketh, Emma L.; Meshcheriakova, Yulia; Dent, Kyle C.; Saxena, Pooja; Thompson, Rebecca F.; Cockburn, Joseph J.; Lomonossoff, George P.; Ranson, Neil A.

    2015-01-01

    Cowpea mosaic virus is a plant-infecting member of the Picornavirales and is of major interest in the development of biotechnology applications. Despite the availability of >100 crystal structures of Picornavirales capsids, relatively little is known about the mechanisms of capsid assembly and genome encapsidation. Here we have determined cryo-electron microscopy reconstructions for the wild-type virus and an empty virus-like particle, to 3.4 Å and 3.0 Å resolution, respectively, and built de novo atomic models of their capsids. These new structures reveal the C-terminal region of the small coat protein subunit, which is essential for virus assembly and which was missing from previously determined crystal structures, as well as residues that bind to the viral genome. These observations allow us to develop a new model for genome encapsidation and capsid assembly. PMID:26657148

  6. Mechanisms of assembly and genome packaging in an RNA virus revealed by high-resolution cryo-EM.

    PubMed

    Hesketh, Emma L; Meshcheriakova, Yulia; Dent, Kyle C; Saxena, Pooja; Thompson, Rebecca F; Cockburn, Joseph J; Lomonossoff, George P; Ranson, Neil A

    2015-01-01

    Cowpea mosaic virus is a plant-infecting member of the Picornavirales and is of major interest in the development of biotechnology applications. Despite the availability of >100 crystal structures of Picornavirales capsids, relatively little is known about the mechanisms of capsid assembly and genome encapsidation. Here we have determined cryo-electron microscopy reconstructions for the wild-type virus and an empty virus-like particle, to 3.4 Å and 3.0 Å resolution, respectively, and built de novo atomic models of their capsids. These new structures reveal the C-terminal region of the small coat protein subunit, which is essential for virus assembly and which was missing from previously determined crystal structures, as well as residues that bind to the viral genome. These observations allow us to develop a new model for genome encapsidation and capsid assembly. PMID:26657148

  7. The Complete Genome Sequence and Comparative Genome Analysis of the High Pathogenicity Yersinia enterocolitica Strain 8081

    PubMed Central

    Thomson, Nicholas R; Howard, Sarah; Wren, Brendan W; Holden, Matthew T. G; Crossman, Lisa; Challis, Gregory L; Churcher, Carol; Mungall, Karen; Brooks, Karen; Chillingworth, Tracey; Feltwell, Theresa; Abdellah, Zahra; Hauser, Heidi; Jagels, Kay; Maddison, Mark; Moule, Sharon; Sanders, Mandy; Whitehead, Sally; Quail, Michael A; Dougan, Gordon; Parkhill, Julian; Prentice, Michael B

    2006-01-01

    The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common themes in the

  8. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    PubMed

    Thomson, Nicholas R; Howard, Sarah; Wren, Brendan W; Holden, Matthew T G; Crossman, Lisa; Challis, Gregory L; Churcher, Carol; Mungall, Karen; Brooks, Karen; Chillingworth, Tracey; Feltwell, Theresa; Abdellah, Zahra; Hauser, Heidi; Jagels, Kay; Maddison, Mark; Moule, Sharon; Sanders, Mandy; Whitehead, Sally; Quail, Michael A; Dougan, Gordon; Parkhill, Julian; Prentice, Michael B

    2006-12-15

    The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common themes in the

  9. Genome Sequence and Comparative Genome Analysis of Lactobacillus casei: Insights into Their Niche-Associated Evolution

    PubMed Central

    Cai, Hui; Thompson, Rebecca; Budinich, Mateo F.; Broadbent, Jeff R.

    2009-01-01

    Lactobacillus casei is remarkably adaptable to diverse habitats and widely used in the food industry. To reveal the genomic features that contribute to its broad ecological adaptability and examine the evolution of the species, the genome sequence of L. casei ATCC 334 is analyzed and compared with other sequenced lactobacilli. This analysis reveals that ATCC 334 contains a high number of coding sequences involved in carbohydrate utilization and transcriptional regulation, reflecting its requirement for dealing with diverse environmental conditions. A comparison of the genome sequences of ATCC 334 to L. casei BL23 reveals 12 and 19 genomic islands, respectively. For a broader assessment of the genetic variability within L. casei, gene content of 21 L. casei strains isolated from various habitats (cheeses, n = 7; plant materials, n = 8; and human sources, n = 6) was examined by comparative genome hybridization with an ATCC 334-based microarray. This analysis resulted in identification of 25 hypervariable regions. One of these regions contains an overrepresentation of genes involved in carbohydrate utilization and transcriptional regulation and was thus proposed as a lifestyle adaptation island. Differences in L. casei genome inventory reveal both gene gain and gene decay. Gene gain, via acquisition of genomic islands, likely confers a fitness benefit in specific habitats. Gene decay, that is, loss of unnecessary ancestral traits, is observed in the cheese isolates and likely results in enhanced fitness in the dairy niche. This study gives the first picture of the stable versus variable regions in L. casei and provides valuable insights into evolution, lifestyle adaptation, and metabolic diversity of L. casei. PMID:20333194

  10. Genome sequence and comparative genome analysis of Lactobacillus casei: insights into their niche-associated evolution.

    PubMed

    Cai, Hui; Thompson, Rebecca; Budinich, Mateo F; Broadbent, Jeff R; Steele, James L

    2009-01-01

    Lactobacillus casei is remarkably adaptable to diverse habitats and widely used in the food industry. To reveal the genomic features that contribute to its broad ecological adaptability and examine the evolution of the species, the genome sequence of L. casei ATCC 334 is analyzed and compared with other sequenced lactobacilli. This analysis reveals that ATCC 334 contains a high number of coding sequences involved in carbohydrate utilization and transcriptional regulation, reflecting its requirement for dealing with diverse environmental conditions. A comparison of the genome sequences of ATCC 334 to L. casei BL23 reveals 12 and 19 genomic islands, respectively. For a broader assessment of the genetic variability within L. casei, gene content of 21 L. casei strains isolated from various habitats (cheeses, n = 7; plant materials, n = 8; and human sources, n = 6) was examined by comparative genome hybridization with an ATCC 334-based microarray. This analysis resulted in identification of 25 hypervariable regions. One of these regions contains an overrepresentation of genes involved in carbohydrate utilization and transcriptional regulation and was thus proposed as a lifestyle adaptation island. Differences in L. casei genome inventory reveal both gene gain and gene decay. Gene gain, via acquisition of genomic islands, likely confers a fitness benefit in specific habitats. Gene decay, that is, loss of unnecessary ancestral traits, is observed in the cheese isolates and likely results in enhanced fitness in the dairy niche. This study gives the first picture of the stable versus variable regions in L. casei and provides valuable insights into evolution, lifestyle adaptation, and metabolic diversity of L. casei. PMID:20333194

  11. Nucleotide resolution analysis of TMPRSS2 and ERG rearrangements in prostate cancer

    PubMed Central

    Weier, Christopher; Haffner, Michael C.; Mosbruger, Timothy; Esopi, David M.; Hicks, Jessica; Zheng, Qizhi; Fedor, Helen; Isaacs, William B.; De Marzo, Angelo M.; Nelson, William G.; Yegnasubramanian, Srinivasan

    2013-01-01

    TMPRSS2-ERG rearrangements occur in approximately 50% of prostate cancers and therefore represent one of the most frequently observed structural rearrangements in all cancers. However, little is known about the genomic architecture of such rearrangements. We therefore designed and optimized a pipeline involving target-capture of TMPRSS2 and ERG genomic sequences coupled with paired-end next generation sequencing to resolve genomic rearrangement breakpoints in TMPRSS2 and ERG at nucleotide resolution in a large series of primary prostate cancer specimens (n = 83). This strategy showed >90% sensitivity and specificity in identifying TMPRSS2-ERG rearrangements, and allowed identification of intra- and inter-chromosomal rearrangements involving TMPRSS2 and ERG with known and novel fusion partners. Our results indicate that rearrangement breakpoints show strong clustering in specific intronic regions of TMPRSS2 and ERG. The observed TMPRSS2-ERG rearrangements often exhibited complex chromosomal architecture associated with several intra- and inter-chromosomal rearrangements. Nucleotide resolution analysis of breakpoint junctions revealed that the majority of TMPRSS2 and ERG rearrangements (~88%) occurred at or near regions of microhomology or involved insertions of one or more base pairs. This architecture implicates nonhomologous end joining (NHEJ) and microhomology mediated end joining (MMEJ) pathways in the generation of such rearrangements. These analyses have provided important insights into the molecular mechanisms involved in generating prostate cancer-specific recurrent rearrangements. PMID:23447416

  12. Enhancing genomic laboratory reports: A qualitative analysis of provider review.

    PubMed

    Williams, Janet L; Rahm, Alanna Kulchak; Stuckey, Heather; Green, Jamie; Feldman, Lynn; Zallen, Doris T; Bonhag, Michele; Segal, Michael M; Fan, Audrey L; Williams, Marc S

    2016-05-01

    This study reports on the responses of physicians who reviewed provider and patient versions of a genomic laboratory report designed to communicate results of whole genome sequencing. Semi-structured interviews addressed concept communication, elements, and format of example genome reports. Analysis of the coded transcripts resulted in recognition of three constructs around communication of genome sequencing results: (1) Providers agreed that whole genome sequencing results are complex and they welcomed a report that provided supportive interpretation information to accompany sequencing results; (2) Providers strongly endorsed a report that included active clinical guidance, such as reference to practice guidelines, if available; and (3) Providers valued the genomic report as a resource that would serve as the basis to facilitate communication of genome sequencing results with their patients and families. Providers valued both versions of the report, though they affirmed the need for a provider-oriented report. Critical elements of the report included clear language to explain the result, as well as consolidated yet comprehensive prognostic information with clear guidance over time for the clinical care of the patient. Most importantly, it appears a report with this design has the potential not only to return results but also to serve as a communication tool to help providers and patients discuss and coordinate care over time. © 2016 The Authors. American Journal of Medical Genetics Part A published by Wiley Periodicals, Inc. PMID:26842872

  13. Power, resolution and bias: recent advances in insect phylogeny driven by the genomic revolution.

    PubMed

    Yeates, David K; Meusemann, Karen; Trautwein, Michelle; Wiegmann, Brian; Zwick, Andreas

    2016-02-01

    Our understanding of the phylogenetic relationships of insects has been revolutionised in the last decade by the proliferation of next generation sequencing technologies (NGS). NGS has allowed insect systematists to assemble very large molecular datasets that include both model and non-model organisms. Such datasets often include a large proportion of the total number of protein coding sequences available for phylogenetic comparison. We review some early entomological phylogenomic studies that employ a range of different data sampling protocols and analysis strategies, illustrating a fundamental renaissance in our understanding of insect evolution all driven by the genomic revolution. The analysis of phylogenomic datasets is challenging because of their size and complexity, and it is obvious that the increasing size alone does not ensure that phylogenetic signal overcomes systematic biases in the data. Biases can be due to various factors such as the method of data generation and assembly, or intrinsic biological features of the data per se, such as similarities due to saturation or compositional heterogeneity. Such biases often cause violations in the underlying assumptions of phylogenetic models. We review some of the bioinformatics tools available and being developed to detect and minimise systematic biases in phylogenomic datasets. Phylogenomic-scale data coupled with sophisticated analyses will revolutionise our understanding of insect functional genomics. This will illuminate the relationship between the vast range of insect phenotypic diversity and underlying genetic diversity. In combination with rapidly developing methods to estimate divergence times, these analyses will also provide a compelling view of the rates and patterns of lineagenesis (birth of lineages) over the half billion years of insect evolution. PMID:27436549

  14. A high-resolution radiation hybrid map of the human genome draft sequence.

    PubMed

    Olivier, M; Aggarwal, A; Allen, J; Almendras, A A; Bajorek, E S; Beasley, E M; Brady, S D; Bushard, J M; Bustos, V I; Chu, A; Chung, T R; De Witte, A; Denys, M E; Dominguez, R; Fang, N Y; Foster, B D; Freudenberg, R W; Hadley, D; Hamilton, L R; Jeffrey, T J; Kelly, L; Lazzeroni, L; Levy, M R; Lewis, S C; Liu, X; Lopez, F J; Louie, B; Marquis, J P; Martinez, R A; Matsuura, M K; Misherghi, N S; Norton, J A; Olshen, A; Perkins, S M; Perou, A J; Piercy, C; Piercy, M; Qin, F; Reif, T; Sheppard, K; Shokoohi, V; Smick, G A; Sun, W L; Stewart, E A; Fernando, J; Tejeda; Tran, N M; Trejo, T; Vo, N T; Yan, S C; Zierten, D L; Zhao, S; Sachidanandam, R; Trask, B J; Myers, R M; Cox, D R

    2001-02-16

    We have constructed a physical map of the human genome by using a panel of 90 whole-genome radiation hybrids (the TNG panel) in conjunction with 40,322 sequence-tagged sites (STSs) derived from random genomic sequences as well as expressed sequences. Of 36,678 STSs on the TNG radiation hybrid map, only 3604 (9.8%) were absent from the unassembled draft sequence of the human genome. Of 20,030 STSs ordered on the TNG map as well as the assembled human genome draft sequence and the Celera assembled human genome sequence, 36% of the STSs had a discrepant order between the working draft sequence and the Celera sequence. The TNG map order was identical to one of the two sequence orders in 60% of these discrepant cases. PMID:11181994

  15. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. PMID:25398900

  16. Somatic alterations in the melanoma genome: a high-resolution array-based comparative genomic hybridization study.

    PubMed

    Gast, Andreas; Scherer, Dominique; Chen, Bowang; Bloethner, Sandra; Melchert, Stephanie; Sucker, Antje; Hemminki, Kari; Schadendorf, Dirk; Kumar, Rajiv

    2010-08-01

    We performed DNA microarray-based comparative genomic hybridization to identify somatic alterations specific to the melanoma genome in 60 human cell lines from metastasized melanoma and from 44 corresponding peripheral blood mononuclear cells. Our data showed gross but nonrandom somatic changes specific to the tumor genome. Although the CDKN2A (78%) and PTEN (70%) loci were the major targets of mono-allelic and bi-allelic deletions, amplifications affected loci with BRAF (53%) and NRAS (12%) as well as EGFR (52%), MITF (40%), NOTCH2 (35%), CCND1 (18%), MDM2 (18%), CCNE1 (10%), and CDK4 (8%). The amplified loci carried additional genes, many of which could potentially play a role in melanoma. Distinct patterns of copy number changes showed that alterations in CDKN2A tended to be more clustered in cell lines with mutations in the BRAF and NRAS genes; the PTEN locus was targeted mainly in conjunction with BRAF mutations. Amplification of CCND1, CDK4, and other loci was significantly increased in cell lines without BRAF-NRAS mutations and so was the loss of chromosome arms 13q and 16q. Our data suggest involvement of distinct genetic pathways that are driven either through oncogenic BRAF and NRAS mutations complemented by aberrations in the CDKN2A and PTEN genes or involve amplification of oncogenic genomic loci and loss of 13q and 16q. It also emerges that each tumor, besides being affected by the major and most common somatic genetic alterations, also acquires additional genetic alterations that could be crucial in determining response to the small-molecule inhibitors that are currently being pursued. PMID:20544847

  17. Differential DNA Methylation Analysis without a Reference Genome

    PubMed Central

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C.; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-01-01

    Summary Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome. PMID:26673328

  18. High-resolution Brillouin analysis of composite materials beams

    NASA Astrophysics Data System (ADS)

    London, Yosef; Antman, Yair; Silbiger, Maayan; Efraim, Liel; Froochzad, Avihay; Adler, Gadi; Levenberg, Eyal; Zadok, Avi

    2015-09-01

    High-resolution Brillouin optical correlation domain analysis of fibers embedded within beams of composite materials is performed with 4 cm resolution and 0.5 MHz sensitivity. Two new contributions are presented. First, analysis was carried out continuously over 30 hours following the production of a beam, observing heating during exothermal curing and buildup of residual strains. Second, the bending stiffness and Young's modulus of the composite beam were extracted based on distributed strain measurements, taken during a static three-point bending experiment. The calculated parameters were used to forecast the beam deflections. The latter were favorably compared against external displacement measurements.

  19. A Novel Approach to Helicobacter pylori Pan-Genome Analysis for Identification of Genomic Islands.

    PubMed

    Uchiyama, Ikuo; Albritton, Jacob; Fukuyo, Masaki; Kojima, Kenji K; Yahara, Koji; Kobayashi, Ichizo

    2016-01-01

    Genomes of a given bacterial species can show great variation in gene content and thus systematic analysis of the entire gene repertoire, termed the pan-genome, is important for understanding bacterial intra-species diversity, population genetics, and evolution. Here, we analyzed the pan-genome from 30 completely sequenced strains of the human gastric pathogen Helicobacter pylori belonging to various phylogeographic groups, focusing on 991 accessory (not fully conserved) orthologous groups (OGs). We developed a method to evaluate the mobility of genes within a genome, using the gene order in the syntenically conserved regions as a reference, and classified the 991 accessory OGs into five classes: Core, Stable, Intermediate, Mobile, and Unique. Phylogenetic networks based on the gene content of Core and Stable classes are highly congruent with that created from the concatenated alignment of fully conserved core genes, in contrast to those of Intermediate and Mobile classes, which show quite different topologies. By clustering the accessory OGs on the basis of phylogenetic pattern similarity and chromosomal proximity, we identified 60 co-occurring gene clusters (CGCs). In addition to known genomic islands, including cag pathogenicity island, bacteriophages, and integrating conjugative elements, we identified some novel ones. One island encodes TerY-phosphorylation triad, which includes the eukaryote-type protein kinase/phosphatase gene pair, and components of type VII secretion system. Another one contains a reverse-transcriptase homolog, which may be involved in the defense against phage infection through altruistic suicide. Many of the CGCs contained restriction-modification (RM) genes. Different RM systems sometimes occupied the same (orthologous) locus in the strains. We anticipate that our method will facilitate pan-genome studies in general and help identify novel genomic islands in various bacterial species. PMID:27504980

  20. A Novel Approach to Helicobacter pylori Pan-Genome Analysis for Identification of Genomic Islands

    PubMed Central

    Uchiyama, Ikuo; Albritton, Jacob; Fukuyo, Masaki; Kojima, Kenji K.; Yahara, Koji; Kobayashi, Ichizo

    2016-01-01

    Genomes of a given bacterial species can show great variation in gene content and thus systematic analysis of the entire gene repertoire, termed the pan-genome, is important for understanding bacterial intra-species diversity, population genetics, and evolution. Here, we analyzed the pan-genome from 30 completely sequenced strains of the human gastric pathogen Helicobacter pylori belonging to various phylogeographic groups, focusing on 991 accessory (not fully conserved) orthologous groups (OGs). We developed a method to evaluate the mobility of genes within a genome, using the gene order in the syntenically conserved regions as a reference, and classified the 991 accessory OGs into five classes: Core, Stable, Intermediate, Mobile, and Unique. Phylogenetic networks based on the gene content of Core and Stable classes are highly congruent with that created from the concatenated alignment of fully conserved core genes, in contrast to those of Intermediate and Mobile classes, which show quite different topologies. By clustering the accessory OGs on the basis of phylogenetic pattern similarity and chromosomal proximity, we identified 60 co-occurring gene clusters (CGCs). In addition to known genomic islands, including cag pathogenicity island, bacteriophages, and integrating conjugative elements, we identified some novel ones. One island encodes TerY-phosphorylation triad, which includes the eukaryote-type protein kinase/phosphatase gene pair, and components of type VII secretion system. Another one contains a reverse-transcriptase homolog, which may be involved in the defense against phage infection through altruistic suicide. Many of the CGCs contained restriction-modification (RM) genes. Different RM systems sometimes occupied the same (orthologous) locus in the strains. We anticipate that our method will facilitate pan-genome studies in general and help identify novel genomic islands in various bacterial species. PMID:27504980

  1. Genomic Analysis of Companion Rabbit Staphylococcus aureus

    PubMed Central

    Holmes, Mark A.; Harrison, Ewan M.; Fisher, Elizabeth A.; Graham, Elizabeth M.; Parkhill, Julian; Foster, Geoffrey; Paterson, Gavin K.

    2016-01-01

    In addition to being an important human pathogen, Staphylococcus aureus is able to cause a variety of infections in numerous other host species. While the S. aureus strains causing infection in several of these hosts have been well characterised, this is not the case for companion rabbits (Oryctolagus cuniculus), where little data are available on S. aureus strains from this host. To address this deficiency we have performed antimicrobial susceptibility testing and genome sequencing on a collection of S. aureus isolates from companion rabbits. The findings show a diverse S. aureus population is able to cause infection in this host, and while antimicrobial resistance was uncommon, the isolates possess a range of known and putative virulence factors consistent with a diverse clinical presentation in companion rabbits including severe abscesses. We additionally show that companion rabbit isolates carry polymorphisms within dltB as described as underlying host-adaption of S. aureus to farmed rabbits. The availability of S. aureus genome sequences from companion rabbits provides an important aid to understanding the pathogenesis of disease in this host and in the clinical management and surveillance of these infections. PMID:26963381

  2. Quantitative analysis of comparative genomic hybridization

    SciTech Connect

    Manoir, S. du; Bentz, M.; Joos, S.

    1995-01-01

    Comparative genomic hybridization (CGH) is a new molecular cytogenetic method for the detection of chromosomal imbalances. Following cohybridization of DNA prepared from a sample to be studied and control DNA to normal metaphase spreads, probes are detected via different fluorochromes. The ratio of the test and control fluorescence intensities along a chromosome reflects the relative copy number of segments of a chromosome in the test genome. Quantitative evaluation of CGH experiments is required for the determination of low copy changes, e.g., monosomy or trisomy, and for the definition of the breakpoints involved in unbalanced rearrangements. In this study, a program for quantitation of CGH preparations is presented. This program is based on the extraction of the fluorescence ratio profile along each chromosome, followed by averaging of individual profiles from several metaphase spreads. Objective parameters critical for quantitative evaluations were tested, and the criteria for selection of suitable CGH preparations are described. The granularity of the chromosome painting and the regional inhomogeneity of fluorescence intensities in metaphase spreads proved to be crucial parameters. The coefficient of variation of the ratio value for chromosomes in balanced state (CVBS) provides a general quality criterion for CGH experiments. Different cutoff levels (thresholds) of average fluorescence ratio values were compared for their specificity and sensitivity with regard to the detection of chromosomal imbalances. 27 refs., 15 figs., 1 tab.
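
    The quantitation steps described in this abstract (averaging ratio profiles from several metaphase spreads, computing a coefficient of variation, and applying cutoff levels) can be illustrated with a minimal Python sketch. It is not the authors' program: the function names, the example ratio values, and the 0.75/1.25 thresholds are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def average_ratio_profile(profiles):
    """Average test/control fluorescence ratios position by position
    across several metaphase spreads (one list per spread)."""
    return [mean(position) for position in zip(*profiles)]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def classify(profile, low=0.75, high=1.25):
    """Label each position using hypothetical cutoff levels:
    below `low` is a loss, above `high` is a gain."""
    return ["loss" if r < low else "gain" if r > high else "balanced"
            for r in profile]


# Hypothetical ratio profiles for one chromosome from three spreads.
profiles = [
    [1.0, 1.1, 1.5, 0.6],
    [1.0, 0.9, 1.4, 0.7],
    [1.0, 1.0, 1.6, 0.5],
]
averaged = average_ratio_profile(profiles)
calls = classify(averaged)
```

    In this sketch, `coefficient_of_variation` applied to the averaged ratios of positions called balanced would play the role of a simple quality measure, loosely analogous to the CVBS criterion named in the abstract.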

  3. Macrorestriction Analysis of Caenorhabditis Elegans Genomic DNA

    PubMed Central

    Browning, H.; Berkowitz, L.; Madej, C.; Paulsen, J. E.; Zolan, M. E.; Strome, S.

    1996-01-01

    The usefulness of genomic physical maps is greatly enhanced by linkage of the physical map with the genetic map. We describe a "macrorestriction mapping" procedure for Caenorhabditis elegans that we have applied to this endeavor. High molecular weight, genomic DNA is digested with infrequently cutting restriction enzymes and size-fractionated by pulsed field gel electrophoresis. Southern blots of the gels are probed with clones from the C. elegans physical map. This procedure allows the construction of restriction maps covering several hundred kilobases and the detection of polymorphic restriction fragments using probes that map several hundred kilobases away. We describe several applications of this technique. (1) We determined that the amount of DNA in a previously uncloned region is <220 kb. (2) We mapped the mes-1 gene to a cosmid, by detecting polymorphic restriction fragments associated with a deletion allele of the gene. The 25-kb deletion was initially detected using as a probe sequences located ~400 kb away from the gene. (3) We mapped the molecular endpoint of the deficiency hDf6, and determined that three spontaneously derived duplications in the unc-38-dpy-5 region have very complex molecular structures, containing internal rearrangements and deletions. PMID:8889524

  4. Genomic Analysis of Companion Rabbit Staphylococcus aureus.

    PubMed

    Holmes, Mark A; Harrison, Ewan M; Fisher, Elizabeth A; Graham, Elizabeth M; Parkhill, Julian; Foster, Geoffrey; Paterson, Gavin K

    2016-01-01

    In addition to being an important human pathogen, Staphylococcus aureus is able to cause a variety of infections in numerous other host species. While the S. aureus strains causing infection in several of these hosts have been well characterised, this is not the case for companion rabbits (Oryctolagus cuniculus), where little data are available on S. aureus strains from this host. To address this deficiency we have performed antimicrobial susceptibility testing and genome sequencing on a collection of S. aureus isolates from companion rabbits. The findings show a diverse S. aureus population is able to cause infection in this host, and while antimicrobial resistance was uncommon, the isolates possess a range of known and putative virulence factors consistent with a diverse clinical presentation in companion rabbits including severe abscesses. We additionally show that companion rabbit isolates carry polymorphisms within dltB as described as underlying host-adaption of S. aureus to farmed rabbits. The availability of S. aureus genome sequences from companion rabbits provides an important aid to understanding the pathogenesis of disease in this host and in the clinical management and surveillance of these infections. PMID:26963381

  5. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace.

    PubMed

    Qu, Kun; Garamszegi, Sara; Wu, Felix; Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P; Lee, Brian T; Kuhn, Robert M; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y; Mesirov, Jill P

    2016-03-01

    Complex biomedical analyses require the use of multiple software tools in concert and remain challenging for much of the biomedical research community. We introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource that currently supports the streamlined interaction of 20 bioinformatics tools and data resources. To facilitate integrative analysis by non-programmers, it offers a growing set of 'recipes', short workflows to guide investigators through high-utility analysis tasks. PMID:26780094

  6. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization

    PubMed Central

    Li, Wenyuan; Kalhor, Reza; Dai, Chao; Hao, Shengli; Gong, Ke; Zhou, Yonggang; Li, Haochen; Zhou, Xianghong Jasmine; Le Gros, Mark A.; Larabell, Carolyn A.; Chen, Lin; Alber, Frank

    2016-01-01

    Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization. PMID:26951677

  7. Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.

    PubMed

    Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

    2006-11-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance study. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features. PMID:17091199
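
    Two of the sequence statistics reported in this abstract, GC content and the occurrence of dinucleotide microsatellites such as (AC)n, can be computed with a short Python sketch. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy sequence, and the minimum of four repeat units are hypothetical.

```python
import re


def gc_content(seq):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def find_dinucleotide_repeats(seq, min_units=4):
    """Find tandem dinucleotide repeats with at least `min_units` units.

    Returns (start, unit, unit_count) tuples. Units made of one repeated
    base (e.g. "AA") are skipped because they are mononucleotide runs.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if unit[0] != unit[1]:
            hits.append((match.start(), unit, len(match.group(0)) // 2))
    return hits


# Toy sequence (hypothetical): two Gs, five AC units, then TTGC.
toy = "GG" + "AC" * 5 + "TTGC"
toy_gc = gc_content(toy)
toy_repeats = find_dinucleotide_repeats(toy)
```

    Counting how many detected repeats have the unit "AC" relative to all detected repeats would give a rough analogue of the repeat-type bias the abstract reports.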

  8. A high resolution genetic map anchoring scaffolds of the sequenced watermelon genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high-density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of ...

  9. Annotation of the zebrafish genome through an integrated transcriptomic and proteomic analysis.

    PubMed

    Kelkar, Dhanashree S; Provost, Elayne; Chaerkady, Raghothama; Muthusamy, Babylakshmi; Manda, Srikanth S; Subbannayya, Tejaswini; Selvan, Lakshmi Dhevi N; Wang, Chieh-Huei; Datta, Keshava K; Woo, Sunghee; Dwivedi, Sutopa B; Renuse, Santosh; Getnet, Derese; Huang, Tai-Chung; Kim, Min-Sik; Pinto, Sneha M; Mitchell, Christopher J; Madugundu, Anil K; Kumar, Praveen; Sharma, Jyoti; Advani, Jayshree; Dey, Gourav; Balakrishnan, Lavanya; Syed, Nazia; Nanjappa, Vishalakshi; Subbannayya, Yashwanth; Goel, Renu; Prasad, T S Keshava; Bafna, Vineet; Sirdeshmukh, Ravi; Gowda, Harsha; Wang, Charles; Leach, Steven D; Pandey, Akhilesh

    2014-11-01

    Accurate annotation of protein-coding genes is one of the primary tasks upon the completion of whole genome sequencing of any organism. In this study, we used an integrated transcriptomic and proteomic strategy to validate and improve the existing zebrafish genome annotation. We undertook high-resolution mass-spectrometry-based proteomic profiling of 10 adult organs, whole adult fish body, and two developmental stages of zebrafish (SAT line), in addition to transcriptomic profiling of six organs. More than 7,000 proteins were identified from proteomic analyses, and ~69,000 high-confidence transcripts were assembled from the RNA sequencing data. Approximately 15% of the transcripts mapped to intergenic regions, the majority of which are likely long non-coding RNAs. These high-quality transcriptomic and proteomic data were used to manually reannotate the zebrafish genome. We report the identification of 157 novel protein-coding genes. In addition, our data led to modification of existing gene structures including novel exons, changes in exon coordinates, changes in frame of translation, translation in annotated UTRs, and joining of genes. Finally, we discovered four instances of genome assembly errors that were supported by both proteomic and transcriptomic data. Our study shows how an integrative analysis of the transcriptome and the proteome can extend our understanding of even well-annotated genomes. PMID:25060758

  10. TCGA4U: A Web-Based Genomic Analysis Platform To Explore And Mine TCGA Genomic Data For Translational Research.

    PubMed

    Huang, Zhenzhen; Duan, Huilong; Li, Haomin

    2015-01-01

    Large-scale human cancer genomics projects, such as TCGA, have generated large volumes of genomic data for further study. Exploring and mining these data to obtain meaningful analysis results can help researchers find potential genomic alterations that influence the development and metastasis of tumors. We developed a web-based gene analysis platform, named TCGA4U, which uses statistical methods and models to help translational investigators explore, mine and visualize human cancer genomic characteristic information from the TCGA datasets. Furthermore, through Gene Ontology (GO) annotation and clinical data integration, the genomic data were transformed into biological process, molecular function, cellular component and survival curves to help researchers identify potential driver genes. Clinical researchers without expertise in data analysis will benefit from such a user-friendly genomic analysis platform. PMID:26262133

  11. Differentiating between monozygotic twins through DNA methylation-specific high-resolution melt curve analysis.

    PubMed

    Stewart, Leander; Evans, Neil; Bexon, Kimberley J; van der Meer, Dieudonne J; Williams, Graham A

    2015-05-01

    Although short tandem repeat profiling is extremely powerful in identifying individuals from crime scene stains, it is unable to differentiate between monozygotic (MZ) twins. Efforts to address this include mutation analysis through whole genome sequencing and through DNA methylation studies. Methylation of DNA is affected by environmental factors; thus, as MZ twins age, their DNA methylation patterns change. This can be characterized by bisulfite treatment followed by pyrosequencing. However, this can be time-consuming and expensive; thus, it is unlikely to be widely used by investigators. If the sequences are different, then in theory the melting temperature should be different. Thus, the aim of this study was to assess whether high-resolution melt curve analysis can be used to differentiate between MZ twins. Five sets of MZ twins provided buccal swabs that underwent extraction, quantification, bisulfite treatment, polymerase chain reaction amplification and high-resolution melting curve analysis targeting two markers, Alu-E2F3 and Alu-SP. Significant differences were observed between all MZ twins targeting Alu-E2F3 and in four of five MZ twins targeting Alu-SP (P<0.05). Thus, it has been demonstrated that bisulfite treatment followed by high-resolution melting curve analysis could be used to differentiate between MZ twins. PMID:25677265

  12. Whole genome DNA methylation analysis based on high throughput sequencing technology.

    PubMed

    Li, Ning; Ye, Mingzhi; Li, Yingrui; Yan, Zhixiang; Butcher, Lee M; Sun, Jihua; Han, Xu; Chen, Quan; Zhang, Xiuqing; Wang, Jun

    2010-11-01

    There are numerous approaches to decipher a whole genome DNA methylation profile ("methylome"), each varying in cost, throughput and resolution. The gold standard of these methods, whole genome bisulfite-sequencing (BS-seq), involves treatment of DNA with sodium bisulfite combined with subsequent high throughput sequencing. Using BS-seq, we generated a single-base-resolution methylome in human peripheral blood mononuclear cells (in press). This BS-seq map was then used as the reference methylome to compare two alternative sequencing-based methylome assays (performed on the same donor of PBMCs): methylated DNA immunoprecipitation (MeDIP-seq) and methyl-binding protein (MBD-seq). In our analysis, we found that MeDIP-seq and MBD-seq are complementary strategies, with MeDIP-seq more sensitive to highly methylated, high-CpG densities and MBD-seq more sensitive to highly methylated, moderate-CpG densities. Taking into account the size of a mammalian genome and the current expense of sequencing, we feel that 3 gigabases (Gbp) of uniquely mapped 45 bp paired-end MeDIP-seq or MBD-seq reads is the minimum requirement and a cost-effective strategy for methylome pattern analysis. PMID:20430099

  13. Digital microarray analysis for digital artifact genomics

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger; Handley, James; Williams, Deborah

    2013-06-01

    We implement a Spatial Voting (SV) based analogy of microarray analysis for digital gene marker identification in malware code sections. We examine a famous set of malware formally analyzed by Mandiant and code named Advanced Persistent Threat (APT1). APT1 is a Chinese organization formed with specific intent to infiltrate and exploit US resources. Mandiant provided a detailed behavior and string analysis report for the 288 malware samples available. We performed an independent analysis using a new alternative to the traditional dynamic analysis and static analysis that we call Spatial Analysis (SA). We perform unsupervised SA on the APT1 originating malware code sections and report our findings. We also show the results of SA performed on some members of the families associated by Mandiant. We conclude that SV based SA is a practical, fast alternative to dynamic analysis and static analysis.

  14. FVGWAS: Fast voxelwise genome wide association analysis of large-scale imaging genetic data.

    PubMed

    Huang, Meiyan; Nichols, Thomas; Huang, Chao; Yu, Yang; Lu, Zhaohua; Knickmeyer, Rebecca C; Feng, Qianjin; Zhu, Hongtu

    2015-09-01

    More and more large-scale imaging genetic studies are being widely conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide ($N_C > 12$ million known variants) associations with signals at millions of locations ($N_V \sim 10^6$) in the brain from thousands of subjects ($n \sim 10^3$). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components including a heteroscedastic linear model, a global sure independence screening (GSIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is $O(n N_V N_C)$ for voxelwise genome wide association analysis (VGWAS) method compared with $O((N_C + N_V) n^2)$ for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 s for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing. PMID:26025292
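As a rough illustration of the two complexity expressions quoted in this abstract, the sketch below plugs in the ADNI figures it reports (708 subjects, 193,275 voxels, 501,584 SNPs). It counts leading-order terms only and ignores constant factors, so the ratio is an order-of-magnitude guide and not a runtime prediction. The function names are hypothetical and are not part of the FVGWAS software.

```python
def vgwas_ops(n, n_voxels, n_snps):
    """Leading-order cost O(n * N_V * N_C) quoted for standard VGWAS."""
    return n * n_voxels * n_snps


def fvgwas_ops(n, n_voxels, n_snps):
    """Leading-order cost O((N_C + N_V) * n^2) quoted for FVGWAS."""
    return (n_snps + n_voxels) * n ** 2


# Figures taken from the ADNI application described in the abstract.
n, n_voxels, n_snps = 708, 193_275, 501_584

ratio = vgwas_ops(n, n_voxels, n_snps) / fvgwas_ops(n, n_voxels, n_snps)
print(f"VGWAS / FVGWAS leading-order operation ratio: {ratio:.0f}")
```

Under these assumptions the leading-order operation count drops by roughly two orders of magnitude, which is consistent with the abstract's claim of efficiency but says nothing about the screening or bootstrap overheads.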

  15. Stacks: an analysis tool set for population genomics

    PubMed Central

    Catchen, Julian; Hohenlohe, Paul A.; Bassham, Susan; Amores, Angel; Cresko, William A.

    2014-01-01

    Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics. PMID:23701397
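The idea of analysing a per-SNP statistic "across a reference genome using a smoothed sliding window" can be sketched as a kernel-weighted average centred on each SNP. The abstract does not state which kernel or window width Stacks uses, so the Gaussian kernel, the `sigma` default and the function name below are all assumptions made for illustration only.

```python
import math


def smooth_statistic(positions, values, sigma=150_000.0):
    """Gaussian-kernel-weighted average of a per-SNP statistic.

    positions: base-pair coordinates of SNPs on one chromosome.
    values:    the statistic (for example a per-SNP FST) at each position.
    sigma:     kernel width in base pairs (assumed value, not from the abstract).
    SNPs further than 3 * sigma from the window centre are ignored.
    """
    smoothed = []
    for centre in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            dist = abs(pos - centre)
            if dist > 3 * sigma:
                continue
            weight = math.exp(-(dist ** 2) / (2 * sigma ** 2))
            weighted_sum += weight * val
            weight_total += weight
        # weight_total >= 1 because the centre SNP always contributes weight 1.
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Made-up example: three nearby SNPs and one isolated SNP.
positions = [1_000, 50_000, 120_000, 900_000]
fst = [0.10, 0.30, 0.20, 0.80]
print(smooth_statistic(positions, fst, sigma=100_000.0))
```

This version is quadratic in the number of SNPs; a production implementation would normally exploit the sorted positions to visit only the SNPs inside each window.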

  16. Evacuee Compliance Behavior Analysis using High Resolution Demographic Information

    SciTech Connect

    Lu, Wei; Han, Lee; Liu, Cheng; Tuttle, Mark A; Bhaduri, Budhendra L

    2014-01-01

    The purpose of this study is to examine whether evacuee compliance behavior with route assignments from different resolutions of demographic data would impact the evacuation performance. Most existing evacuation strategies assume that travelers will follow evacuation instructions, while in reality a certain percentage of evacuees do not comply with prescribed instructions. In this paper, a comparison study of evacuation assignment based on Traffic Analysis Zones (TAZ) and high resolution LandScan USA Population Cells (LPC) was conducted for the detailed road network representing Alexandria, Virginia. A revised platform for evacuation modeling built on high resolution demographic data and activity-based microscopic traffic simulation is proposed. The results indicate that evacuee compliance behavior affects evacuation efficiency with traditional TAZ assignment, but it does not significantly compromise the efficiency with high resolution LPC assignment. The TAZ assignment also underestimates the real travel time during evacuation, especially for high compliance simulations. This suggests that conventional evacuation studies based on TAZ assignment might not be effective at providing efficient guidance to evacuees. From the high resolution data perspective, traveler compliance behavior is an important factor but it does not impact the system performance significantly. Analysis of evacuee compliance behavior should therefore focus on route/shelter assignments at the individual evacuee level, rather than on whole-system performance.

  17. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins

    PubMed Central

    Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

    2016-01-01

    A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks. PMID:27198619

  18. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins

    NASA Astrophysics Data System (ADS)

    Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

    2016-05-01

    A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.

  19. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins.

    PubMed

    Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

    2016-01-01

    A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks. PMID:27198619

  20. Evolution Analysis of Simple Sequence Repeats in Plant Genome

    PubMed Central

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units in genome sequences and play many important roles in plants. To reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analyzing twelve sequenced plant genomes. First, in the twelve studied plant genomes, the main SSRs were those containing repeats of 1–3 nucleotide combinations. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number, the percentage of A/T in C. reinhardtii showed no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to that of AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T showed a rising trend along with the evolution of the plant kingdom; meanwhile, with the increase of SSR repeat number in plant species, different species favored different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed their specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only followed a general pattern in the evolution of the plant kingdom, but also were associated with the evolution of specific genome sequences. The study of the evolutionary regularities of SSRs provides new insights for the analysis of plant genome evolution. PMID:26630570
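The kind of tally described in this abstract (SSRs with 1–3 nucleotide motifs, and the A/T share among mononucleotide SSRs) can be sketched with a regular expression. This is a minimal illustration and not the authors' pipeline: the single `min_repeats` threshold, the restriction to perfect repeats and the function names are assumptions.

```python
import re


def find_ssrs(sequence, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect SSRs with 1-3 nt motifs.

    The lazy {1,3}? tries the shortest motif first, so a run such as
    AAAAAA is reported as a mononucleotide repeat rather than as (AA)n.
    """
    pattern = re.compile(r"([ACGT]{1,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def at_fraction_of_mono_ssrs(hits):
    """Fraction of mononucleotide SSRs whose motif is A or T (None if there are none)."""
    mono = [motif for _, motif, _ in hits if len(motif) == 1]
    if not mono:
        return None
    return sum(m in "AT" for m in mono) / len(mono)


# Made-up test sequence containing one SSR of each motif length plus a C run.
seq = "GG" + "A" * 7 + "CC" + "AT" * 5 + "TT" + "CAG" * 5 + "TT" + "C" * 6 + "G"
hits = find_ssrs(seq)
print(hits)
print(at_fraction_of_mono_ssrs(hits))
```

Note that a trinucleotide repeat is reported under whichever rotation of the motif the scan meets first (CAG, AGC or GCA), so motif classes would need to be normalised before comparing species.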

  1. FusoBase: an online Fusobacterium comparative genomic analysis platform

    PubMed Central

    Ang, Mia Yang; Heydari, Hamed; Jakubovics, Nick S.; Mahmud, Mahafizul Imran; Dutta, Avirup; Wee, Wei Yee; Wong, Guat Jah; Mutha, Naresh V.R.; Tan, Shi Yang; Choo, Siew Woh

    2014-01-01

    Fusobacterium are anaerobic gram-negative bacteria that have been associated with a wide spectrum of human infections and diseases. As the biology of Fusobacterium is still not well understood, comparative genomic analysis on members of this species will provide further insights on their taxonomy, phylogeny, pathogenicity and other information that may contribute to better management of infections and diseases. To facilitate the ongoing genomic research on Fusobacterium, a specialized database with easy-to-use analysis tools is necessary. Here we present FusoBase, an online database providing access to genome-wide annotated sequences of Fusobacterium strains as well as bioinformatics tools, to support the expanding scientific community. Using our custom-developed Pairwise Genome Comparison tool, we demonstrate how differences between two user-defined genomes and how insertion of putative prophages can be identified. In addition, Pathogenomics Profiling Tool is capable of clustering predicted genes across Fusobacterium strains and visualizing the results in the form of a heat map with dendrogram. Database URL: http://fusobacterium.um.edu.my. PMID:25149689

  2. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  3. Clinical Analysis and Interpretation of Cancer Genome Data

    PubMed Central

    Van Allen, Eliezer M.; Wagle, Nikhil; Levy, Mia A.

    2013-01-01

    The scale of tumor genomic profiling is rapidly outpacing human cognitive capacity to make clinical decisions without the aid of tools. New frameworks are needed to help researchers and clinicians process the information emerging from the explosive growth in both the number of tumor genetic variants routinely tested and the respective knowledge to interpret their clinical significance. We review the current state, limitations, and future trends in methods to support the clinical analysis and interpretation of cancer genomes. This includes the processes of genome-scale variant identification, including tools for sequence alignment, tumor–germline comparison, and molecular annotation of variants. The process of clinical interpretation of tumor variants includes classification of the effect of the variant, reporting the results to clinicians, and enabling the clinician to make a clinical decision based on the genomic information integrated with other clinical features. We describe existing knowledge bases, databases, algorithms, and tools for identification and visualization of tumor variants and their actionable subsets. With the decreasing cost of tumor gene mutation testing and the increasing number of actionable therapeutics, we expect the methods for analysis and interpretation of cancer genomes to continue to evolve to meet the needs of patient-centered clinical decision making. The science of computational cancer medicine is still in its infancy; however, there is a clear need to continue the development of knowledge bases, best practices, tools, and validation experiments for successful clinical implementation in oncology. PMID:23589549

  4. Genomic Analysis at the Single-Cell Level

    PubMed Central

    Kalisky, Tomer; Blainey, Paul; Quake, Stephen R.

    2013-01-01

    Studying complex biological systems such as a developing embryo, a tumor, or a microbial ecosystem often involves understanding the behavior and heterogeneity of the individual cells that constitute the system and their interactions. In this review, we discuss a variety of approaches to single-cell genomic analysis. PMID:21942365

  5. GENOMIC ANALYSIS OF THE TESTICULAR TOXICITY OF HALOACETIC ACIDS

    EPA Science Inventory

    Genomic analysis of the testicular toxicity of haloacetic acids

    David J. Dix and John C. Rockett
    Reproductive Toxicology Division, National Health and Environmental Effects Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, R...

  6. Thyroid insufficiency in developing rat brain: A genomic analysis.

    EPA Science Inventory

    Thyroid Insufficiency in the Developing Rat Brain: A Genomic Analysis. JE Royland and ME Gilbert, Neurotox. Div., U.S. EPA, RTP, NC, USA. Endocrine disruption (ED) is an area of major concern in environmental neurotoxicity. Severe deficits in thyroid hormone (TH) levels have bee...

  7. Integrated translational genomics for analysis of complex traits in sorghum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  8. Correlation between DNA ploidy, metaphase high-resolution comparative genomic hybridization results and clinical outcome of synovial sarcoma

    PubMed Central

    2011-01-01

    Background Although synovial sarcoma is the 3rd most commonly occurring mesenchymal tumor in young adults, usually with a highly aggressive clinical course, remarkable differences can be seen regarding the clinical outcome. According to comparative genomic hybridization (CGH) data published in the literature, the simple and complex karyotypes show a correlation between the prognosis and clinical outcome. In addition, the connection between DNA ploidy and clinical course is controversial. The aim of this study was to apply a fine-tuned interpretation of our DNA ploidy results and to compare these with metaphase high-resolution CGH (HR-CGH) results. Methods DNA ploidy was determined on Feulgen-stained smears in 56 synovial sarcoma cases by image cytometry; follow up was available in 46 cases (average: 78 months). In 9 cases HR-CGH analysis was also available. Results 10 cases were found to be DNA-aneuploid and 46 DNA-diploid by image cytometry. With fine-tuning of the diploid cases according to the 5c exceeding events (single cell aneuploidy), 33 cases were so-called "simple-diploid" (without 5c exceeding events) and 13 cases were "complex-diploid", containing 5c exceeding events (any number). Aneuploid tumors contained large numbers of genetic alterations with the sum gain of at least 2 chromosomes (A-, B- or C-group) detected by HR-CGH. In the "simple-diploid" cases no or few genetic alterations could be detected, whereas in the "complex-diploid" samples numerous aberrations (three or more) could be found. Conclusions Our results show a correlation between the DNA-ploidy, a fine-tuned DNA-ploidy and the HR-CGH results. Furthermore, we found significant correlation between the different ploidy groups and the clinical outcome (p < 0.05). PMID:22053830

  9. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes

    PubMed Central

    Zhuang, Jiali; Weng, Zhiping

    2015-01-01

    Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recursively mutated by single nucleotide alterations differed from the genes recursively mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs. PMID:26283183

  10. Functional Genomic Analysis of C. elegans Molting

    PubMed Central

    Frand, Alison R; Russel, Sascha

    2005-01-01

    Although the molting cycle is a hallmark of insects and nematodes, neither the endocrine control of molting via size, stage, and nutritional inputs nor the enzymatic mechanism for synthesis and release of the exoskeleton is well understood. Here, we identify endocrine and enzymatic regulators of molting in C. elegans through a genome-wide RNA-interference screen. Products of the 159 genes discovered include annotated transcription factors, secreted peptides, transmembrane proteins, and extracellular matrix enzymes essential for molting. Fusions between several genes and green fluorescent protein show a pulse of expression before each molt in epithelial cells that synthesize the exoskeleton, indicating that the corresponding proteins are made in the correct time and place to regulate molting. We show further that inactivation of particular genes abrogates expression of the green fluorescent protein reporter genes, revealing regulatory networks that might couple the expression of genes essential for molting to endocrine cues. Many molting genes are conserved in parasitic nematodes responsible for human disease, and thus represent attractive targets for pesticide and pharmaceutical development. PMID:16122351

  11. Comparative genome analysis and genome-guided physiological analysis of Roseobacter litoralis

    PubMed Central

    2011-01-01

    Background Roseobacter litoralis OCh149, the type species of the genus, and Roseobacter denitrificans OCh114 were the first described organisms of the Roseobacter clade, an ecologically important group of marine bacteria. Both species were isolated from seaweed and are able to perform aerobic anoxygenic photosynthesis. Results The genome of R. litoralis OCh149 contains one circular chromosome of 4,505,211 bp and three plasmids of 93,578 bp (pRLO149_94), 83,129 bp (pRLO149_83) and 63,532 bp (pRLO149_63). Of the 4537 genes predicted for R. litoralis, 1122 (24.7%) are not present in the genome of R. denitrificans. Many of the unique genes of R. litoralis are located in genomic islands and on plasmids. On pRLO149_83 several potential heavy metal resistance genes are encoded which are not present in the genome of R. denitrificans. The comparison of the heavy metal tolerance of the two organisms showed an increased zinc tolerance of R. litoralis. In contrast to R. denitrificans, the photosynthesis genes of R. litoralis are plasmid encoded. The activity of the photosynthetic apparatus was confirmed by respiration rate measurements, indicating a growth-phase dependent response to light. Comparative genomics with other members of the Roseobacter clade revealed several genomic regions that were only conserved in the two Roseobacter species. One of those regions encodes a variety of genes that might play a role in host association of the organisms. The catabolism of different carbon and nitrogen sources was predicted from the genome and combined with experimental data. In several cases, e.g. the degradation of some algal osmolytes and sugars, the genome-derived predictions of the metabolic pathways in R. litoralis differed from the phenotype. Conclusions The genomic differences between the two Roseobacter species are mainly due to lateral gene transfer and genomic rearrangements. Plasmid pRLO149_83 contains predominantly recently acquired genetic material whereas pRLO149

  12. Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data

    PubMed Central

    Duforet-Frebourg, Nicolas; Luu, Keurcien; Laval, Guillaume; Bazin, Eric; Blum, Michael G.B.

    2016-01-01

    To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) comprising 850 individuals from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million, obtained with a low-coverage sequencing depth (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult. PMID:26715629

  13. Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data.

    PubMed

    Duforet-Frebourg, Nicolas; Luu, Keurcien; Laval, Guillaume; Bazin, Eric; Blum, Michael G B

    2016-04-01

    To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1) comprising 850 individuals from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million, obtained with a low-coverage sequencing depth (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult. PMID:26715629

  14. Continuous-wave terahertz scanning image resolution analysis and restoration

    NASA Astrophysics Data System (ADS)

    Li, Qi; Yin, Qiguo; Yao, Rui; Ding, Shenghui; Wang, Qi

    2010-03-01

    The resolution of continuous-wave (CW) terahertz scanning images is limited by many factors, among which the aperture effect of the finite focus diameter is very important. We have investigated the factors that affect terahertz (THz) image resolution in detail through theoretical analysis and simulation. To enhance THz image resolution, the Richardson-Lucy algorithm has been introduced as a promising approach to improve image details. By analyzing the imaging theory, it is proposed that the intensity distribution function of the actual THz laser focal spot can be used approximately as the point spread function (PSF) in the restoration algorithm. The focal spot image can be obtained with a pyroelectric camera, and the mean-filtered focal spot image is used as the PSF. Simulation and experiment show that the implemented algorithm is comparatively effective.
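
    The Richardson-Lucy update named above can be sketched in one dimension. This is a minimal illustration under stated assumptions: the PSF is a made-up symmetric kernel rather than a measured focal spot, the data are noise-free, edges are zero-padded, and all names are hypothetical.

```python
def convolve_same(signal, kernel):
    """Zero-padded convolution; output has the same length as signal.

    The kernel length is assumed to be odd.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += signal[j] * weight
        out.append(acc)
    return out


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Iteratively sharpen `observed` given a point spread function."""
    total = sum(psf)
    psf = [w / total for w in psf]          # normalize the PSF to unit sum
    psf_flipped = psf[::-1]
    flat = sum(observed) / len(observed)
    estimate = [flat] * len(observed)       # flat, non-negative starting guess
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Made-up example: a single bright point blurred by a triangular PSF.
true_scene = [0.0] * 11
true_scene[5] = 1.0
toy_psf = [1.0, 2.0, 3.0, 2.0, 1.0]
unit_psf = [w / sum(toy_psf) for w in toy_psf]
observed = convolve_same(true_scene, unit_psf)
restored = richardson_lucy(observed, toy_psf)
print(max(observed), max(restored))
```

    The multiplicative update keeps the estimate non-negative, which suits intensity images; with real, noisy scans the iteration count would need to be limited to avoid amplifying noise.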

  15. The Cancer Genome Atlas Pan-Cancer analysis project.

    PubMed

    Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M

    2013-10-01

    The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile. PMID:24071849

  16. Genomic-Wide Analysis with Microarrays in Human Oncology

    PubMed Central

    Inaoka, Kenichi; Inokawa, Yoshikuni; Nomoto, Shuji

    2015-01-01

    DNA microarray technologies have advanced rapidly and had a profound impact on examining gene expression on a genomic scale in research. This review discusses the history and development of microarray and DNA chip devices, and specific microarrays are described along with their methods and applications. In particular, microarrays have detected many novel cancer-related genes by comparing cancer tissues and non-cancerous tissues in oncological research. Recently, new methods have been in development, such as the double-combination array and triple-combination array, which allow more effective analysis of gene expression and epigenetic changes. Analysis of gene expression alterations in precancerous regions compared with normal regions and array analysis in drug-resistant cancer tissues have also been performed successfully. Although next-generation sequencing is a similar method of genome analysis, several important differences distinguish the two techniques and their applications. Development of novel microarray technologies is expected to contribute to further cancer research.

  17. Analysis of recent segmental duplications in the bovine genome

    PubMed Central

    2009-01-01

    Background Duplicated sequences are an important source of gene innovation and structural variation within mammalian genomes. We performed the first systematic and genome-wide analysis of segmental duplications in the modern domesticated cattle (Bos taurus). Using two distinct computational analyses, we estimated that 3.1% (94.4 Mb) of the bovine genome consists of recently duplicated sequences (≥ 1 kb in length, ≥ 90% sequence identity). Similar to other mammalian draft assemblies, almost half (47% of 94.4 Mb) of these sequences have not been assigned to cattle chromosomes. Results In this study, we provide the first experimental validation of large duplications and briefly compare their distribution on two independent bovine genome assemblies using fluorescent in situ hybridization (FISH). Our analyses suggest that the majority (75-90%) of segmental duplications are organized into local tandem duplication clusters. Along with rodents and carnivores, these results now confidently establish tandem duplications as the most likely mammalian archetypical organization, in contrast to humans and great ape species which show a preponderance of interspersed duplications. A cross-species survey of duplicated genes and gene families indicated that duplication, positive selection and gene conversion have shaped primates, rodents, carnivores and ruminants to different degrees for their speciation and adaptation. We identified that bovine segmental duplications corresponding to genes are significantly enriched for specific biological functions such as immunity, digestion, lactation and reproduction. Conclusion Our results suggest that in most mammalian lineages segmental duplications are organized in a tandem configuration. Segmental duplications remain problematic for genome assembly and we highlight genic regions that require higher quality sequence characterization. This study provides insights into mammalian genome evolution and generates a valuable resource for cattle

  18. Sequencing and Analysis of Neanderthal Genomic DNA

    SciTech Connect

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  19. Genome analysis of the platypus reveals unique signatures of evolution.

    PubMed

    Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K

    2008-05-01

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734

  20. Genome analysis of the platypus reveals unique signatures of evolution

    PubMed Central

    Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

    2009-01-01

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734

  1. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    SciTech Connect

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David

    2011-03-14

    Fungi of the phylum Basidiomycota make up some 37 percent of the described fungi and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a 'core proteome' based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  2. Whole-genome CNV analysis: advances in computational approaches

    PubMed Central

    Pirooznia, Mehdi; Goes, Fernando S.; Zandi, Peter P.

    2015-01-01

    Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development. PMID:25918519

  3. Comparative Analysis of Genome Diversity in Bullmastiff Dogs

    PubMed Central

    Mortlock, Sally-Anne; Khatkar, Mehar S.; Williamson, Peter

    2016-01-01

    Management and preservation of genomic diversity in dog breeds is a major objective for maintaining health. The present study was undertaken to characterise genomic diversity in Bullmastiff dogs using both genealogical and molecular analysis. Genealogical analysis of diversity was conducted using a database consisting of 16,378 Bullmastiff pedigrees from year 1980 to 2013. Additionally, a total of 188 Bullmastiff dogs were genotyped using the 170,000 SNP Illumina CanineHD Beadchip. Genealogical parameters revealed a mean inbreeding coefficient of 0.047; 142 total founders (f); an effective number of founders (fe) of 79; an effective number of ancestors (fa) of 62; and an effective population size of the reference population of 41. Genetic diversity and the degree of genome-wide homogeneity within the breed were also investigated using molecular data. Multiple-locus heterozygosity (MLH) was equal to 0.206; runs of homozygosity (ROH) as proportion of the genome, averaged 16.44%; effective population size was 29.1, with an average inbreeding coefficient of 0.035, all estimated using SNP Data. Fine-scale population structure was analysed using NETVIEW, a population analysis pipeline. Visualisation of the high definition network captured relationships among individuals within and between subpopulations. Effects of unequal founder use, and ancestral inbreeding and selection, were evident. While current levels of Bullmastiff heterozygosity, inbreeding and homozygosity are not unusual, a relatively small effective population size indicates that a breeding strategy to reduce the inbreeding rate may be beneficial. PMID:26824579

  4. Comparative Analysis of Genome Diversity in Bullmastiff Dogs.

    PubMed

    Mortlock, Sally-Anne; Khatkar, Mehar S; Williamson, Peter

    2016-01-01

    Management and preservation of genomic diversity in dog breeds is a major objective for maintaining health. The present study was undertaken to characterise genomic diversity in Bullmastiff dogs using both genealogical and molecular analysis. Genealogical analysis of diversity was conducted using a database consisting of 16,378 Bullmastiff pedigrees from year 1980 to 2013. Additionally, a total of 188 Bullmastiff dogs were genotyped using the 170,000 SNP Illumina CanineHD Beadchip. Genealogical parameters revealed a mean inbreeding coefficient of 0.047; 142 total founders (f); an effective number of founders (fe) of 79; an effective number of ancestors (fa) of 62; and an effective population size of the reference population of 41. Genetic diversity and the degree of genome-wide homogeneity within the breed were also investigated using molecular data. Multiple-locus heterozygosity (MLH) was equal to 0.206; runs of homozygosity (ROH) as proportion of the genome, averaged 16.44%; effective population size was 29.1, with an average inbreeding coefficient of 0.035, all estimated using SNP Data. Fine-scale population structure was analysed using NETVIEW, a population analysis pipeline. Visualisation of the high definition network captured relationships among individuals within and between subpopulations. Effects of unequal founder use, and ancestral inbreeding and selection, were evident. While current levels of Bullmastiff heterozygosity, inbreeding and homozygosity are not unusual, a relatively small effective population size indicates that a breeding strategy to reduce the inbreeding rate may be beneficial. PMID:26824579

  5. Genomic Sequencing and Analysis of Sucra jujuba Nucleopolyhedrovirus

    PubMed Central

    Liu, Xiaoping; Yin, Feifei; Zhu, Zheng; Hou, Dianhai; Wang, Jun; Zhang, Lei; Wang, Manli; Wang, Hualin; Hu, Zhihong; Deng, Fei

    2014-01-01

    The complete nucleotide sequence of Sucra jujuba nucleopolyhedrovirus (SujuNPV) was determined by 454 pyrosequencing. The SujuNPV genome was 135,952 bp in length with an A+T content of 61.34%. It contained 131 putative open reading frames (ORFs) covering 87.9% of the genome. Among these ORFs, 37 were conserved in all baculovirus genomes that have been completely sequenced, 24 were conserved in lepidopteran baculoviruses, 65 were found in other baculoviruses, and 5 were unique to the SujuNPV genome. Seven homologous regions (hrs) were identified in the SujuNPV genome. SujuNPV contained several genes that were duplicated or copied multiple times: two copies of helicase, DNA binding protein gene (dbp), p26 and cg30, three copies of the inhibitor of the apoptosis gene (iap), and four copies of the baculovirus repeated ORF (bro). Phylogenetic analysis suggested that SujuNPV belongs to a subclade of group II alphabaculovirus, which differs from other baculoviruses in that all nine members of this subclade contain a second copy of dbp. PMID:25329074

  6. Genome-Assisted Analysis of Dissimilatory Metal-Reducing Bacteria

    SciTech Connect

    Fredrickson, Jim K.; Romine, Margaret F.

    2005-06-01

    Whole genome sequence for Shewanella oneidensis and Geobacter sulfurreducens has provided numerous new biological insights into the function of these model dissimilatory metal-reducing bacteria. Many of the discoveries, including the identification of a high number of c-type cytochromes in both organisms, have been the result of comparative genomic analyses including several that were experimentally confirmed. Genome sequence has also aided the identification of genes important for the reduction of metal ions and other electron acceptors utilized by these organisms during anaerobic growth by facilitating the identification of genes disrupted by random insertions. Technologies for assaying global expression patterns for genes (mRNA) and proteins have also been enabled by the availability of genome sequence but their application has been limited mainly to the analysis of the role of global regulatory genes and to identifying genes expressed or repressed in response to specific electron acceptors. It is anticipated that details regarding the mechanisms of metal ion respiration, and metabolism in general, will eventually be revealed by comprehensive, systems-level analyses enabled by functional genomic analyses.

  7. Hierarchical structure analysis describing abnormal base composition of genomes

    NASA Astrophysics Data System (ADS)

    Ouyang, Zhengqing; Liu, Jian-Kun; She, Zhen-Su

    2005-10-01

    Abnormal base compositional patterns of genomic DNA sequences are studied in the framework of a hierarchical structure (HS) model originally proposed for the study of fully developed turbulence [She and Lévêque, Phys. Rev. Lett. 72, 336 (1994)]. The HS similarity law is verified over scales between 10^3 bp and 10^5 bp, and the HS parameter β is proposed to describe the degree of heterogeneity in the base composition patterns. More than one hundred bacteria, archaea, virus, yeast, and human genome sequences have been analyzed and the results show that the HS analysis efficiently captures abnormal base composition patterns, and the parameter β is a characteristic measure of the genome. Detailed examination of the values of β reveals an intriguing link to the evolutionary events of genetic material transfer. Finally, a sequence complexity (S) measure is proposed to characterize the gradual increase of organizational complexity of the genome during evolution. The present study raises several interesting issues in the evolutionary history of genomes.

  8. Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

    PubMed Central

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-01-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
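
    The N50 statistic quoted above follows a standard definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly. A minimal sketch, with made-up scaffold lengths and a hypothetical function name:

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up scaffold lengths (total 300): 80 + 70 reaches the halfway mark of 150.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70
```

    Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract also reports coverage of core eukaryotic genes.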

  9. Phylogenetic Analysis and Comparative Genomics of Purine Riboswitch Distribution in Prokaryotes

    PubMed Central

    Singh, Payal; Sengupta, Supratim

    2012-01-01

    Riboswitches are regulatory RNAs that control gene expression by undergoing conformational changes on ligand binding. Using phylogenetic analysis and comparative genomics we have been able to identify the class of genes/operons regulated by the purine riboswitch and obtain a high-resolution map of purine riboswitch distribution across all bacterial groups. In the process, we are able to explain the absence of purine riboswitches upstream of specific genes in certain genomes. We also identify the point of origin of various purine riboswitches and argue that not all purine riboswitches are of primordial origin, and that some purine riboswitches must have originated after the divergence of certain Firmicute orders in the course of evolution. Our study also reveals the role of horizontal transfer events in accounting for the presence of purine riboswitches in some gammaproteobacterial species. Our work provides significant insights into the origin, distribution and regulatory role of purine riboswitches in prokaryotes. PMID:23170063

  10. A 1-Mb resolution radiation hybrid map of the canine genome

    PubMed Central

    Guyon, Richard; Lorentzen, Travis D.; Hitte, Christophe; Kim, Lisa; Cadieu, Edouard; Parker, Heidi G.; Quignon, Pascale; Lowe, Jennifer K.; Renier, Corinne; Gelfenbeyn, Boris; Vignaux, Françoise; DeFrance, Hawkins B.; Gloux, Stephanie; Mahairas, Gregory G.; André, Catherine; Galibert, Francis; Ostrander, Elaine A.

    2003-01-01

    The purebred dog population consists of >300 partially inbred genetic isolates or breeds. Restriction of gene flow between breeds, together with strong selection for traits, has led to the establishment of a unique resource for dissecting the genetic basis of simple and complex mammalian traits. Toward this end, we present a comprehensive radiation hybrid map of the canine genome composed of 3,270 markers including 1,596 microsatellite-based markers, 900 cloned gene sequences and ESTs, 668 canine-specific bacterial artificial chromosome (BAC) ends, and 106 sequence-tagged sites. The map was constructed by using the RHDF5000-2 whole-genome radiation hybrid panel and computed by using multimap and tsp/concorde. The 3,270 markers map to 3,021 unique positions and define an average intermarker distance corresponding to 1 Mb. We also define a minimal screening set of 325 highly informative well spaced markers, to be used in the initiation of genome-wide scans. The well defined synteny between the dog and human genomes, established in part as a function of this work by the identification of 85 conserved fragments, will allow follow-up of initial findings of linkage by selection of candidate genes from the human genome sequence. This work continues to define the canine system as the method of choice in the pursuit of the genes causing mammalian variation and disease. PMID:12700351

  11. High-resolution genetic mapping of maize pan-genome sequence anchors.

    PubMed

    Lu, Fei; Romay, Maria C; Glaubitz, Jeffrey C; Bradbury, Peter J; Elshire, Robert J; Wang, Tianyu; Li, Yu; Li, Yongxiang; Semagn, Kassa; Zhang, Xuecai; Hernandez, Alvaro G; Mikel, Mark A; Soifer, Ilya; Barad, Omer; Buckler, Edward S

    2015-01-01

    In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes. The structural variation across a species can be represented by a 'pan-genome', which is essential to fully understand the genetic control of phenotypes. However, the pan-genome's complexity hinders its accurate assembly via sequence alignment. Here we demonstrate an approach to facilitate pan-genome construction in maize. By performing 18 trillion association tests we map 26 million tags generated by reduced representation sequencing of 14,129 maize inbred lines. Using machine-learning models we select 4.4 million accurately mapped tags as sequence anchors, 1.1 million of which are presence/absence variations. Structural variations exhibit enriched association with phenotypic traits, indicating that they are a significant source of adaptive variation in maize. The ability to efficiently map ultrahigh-density pan-genome sequence anchors enables fine characterization of structural variation and will advance both genetic research and breeding in many crops. PMID:25881062

  12. The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications

    PubMed Central

    Smith, Jeramiah J.; Keinath, Melissa C.

    2015-01-01

    It is generally accepted that many genes present in vertebrate genomes owe their origin to two whole-genome duplications that occurred deep in the ancestry of the vertebrate lineage. However, details regarding the timing and outcome of these duplications are not well resolved. We present high-density meiotic and comparative genomic maps for the sea lamprey (Petromyzon marinus), a representative of an ancient lineage that diverged from all other vertebrates ∼550 million years ago. Linkage analyses yielded a total of 95 linkage groups, similar to the estimated number of germline chromosomes (1n ∼ 99), spanning a total of 5570.25 cM. Comparative mapping data yield strong support for the hypothesis that a single whole-genome duplication occurred in the basal vertebrate lineage, but do not strongly support a hypothetical second event. Rather, these comparative maps reveal several evolutionarily independent segmental duplications occurring over the last 600+ million years of chordate evolution. This refined history of vertebrate genome duplication should permit more precise investigations of vertebrate evolution. PMID:26048246

  13. Genome sequencing and analysis conference grant

    SciTech Connect

    Venter, J.C.

    1995-10-01

    The 14 plenary session presentations focused on nematode; yeast; fruit fly; plants; mycobacteria; and man. In addition there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  14. Meta-analysis of genome-wide association from genomic prediction models.

    PubMed

    Bernal Rubio, Y L; Gualdrón Duarte, J L; Bates, R O; Ernst, C W; Nonneman, D; Rohrer, G A; King, A; Shackelford, S D; Wheeler, T L; Cantet, R J C; Steibel, J P

    2016-02-01

    Genome-wide association (GWA) studies based on GBLUP models are a common practice in animal breeding. However, effect sizes of GWA tests are small, requiring larger sample sizes to enhance power of detection of rare variants. Because of difficulties in increasing sample size in animal populations, one alternative is to implement a meta-analysis (MA), combining information and results from independent GWA studies. Although this methodology has been used widely in human genetics, implementation in animal breeding has been limited. Thus, we present methods to implement a MA of GWA, describing the proper approach to compute weights derived from multiple genomic evaluations based on animal-centric GBLUP models. Application to real datasets shows that MA increases power of detection of associations in comparison with population-level GWA, allowing for population structure and heterogeneity of variance components across populations to be accounted for. Another advantage of MA is that it does not require access to genotype data that is required for a joint analysis. Scripts related to the implementation of this approach, which consider the strength of association as well as the sign, are distributed and thus account for heterogeneity in association phase between QTL and SNPs. Thus, MA of GWA is an attractive alternative to summarizing results from multiple genomic studies, avoiding restrictions with genotype data sharing, definition of fixed effects and different scales of measurement of evaluated traits. PMID:26607299
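
    Combining signed per-study estimates without sharing genotypes can be illustrated with the textbook inverse-variance (fixed-effect) meta-analysis. This is a generic sketch, not the GBLUP-derived weighting described in the abstract; the effect sizes, standard errors, and function name are made up for illustration.

```python
import math


def inverse_variance_meta(effects, std_errors):
    """Pool signed effect estimates for one SNP across independent studies.

    Returns (pooled effect, pooled standard error, z score, two-sided p-value).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]   # precision of each study
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal tail
    return pooled, pooled_se, z, p_value


# Made-up estimates for one SNP in two populations, same sign in both.
pooled, pooled_se, z, p_value = inverse_variance_meta([0.2, 0.4], [0.1, 0.1])
print(pooled, pooled_se, z, p_value)

# Opposite signs cancel, which is why the sign of the association matters.
print(inverse_variance_meta([0.3, -0.3], [0.1, 0.1]))
```

    Only summary statistics enter the calculation, which matches the advantage noted in the abstract: each study can keep its genotype data private.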

  15. Emerging pathogens of gilthead seabream: characterisation and genomic analysis of novel intracellular β-proteobacteria.

    PubMed

    Seth-Smith, Helena M B; Dourala, Nancy; Fehr, Alexander; Qi, Weihong; Katharios, Pantelis; Ruetten, Maja; Mateos, José M; Nufer, Lisbeth; Weilenmann, Roseline; Ziegler, Urs; Thomson, Nicholas R; Schlapbach, Ralph; Vaughan, Lloyd

    2016-07-01

    New and emerging environmental pathogens pose some of the greatest threats to modern aquaculture, a critical source of food protein globally. As with other intensive farming practices, increasing our understanding of the biology of infections is important to improve animal welfare and husbandry. The gill infection epitheliocystis is increasingly problematic in gilthead seabream (Sparus aurata), a major Mediterranean aquaculture species. Epitheliocystis is generally associated with chlamydial bacteria, yet we were not able to localise chlamydial targets within the major gilthead seabream lesions. Two previously unidentified species within a novel β-proteobacterial genus were instead identified. These co-infecting intracellular bacteria have been characterised using high-resolution imaging and genomics, presenting the most comprehensive study on epitheliocystis agents to date. Draft genomes of the two uncultured species, Ca. Ichthyocystis hellenicum and Ca. Ichthyocystis sparus, have been de novo sequenced and annotated from preserved material. Analysis of the genomes shows a compact core indicating a metabolic dependency on the host, and an accessory genome with an unprecedented number of tandemly arrayed gene families. This study represents a critical insight into novel, emerging fish pathogens and will be used to underpin future investigations into the bacterial origins, and to develop diagnostic and treatment strategies. PMID:26849311

  16. Genomic analysis and selected molecular pathways in rare cancers

    NASA Astrophysics Data System (ADS)

    Liu, Stephen V.; Lenkiewicz, Elizabeth; Evers, Lisa; Holley, Tara; Kiefer, Jeffrey; Ruiz, Christian; Glatz, Katharina; Bubendorf, Lukas; Demeure, Michael J.; Eng, Cathy; Ramanathan, Ramesh K.; Von Hoff, Daniel D.; Barrett, Michael T.

    2012-12-01

    It is widely accepted that many cancers arise as a result of an acquired genomic instability and the subsequent evolution of tumor cells with variable patterns of selected and background aberrations. The presence and behaviors of distinct neoplastic cell populations within a patient's tumor may underlie multiple clinical phenotypes in cancers. A goal of many current cancer genome studies is the identification of recurring selected driver events that can be advanced for the development of personalized therapies. Unfortunately, in the majority of rare tumors, this type of analysis can be particularly challenging. Large series of specimens for analysis are simply not available, allowing recurring patterns to remain hidden. In this paper, we highlight the use of DNA content-based flow sorting to identify and isolate DNA-diploid and DNA-aneuploid populations from tumor biopsies as a strategy to comprehensively study the genomic composition and behaviors of individual cancers in a series of rare solid tumors: intrahepatic cholangiocarcinoma, anal carcinoma, adrenal leiomyosarcoma, and pancreatic neuroendocrine tumors. We propose that the identification of highly selected genomic events in distinct tumor populations within each tumor can identify candidate driver events that can facilitate the development of novel, personalized treatment strategies for patients with cancer.

  17. Genome-Wide Detection and Analysis of Multifunctional Genes

    PubMed Central

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
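
    The selection step described above, extracting genes annotated to at least two distinct biological processes, can be sketched roughly as below. This is a simplification under stated assumptions: it treats annotation labels as already distinct and unrelated, which real functional annotations generally are not, and the function name and the `(gene, process)` input format are hypothetical.

```python
from collections import defaultdict


def multifunctional_genes(annotations, min_processes=2):
    """Return genes annotated to at least `min_processes` distinct processes.

    annotations: iterable of (gene, process) pairs; repeated pairs are ignored.
    """
    processes_by_gene = defaultdict(set)
    for gene, process in annotations:
        processes_by_gene[gene].add(process)
    return sorted(
        gene
        for gene, processes in processes_by_gene.items()
        if len(processes) >= min_processes
    )


# Hypothetical toy annotations: only GENE_A has two distinct processes.
example = [
    ("GENE_A", "process_1"),
    ("GENE_A", "process_2"),
    ("GENE_B", "process_1"),
    ("GENE_B", "process_1"),
]
selected = multifunctional_genes(example)
```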

  18. Genome-Wide Detection and Analysis of Multifunctional Genes.

    PubMed

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-10-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms--H. sapiens, D. melanogaster, and S. cerevisiae--and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  19. High Resolution Continuous Flow Analysis System for Polar Ice Cores

    NASA Astrophysics Data System (ADS)

    Dallmayr, Remi; Azuma, Kumiko; Yamada, Hironobu; Kjær, Helle Astrid; Vallelonga, Paul; Azuma, Nobuhiko; Takata, Morimasa

    2014-05-01

    In the last decades, Continuous Flow Analysis (CFA) technology for ice core analyses has been developed to reconstruct the past changes of the climate system 1), 2). Compared with traditional analyses of discrete samples, a CFA system offers much faster analyses with higher depth resolution. It also generates a decontaminated sample stream without time-consuming sample processing procedures by using the inner area of an ice-core sample. The CFA system that we have been developing is currently able to continuously measure stable water isotopes 3) and electrolytic conductivity, as well as to collect discrete samples for both the inner and outer areas with variable depth resolutions. Chemistry analyses 4) and methane-gas analysis 5) are planned to be added using the continuous water stream system 5). In order to optimize the resolution of the current system with minimal sample volumes necessary for different analyses, our CFA system typically melts an ice core at 1.6 cm/min. Instead of using a wire position encoder with a typical 1 mm positioning resolution 6), we decided to use a high-accuracy CCD laser displacement sensor (LKG-G505, Keyence). At the 1.6 cm/min melt rate, the positioning resolution was improved to 0.27 mm. Also, the mixing volume that occurs in our open split debubbler is regulated using its weight. The overflow pumping rate is smoothly PID controlled to keep the weight as low as possible, while keeping a safety buffer of water to avoid air bubbles downstream. To evaluate the system's depth resolution, we will present the preliminary data of electrolytic conductivity obtained by melting 12 bags of the North Greenland Eemian Ice Drilling (NEEM) ice core. The samples correspond to different climate intervals (Greenland Stadial 21, 22, Greenland Stadial 5, Greenland Interstadial 5, Greenland Interstadial 7, Greenland Stadial 8).
We will present results for the Greenland Stadial 8, whose depths and ages are between 1723.7 and 1724.8 meters, and 35.520 to
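
    The relation between the melt rate and the positioning resolution quoted in this record can be checked with a short calculation. The sketch below assumes one position reading per second, which is not stated in the abstract; under that assumption a melt rate of 1.6 cm/min corresponds to roughly 0.27 mm of ice melted per reading. The function name is hypothetical.

```python
def depth_per_sample_mm(melt_rate_cm_per_min, sample_interval_s):
    """Ice depth melted between two consecutive position readings, in mm."""
    melt_rate_mm_per_s = melt_rate_cm_per_min * 10.0 / 60.0
    return melt_rate_mm_per_s * sample_interval_s


# Assumed 1 s between readings (not given in the abstract).
step_mm = depth_per_sample_mm(1.6, 1.0)
print(f"{step_mm:.2f} mm per reading")
```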

  20. Massively expedited genome-wide heritability analysis (MEGHA)

    PubMed Central

    Ge, Tian; Nichols, Thomas E.; Lee, Phil H.; Holmes, Avram J.; Roffman, Joshua L.; Buckner, Randy L.; Sabuncu, Mert R.; Smoller, Jordan W.

    2015-01-01

    The discovery and prioritization of heritable phenotypes is a computational challenge in a variety of settings, including neuroimaging genetics and analyses of the vast phenotypic repositories in electronic health record systems and population-based biobanks. Classical estimates of heritability require twin or pedigree data, which can be costly and difficult to acquire. Genome-wide complex trait analysis is an alternative tool to compute heritability estimates from unrelated individuals, using genome-wide data that are increasingly ubiquitous, but is computationally demanding and becomes difficult to apply in evaluating very large numbers of phenotypes. Here we present a fast and accurate statistical method for high-dimensional heritability analysis using genome-wide SNP data from unrelated individuals, termed massively expedited genome-wide heritability analysis (MEGHA) and accompanying nonparametric sampling techniques that enable flexible inferences for arbitrary statistics of interest. MEGHA produces estimates and significance measures of heritability with several orders of magnitude less computational time than existing methods, making heritability-based prioritization of millions of phenotypes based on data from unrelated individuals tractable for the first time to our knowledge. As a demonstration of application, we conducted heritability analyses on global and local morphometric measurements derived from brain structural MRI scans, using genome-wide SNP data from 1,320 unrelated young healthy adults of non-Hispanic European ancestry. We also computed surface maps of heritability for cortical thickness measures and empirically localized cortical regions where thickness measures were significantly heritable. Our analyses demonstrate the unique capability of MEGHA for large-scale heritability-based screening and high-dimensional heritability profile construction. PMID:25675487

  1. Massively expedited genome-wide heritability analysis (MEGHA).

    PubMed

    Ge, Tian; Nichols, Thomas E; Lee, Phil H; Holmes, Avram J; Roffman, Joshua L; Buckner, Randy L; Sabuncu, Mert R; Smoller, Jordan W

    2015-02-24

    The discovery and prioritization of heritable phenotypes is a computational challenge in a variety of settings, including neuroimaging genetics and analyses of the vast phenotypic repositories in electronic health record systems and population-based biobanks. Classical estimates of heritability require twin or pedigree data, which can be costly and difficult to acquire. Genome-wide complex trait analysis is an alternative tool to compute heritability estimates from unrelated individuals, using genome-wide data that are increasingly ubiquitous, but is computationally demanding and becomes difficult to apply in evaluating very large numbers of phenotypes. Here we present a fast and accurate statistical method for high-dimensional heritability analysis using genome-wide SNP data from unrelated individuals, termed massively expedited genome-wide heritability analysis (MEGHA) and accompanying nonparametric sampling techniques that enable flexible inferences for arbitrary statistics of interest. MEGHA produces estimates and significance measures of heritability with several orders of magnitude less computational time than existing methods, making heritability-based prioritization of millions of phenotypes based on data from unrelated individuals tractable for the first time to our knowledge. As a demonstration of application, we conducted heritability analyses on global and local morphometric measurements derived from brain structural MRI scans, using genome-wide SNP data from 1,320 unrelated young healthy adults of non-Hispanic European ancestry. We also computed surface maps of heritability for cortical thickness measures and empirically localized cortical regions where thickness measures were significantly heritable. Our analyses demonstrate the unique capability of MEGHA for large-scale heritability-based screening and high-dimensional heritability profile construction. PMID:25675487

  2. Revealing misassembled segments in the bovine reference genome by high resolution linkage disequilibrium scan

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Misassembly signatures, created by shuffling the order of sequences while assembling a genome, can be easily seen by analyzing the unexpected behaviour of the linkage disequilibrium (LD) decay. A heuristic process was proposed to identify those misassembly signatures and presented the ones found in ...

  3. High-resolution typing by integration of genome sequencing data in a large tuberculosis cluster.

    PubMed

    Schürch, Anita C; Kremer, Kristin; Daviena, Olaf; Kiers, Albert; Boeree, Martin J; Siezen, Roland J; van Soolingen, Dick

    2010-09-01

    To investigate whether genome sequencing yields more useful markers than those currently used to study the epidemiology of tuberculosis, it was applied to three Mycobacterium tuberculosis isolates of the Harlingen outbreak. Our findings suggest that single nucleotide polymorphisms can be used to identify transmission chains in restriction fragment length polymorphism clusters. PMID:20592143

  4. High-Resolution Typing by Integration of Genome Sequencing Data in a Large Tuberculosis Cluster

    PubMed Central

    Schürch, Anita C.; Kremer, Kristin; Daviena, Olaf; Kiers, Albert; Boeree, Martin J.; Siezen, Roland J.; van Soolingen, Dick

    2010-01-01

    To investigate whether genome sequencing yields more useful markers than those currently used to study the epidemiology of tuberculosis, it was applied to three Mycobacterium tuberculosis isolates of the Harlingen outbreak. Our findings suggest that single nucleotide polymorphisms can be used to identify transmission chains in restriction fragment length polymorphism clusters. PMID:20592143

  5. Resolution and noise trade-off analysis for volumetric CT

    SciTech Connect

    Li Baojun; Avinash, Gopal B.; Hsieh, Jiang

    2007-10-15

    Until recently, most studies addressing the trade-off between spatial resolution and quantum noise were performed in the context of single-slice CT. In this study, we extend the theoretical framework of previous works to volumetric CT and further extend it by taking into account the actual shapes of the preferred reconstruction kernels. In the experimental study, we also attempt to explore a three-dimensional approach for spatial resolution measurement, as opposed to the conventional two-dimensional approaches that were widely adopted in previously published studies. By scanning a finite-sized sphere phantom, the MTF was measured from the edge profile along the spherical surface. Cases of different resolutions (and noise levels) were generated by adjusting the reconstruction kernel. To reduce bias, the total photon fluxes were matched: 120 kVp, 200 mA, and 1 s per gantry rotation. All data sets were reconstructed using a modified FDK algorithm under the same conditions: scan field-of-view (SFOV) = 10 cm and slice thickness = 0.625 mm. The theoretical analysis indicated that the variance of noise is proportional to a power of the spatial resolution greater than the 4th. Our experimental results supported this conclusion by showing that the relationship follows the 4.6th (helical) or 5th (axial) power.
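
    To make the reported power-law relationship concrete, the sketch below computes how much the noise variance would grow for a given gain in spatial resolution. It assumes that "spatial resolution" is expressed so that a larger number means finer detail (for example a spatial frequency) and that the power law holds exactly; the function name is hypothetical, and the exponents 4.6 and 5 are the experimental values quoted in the abstract.

```python
import math


def noise_variance_ratio(resolution_gain, exponent):
    """Factor by which noise variance grows when resolution improves by `resolution_gain`.

    Assumes variance is proportional to resolution ** exponent.
    """
    if resolution_gain <= 0:
        raise ValueError("resolution_gain must be positive")
    return resolution_gain ** exponent


# Doubling the resolution under the axial (5th power) and helical (4.6th power) fits.
axial = noise_variance_ratio(2.0, 5.0)
helical = noise_variance_ratio(2.0, 4.6)
# The noise standard deviation grows by the square root of the variance ratio.
axial_std_ratio = math.sqrt(axial)
```

    Under these assumptions, doubling the resolution multiplies the noise variance by 32 in the axial case, which illustrates why small resolution gains are expensive in dose or noise.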

  6. Geometric multi-resolution analysis for dictionary learning

    NASA Astrophysics Data System (ADS)

    Maggioni, Mauro; Minsker, Stanislav; Strawn, Nate

    2015-09-01

    We present an efficient algorithm and theory for Geometric Multi-Resolution Analysis (GMRA), a procedure for dictionary learning. Sparse dictionary learning provides the necessary complexity reduction for the critical applications of compression, regression, and classification in high-dimensional data analysis. As such, it is a critical technique in data science and it is important to have techniques that admit both efficient implementation and strong theory for large classes of theoretical models. By construction, GMRA is computationally efficient and in this paper we describe how the GMRA correctly approximates a large class of plausible models (namely, the noisy manifolds).

  7. Genome-wide association interaction analysis for Alzheimer's disease

    PubMed Central

    Gusareva, Elena S.; Carrasquillo, Minerva M.; Bellenguez, Céline; Cuyvers, Elise; Colon, Samuel; Graff-Radford, Neill R.; Petersen, Ronald C.; Dickson, Dennis W.; Mahachie Johna, Jestinah M.; Bessonov, Kyrylo; Van Broeckhoven, Christine; Williams, Julie; Amouyel, Philippe; Sleegers, Kristel; Ertekin-Taner, Nilüfer; Lambert, Jean-Charles; Van Steen, Kristel

    2015-01-01

    We propose a minimal protocol for exhaustive genome-wide association interaction analysis that involves screening for epistasis over large-scale genomic data combining strengths of different methods and statistical tools. The different steps of this protocol are illustrated on a real-life data application for Alzheimer's disease (AD) (2259 patients and 6017 controls from France). Particularly, in the exhaustive genome-wide epistasis screening we identified AD-associated interacting SNPs-pair from chromosome 6q11.1 (rs6455128, the KHDRBS2 gene) and 13q12.11 (rs7989332, the CRYL1 gene) (p = 0.006, corrected for multiple testing). A replication analysis in the independent AD cohort from Germany (555 patients and 824 controls) confirmed the discovered epistasis signal (p = 0.036). This signal was also supported by a meta-analysis approach in 5 independent AD cohorts that was applied in the context of epistasis for the first time. Transcriptome analysis revealed negative correlation between expression levels of KHDRBS2 and CRYL1 in both the temporal cortex (β = −0.19, p = 0.0006) and cerebellum (β = −0.23, p < 0.0001) brain regions. This is the first time a replicable epistasis associated with AD was identified using a hypothesis free screening approach. PMID:24958192

  8. High-Throughput, High-Resolution Mapping of Protein Localization in Mammalian Brain by In Vivo Genome Editing.

    PubMed

    Mikuni, Takayasu; Nishiyama, Jun; Sun, Ye; Kamasawa, Naomi; Yasuda, Ryohei

    2016-06-16

    A scalable and high-throughput method to identify precise subcellular localization of endogenous proteins is essential for integrative understanding of a cell at the molecular level. Here, we developed a simple and generalizable technique to image endogenous proteins with high specificity, resolution, and contrast in single cells in mammalian brain tissue. The technique, single-cell labeling of endogenous proteins by clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9-mediated homology-directed repair (SLENDR), uses in vivo genome editing to insert a sequence encoding an epitope tag or a fluorescent protein to a gene of interest by CRISPR-Cas9-mediated homology-directed repair (HDR). Single-cell, HDR-mediated genome editing was achieved by delivering the editing machinery to dividing neuronal progenitors through in utero electroporation. We demonstrate that SLENDR allows rapid determination of the localization and dynamics of many endogenous proteins in various cell types, regions, and ages in the brain. Thus, SLENDR provides a high-throughput platform to map the subcellular localization of endogenous proteins with the resolution of micro- to nanometers in the brain. PMID:27180908

  9. High resolution coherence analysis between planetary and climate oscillations

    NASA Astrophysics Data System (ADS)

    Scafetta, Nicola

    2016-05-01

    This study investigates the existence of a multi-frequency spectral coherence between planetary and global surface temperature oscillations by using advanced techniques of coherence analysis and statistical significance tests. The performance of the standard Matlab mscohere algorithm is compared with that of high-resolution coherence analysis methodologies such as canonical correlation analysis. The Matlab mscohere function highlights large coherence peaks at 20- and 60-year periods although, due to the shortness of the global surface temperature record (1850-2014), the statistical significance of the result depends on the specific window function adopted for pre-processing the data. In fact, window functions disrupt the low-frequency component of the spectrum. By contrast, using canonical correlation analysis, at least five coherent frequencies at the 95% significance level are found at the following periods: 6.6, 7.4, 14, 20 and 60 years. Thus, high-resolution coherence analysis confirms that the climate system can be partially modulated by astronomical forces of gravitational, electromagnetic and solar origin. A possible chain of the physical causes explaining this coherence is briefly discussed.

  10. Complete sequence and genomic analysis of murine gammaherpesvirus 68.

    PubMed Central

    Virgin, H W; Latreille, P; Wamsley, P; Hallsworth, K; Weck, K E; Dal Canto, A J; Speck, S H

    1997-01-01

    Murine gammaherpesvirus 68 (gammaHV68) infects mice, thus providing a tractable small-animal model for analysis of the acute and chronic pathogenesis of gammaherpesviruses. To facilitate molecular analysis of gammaHV68 pathogenesis, we have sequenced the gammaHV68 genome. The genome contains 118,237 bp of unique sequence flanked by multiple copies of a 1,213-bp terminal repeat. The GC content of the unique portion of the genome is 46%, while the GC content of the terminal repeat is 78%. The unique portion of the genome is estimated to encode at least 80 genes and is largely colinear with the genomes of Kaposi's sarcoma herpesvirus (KSHV; also known as human herpesvirus 8), herpesvirus saimiri (HVS), and Epstein-Barr virus (EBV). We detected 63 open reading frames (ORFs) homologous to HVS and KSHV ORFs and used the HVS/KSHV numbering system to designate these ORFs. gammaHV68 shares with HVS and KSHV ORFs homologous to a complement regulatory protein (ORF 4), a D-type cyclin (ORF 72), and a G-protein-coupled receptor with close homology to the interleukin-8 receptor (ORF 74). One ORF (K3) was identified in gammaHV68 as homologous to both ORFs K3 and K5 of KSHV and contains a domain found in a bovine herpesvirus 4 major immediate-early protein. We also detected 16 methionine-initiated ORFs predicted to encode proteins at least 100 amino acids in length that are unique to gammaHV68 (ORFs M1 to 14). ORF M1 has striking homology to poxvirus serpins, while ORF M11 encodes a potential homolog of Bcl-2-like molecules encoded by other gammaherpesviruses (gene 16 of HVS and KSHV and the BHRF1 gene of EBV). In addition, clustered at the left end of the unique region are eight sequences with significant homology to bacterial tRNAs. The unique region of the genome contains two internal repeats: a 40-bp repeat located between bp 26778 and 28191 in the genome and a 100-bp repeat located between bp 98981 and 101170. 
Analysis of the gammaHV68, HVS, EBV, and KSHV genomes demonstrated

  11. [Detection of the introgression of genome elements of Aegilops cylindrica Host. into Triticum aestivum L. genome with ISSR-analysis].

    PubMed

    Galaev, A V; Babaiants, L T; Sivolap, Iu M

    2003-01-01

    Comparative analysis of introgressive and parental forms of wheat was carried out to reveal the sites of the donor genome carrying new loci of resistance to fungal diseases. Using the ISSR method, 124 ISSR loci were detected in the genomes of 18 individual plants of introgressive line 5/20-91; 17 of them were related to introgressed fragments of the Ae. cylindrica genome in T. aestivum. The ISSR method was shown to be effective for detecting the variability caused by introgression of alien genetic material into the T. aestivum genome. PMID:12945176

  12. Genome Sequence and Comparative Genomics Analysis of a Vibrio cholerae O1 Strain Isolated from a Cholera Patient in Malaysia

    PubMed Central

    Osama, Abdulrazak; Gan, Han Ming; Teh, Cindy Shuan Ju; Yap, Kien-Pong

    2012-01-01

    The genome sequence analysis of a clinical Vibrio cholerae VC35 strain from an outbreak case in Malaysia indicates multiple genes involved in host adaptation and a novel Na+-driven multidrug efflux pump-coding gene in the genome of Vibrio cholerae with the highest similarity to VMA_001754 of Vibrio mimicus VMA223. PMID:23209200

  13. Comparative Genome Analysis of Basidiomycete Fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor

    2012-03-19

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that the profile of an organism in lignocellulose-targeting genes can be used to predict its nutritional mode, and predict Dacryopinax sp. as a brown rot; Botryobasidium botryosum and Jaapia argillacea as white rots.

  14. Computational analysis of high resolution unsteady airloads for rotor aeroacoustics

    NASA Technical Reports Server (NTRS)

    Quackenbush, Todd R.; Lam, C.-M. Gordon; Wachspress, Daniel A.; Bliss, Donald B.

    1994-01-01

    The study of helicopter aerodynamic loading for acoustics applications requires the application of efficient yet accurate simulations of the velocity field induced by the rotor's vortex wake. This report summarizes work to date on the development of such an analysis, which builds on the Constant Vorticity Contour (CVC) free wake model, previously implemented for the study of vibratory loading in the RotorCRAFT computer code. The present effort has focused on implementation of an airload reconstruction approach that computes high resolution airload solutions of rotor/rotor-wake interactions required for acoustics computations. Supplementary efforts on the development of improved vortex core modeling, unsteady aerodynamic effects, higher spatial resolution of rotor loading, and fast vortex wake implementations have substantially enhanced the capabilities of the resulting software, denoted RotorCRAFT/AA (AeroAcoustics). Results of validation calculations using recently acquired model rotor data show that by employing airload reconstruction it is possible to apply the CVC wake analysis with temporal and spatial resolution suitable for acoustics applications while reducing the computation time required by one to two orders of magnitude relative to that required by direct calculations. Promising correlation with this body of airload and noise data has been obtained for a variety of rotor configurations and operating conditions.

  15. Universal multifractal analysis of high-resolution snowfall data

    NASA Astrophysics Data System (ADS)

    Raupach, Timothy; Gires, Auguste; Tchiguirinskaia, Ioulia; Schertzer, Daniel; Berne, Alexis

    2016-04-01

    Universal multifractal analysis offers useful insights into the scaling properties of precipitation data. While much work has been done on the scaling properties of rainfall fields, less is known about the scaling properties of solid precipitation such as snowfall, especially at high resolution. We present results of a universal multifractal (UM) analysis of high-resolution solid precipitation data. The data were recorded using a 2D-video-disdrometer (2DVD) situated in the Swiss Alps. Analysis was performed on a one-hour period of snowfall, during which time the mean wind speed was zero, temperatures were low, and no hail was detected. The 2DVD recorded information on individual particles, from which we calculated snow mass. Three "cuts" of the spatio-temporal snowfall process were analysed using the UM framework. First, high-resolution timeseries of precipitation intensity at 100 ms temporal resolution were analysed. These results show two scaling regimes with a transition area between them. Second, we analysed reconstructed vertical columns of particle concentration and snow mass, assuming no horizontal wind and constant vertical velocity (equal to the one recorded on the ground). Strong scaling was observed in the particle concentration fields, with the influence of large (and therefore rare) snowflakes degrading the quality of the scaling observed for higher moments of the particle distribution. There was a clear difference between the measured fields and fields in which the vertical distribution of particles was made homogeneous, indicating that the measured snowfall fields contained non-homogeneous fields. Scaling behaviour was observed down to vertical scales of about 0.5 m, which is similar to published results using rain data. Finally, we used the UM framework to investigate the scaling properties of 2D maps of snow accumulation over a subset of the instrument collection area of 5.12 x 5.12 cm^2. As expected from the vertical column analysis, given that

  16. Tomato Functional Genomics Database: a comprehensive resource and analysis package for tomato functional genomics.

    PubMed

    Fei, Zhangjun; Joung, Je-Gun; Tang, Xuemei; Zheng, Yi; Huang, Mingyun; Lee, Je Min; McQuinn, Ryan; Tieman, Denise M; Alba, Rob; Klee, Harry J; Giovannoni, James J

    2011-01-01

    Tomato Functional Genomics Database (TFGD) provides a comprehensive resource to store, query, mine, analyze, visualize and integrate large-scale tomato functional genomics data sets. The database is functionally expanded from the previously described Tomato Expression Database by including metabolite profiles as well as large-scale tomato small RNA (sRNA) data sets. Computational pipelines have been developed to process microarray, metabolite and sRNA data sets archived in the database, respectively, and TFGD provides downloads of all the analyzed results. TFGD is also designed to enable users to easily retrieve biologically important information through a set of efficient query interfaces and analysis tools, including improved array probe annotations as well as tools to identify co-expressed genes, significantly affected biological processes and biochemical pathways from gene expression data sets and miRNA targets, and to integrate transcript and metabolite profiles, and sRNA and mRNA sequences. The suite of tools and interfaces in TFGD allow intelligent data mining of recently released and continually expanding large-scale tomato functional genomics data sets. TFGD is available at http://ted.bti.cornell.edu. PMID:20965973

  17. Integrated genome-wide analysis of genomic changes and gene regulation in human adrenocortical tissue samples

    PubMed Central

    Gara, Sudheer Kumar; Wang, Yonghong; Patel, Dhaval; Liu-Chittenden, Yi; Jain, Meenu; Boufraqech, Myriem; Zhang, Lisa; Meltzer, Paul S.; Kebebew, Electron

    2015-01-01

    To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal-to-adenoma-to-carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analysis in human adrenocortical tissue (normal, adrenocortical adenomas and ACC) samples. A pairwise comparison of normal, adrenocortical adenoma and ACC gene expression profiles with more than four-fold expression differences and an adjusted P-value < 0.05 revealed no major differences in normal versus adrenocortical adenoma, whereas 808 and 1085 genes were dysregulated in ACC versus adrenocortical adenoma and ACC versus normal, respectively. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the gene expression of dysregulated genes, we found that there are more alterations in ACC versus normal compared to ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways that are associated with dysregulated genes and further experimentally validated that oncostatin m signaling induces caspase 3-dependent apoptosis and suppresses cell proliferation. Finally, we propose that there is a higher number of genomic changes from normal-to-adenoma-to-carcinoma and identified oncostatin m signaling as a plausible druggable pathway for therapeutics. PMID:26446994

  18. SIDEKICK: Genomic data driven analysis and decision-making framework

    PubMed Central

    2010-01-01

    Background Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to approach genomic analysis that

  19. Sparse Recovery Analysis of High-Resolution Climate Data

    NASA Astrophysics Data System (ADS)

    Archibald, R.

    2013-12-01

    The field of compressed sensing is vast and currently very active, with new results, methods, and algorithms appearing almost daily. The first notions of compressed sensing began with Prony's method, which was designed by the French mathematician Gaspard Riche de Prony to extract signal information from a limited number of measurements. Since then, sparsity has been used empirically in a variety of applications, including geology and geophysics, spectroscopy, signal processing, radio astronomy, and medical ultrasound. High-resolution climate studies performed on large scale high performance computing have been producing large amounts of data that can benefit from unique mathematical methods for analysis. This work demonstrates how sparse recovery and L1 regularization can be used effectively on large datasets from high-resolution climate studies.
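
    As an illustration of the L1-regularized sparse recovery named in this abstract, the following is a minimal sketch of iterative soft-thresholding (ISTA) for the problem min 0.5*||Ax - y||^2 + lam*||x||_1. The abstract does not say which solver was used, so the algorithm choice, the function names and the toy matrix are all assumptions made for illustration only.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t*|v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, lam, step, iterations=2000):
    """Approximately minimise 0.5*||A x - y||^2 + lam*||x||_1.

    A is a list of rows, y a list of measurements. `step` should not
    exceed 1 / (largest eigenvalue of A^T A) for convergence.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iterations):
        # Residual of the current estimate: A x - y.
        residual = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the least-squares term: A^T (A x - y).
        gradient = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft-thresholding (the L1 part).
        x = [soft_threshold(x[j] - step * gradient[j], step * lam) for j in range(n)]
    return x


# Toy underdetermined system (hypothetical data): 2 measurements, 3 unknowns.
A_toy = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y_toy = [2.0, 2.0]
x_hat = ista(A_toy, y_toy, lam=0.1, step=0.3)
```

    In this toy case the L1 penalty favours the solution that puts all weight on the third unknown, which is the sparsest vector consistent with the two measurements.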

  20. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    PubMed

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E. coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298

  1. Pan-Genome Analysis of Brazilian Lineage A Amoebal Mimiviruses

    PubMed Central

    Assis, Felipe L.; Bajrai, Leena; Abrahao, Jonatas S.; Kroon, Erna G.; Dornas, Fabio P.; Andrade, Kétyllen R.; Boratto, Paulo V. M.; Pilotto, Mariana R.; Robert, Catherine; Benamar, Samia; La Scola, Bernard; Colson, Philippe

    2015-01-01

    Since the recent discovery of Samba virus, the first representative of the family Mimiviridae from Brazil, prospecting for mimiviruses has been conducted in different environmental conditions in Brazil. Recently, we isolated using Acanthamoeba sp. three new mimiviruses, all of lineage A of amoebal mimiviruses: Kroon virus from urban lake water; Amazonia virus from the Brazilian Amazon river; and Oyster virus from farmed oysters. The aims of this work were to sequence and analyze the genome of these new Brazilian mimiviruses (mimi-BR) and update the analysis of the Samba virus genome. The genomes of Samba virus, Amazonia virus and Oyster virus were 97%–99% similar, whereas Kroon virus had a low similarity (90%–91%) with other mimi-BR. A total of 3877 proteins encoded by mimi-BR were grouped into 974 orthologous clusters. In addition, we identified three new ORFans in the Kroon virus genome. Additional work is needed to expand our knowledge of the diversity of mimiviruses from Brazil, including if and why among amoebal mimiviruses those of lineage A predominate in the Brazilian environment. PMID:26131958

  2. Privacy-preserving GWAS analysis on federated genomic datasets

    PubMed Central

    2015-01-01

    Background The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, high quality GWAS usually requires a large number of samples, which can grow beyond the capability of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution collaboration for effective GWAS, but it raises concerns about patient privacy and medical information confidentiality (as data are being exchanged across institutional boundaries), which become an inhibiting factor for practical use. Methods We present a privacy-preserving GWAS framework on federated genomic datasets. Our method is to layer the GWAS computations on top of secure multi-party computation (MPC) systems. This approach allows two parties in a distributed system to mutually perform secure GWAS computations without exposing their private data. Results We demonstrate our technique by implementing a framework for minor allele frequency counting and χ² statistics calculation, one of the typical computations used in GWAS. For efficient prototyping, we use a state-of-the-art MPC framework, i.e., Portable Circuit Format (PCF) [1]. Our experimental results show promise in realizing both efficient and secure cross-institution GWAS computations. PMID:26733045
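
    The two computations named in this abstract, minor allele frequency counting and the χ² statistic, can be sketched in the clear as follows. This is only the plaintext arithmetic on pooled allele counts; it does not reproduce the secure multi-party computation layer or the PCF framework described by the authors. The function names and the example counts are hypothetical.

```python
def pool_counts(party_a, party_b):
    """Add the (allele1, allele2) counts held by two parties."""
    return (party_a[0] + party_b[0], party_a[1] + party_b[1])


def minor_allele_frequency(allele_counts):
    """Frequency of the less common of two alleles."""
    first, second = allele_counts
    total = first + second
    if total == 0:
        raise ValueError("no alleles counted")
    return min(first, second) / total


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic for a 2x2 case/control allele table."""
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts: each institution holds part of the case group.
cases = pool_counts((10, 40), (20, 30))      # pooled cases
controls = pool_counts((25, 25), (25, 25))   # pooled controls
maf_cases = minor_allele_frequency(cases)
statistic = allelic_chi_square(cases, controls)
```

    In a federated setting the pooling step is the part that would run under MPC, so that neither institution sees the other's raw counts.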

  3. Pan-Genome Analysis of Brazilian Lineage A Amoebal Mimiviruses.

    PubMed

    Assis, Felipe L; Bajrai, Leena; Abrahao, Jonatas S; Kroon, Erna G; Dornas, Fabio P; Andrade, Kétyllen R; Boratto, Paulo V M; Pilotto, Mariana R; Robert, Catherine; Benamar, Samia; Scola, Bernard La; Colson, Philippe

    2015-07-01

    Since the recent discovery of Samba virus, the first representative of the family Mimiviridae from Brazil, prospecting for mimiviruses has been conducted in different environmental conditions in Brazil. Recently, we isolated using Acanthamoeba sp. three new mimiviruses, all of lineage A of amoebal mimiviruses: Kroon virus from urban lake water; Amazonia virus from the Brazilian Amazon river; and Oyster virus from farmed oysters. The aims of this work were to sequence and analyze the genome of these new Brazilian mimiviruses (mimi-BR) and update the analysis of the Samba virus genome. The genomes of Samba virus, Amazonia virus and Oyster virus were 97%-99% similar, whereas Kroon virus had a low similarity (90%-91%) with other mimi-BR. A total of 3877 proteins encoded by mimi-BR were grouped into 974 orthologous clusters. In addition, we identified three new ORFans in the Kroon virus genome. Additional work is needed to expand our knowledge of the diversity of mimiviruses from Brazil, including if and why among amoebal mimiviruses those of lineage A predominate in the Brazilian environment. PMID:26131958

  4. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  5. Comparative genomic analysis of seven Mycoplasma hyosynoviae strains

    PubMed Central

    Bumgardner, Eric A; Kittichotirat, Weerayuth; Bumgarner, Roger E; Lawrence, Paulraj K

    2015-01-01

    Infection with Mycoplasma hyosynoviae can result in debilitating arthritis in pigs, particularly those aged 10 weeks or older. Strategies for controlling this pathogen are becoming increasingly important due to the rise in the number of cases of arthritis that have been attributed to infection in recent years. In order to begin to develop interventions to prevent arthritis caused by M. hyosynoviae, more information regarding the specific proteins and potential virulence factors that its genome encodes was needed. However, the genome of this emerging swine pathogen had not been sequenced previously. In this report, we present a comparative analysis of the genomes of seven strains of M. hyosynoviae isolated from different locations in North America during the years 2010 to 2013. We identified several putative virulence factors that may contribute to the ability of this pathogen to adhere to host cells. Additionally, we discovered several prophage genes present within the genomes of three strains that show significant similarity to MAV1, a phage isolated from the related species, M. arthritidis. We also identified CRISPR-Cas and type III restriction and modification systems present in two strains that may contribute to their ability to defend against phage infection. PMID:25693846

  6. Genome-scale computational analysis of DNA curvature and repeats in Arabidopsis and rice uncovers plant-specific genomic properties

    PubMed Central

    2011-01-01

    Background Due to its overarching role in genome function, sequence-dependent DNA curvature continues to attract great attention. The DNA double helix is not a rigid cylinder, but presents both curvature and flexibility in different regions, depending on the sequence. More in depth knowledge of the various orders of complexity of genomic DNA structure has allowed the design of sophisticated bioinformatics tools for its analysis and manipulation, which, in turn, have yielded a better understanding of the genome itself. Curved DNA is involved in many biologically important processes, such as transcription initiation and termination, recombination, DNA replication, and nucleosome positioning. CpG islands and tandem repeats also play significant roles in the dynamics and evolution of genomes. Results In this study, we analyzed the relationship between these three structural features within rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana) genomes. A genome-scale prediction of curvature distribution in rice and Arabidopsis indicated that most of the chromosomes of both genomes have maximal chromosomal DNA curvature adjacent to the centromeric region. By analyzing tandem repeats across the genome, we found that frequencies of repeats are higher in regions adjacent to those with high curvature value. Further analysis of CpG islands shows a clear interdependence between curvature value, repeat frequencies and CpG islands. Each CpG island appears in a local minimal curvature region, and CpG islands usually do not appear in the centromere or regions with high repeat frequency. A statistical evaluation demonstrates the significance and non-randomness of these features. Conclusions This study represents the first systematic genome-scale analysis of DNA curvature, CpG islands and tandem repeats at the DNA sequence level in plant genomes, and finds that not all of the chromosomes in plants follow the same rules common to other eukaryote organisms, suggesting that some

  7. New Assembly, Reannotation and Analysis of the Entamoeba histolytica Genome Reveal New Genomic Features and Protein Content Information

    PubMed Central

    Lorenzi, Hernan A.; Puiu, Daniela; Miller, Jason R.; Brinkac, Lauren M.; Amedeo, Paolo; Hall, Neil; Caler, Elisabet V.

    2010-01-01

    Background In order to maintain genome information accurately and relevantly, original genome annotations need to be updated and evaluated regularly. Manual reannotation of genomes is important as it can significantly reduce the propagation of errors and consequently diminish the time spent on mistaken research. For this reason, five years after the initial submission of the Entamoeba histolytica draft genome publication, we have re-examined the original 23 Mb assembly and the annotation of the predicted genes. Principal Findings The evaluation of the genomic sequence led to the identification of more than one hundred artifactual tandem duplications that were eliminated by re-assembling the genome. The reannotation was done using a combination of manual and automated genome analysis. The new 20 Mb assembly contains 1,496 scaffolds and 8,201 predicted genes, of which 60% are identical to the initial annotation and the remaining 40% underwent structural changes. Functional classification of 60% of the genes was modified based on recent sequence comparisons and new experimental data. We have assigned putative function to 3,788 proteins (46% of the predicted proteome) based on the annotation of predicted gene families, and have identified 58 protein families of five or more members that share no homology with known proteins and thus could be entamoeba specific. Genome analysis also revealed new features such as the presence of segmental duplications of up to 16 kb flanked by inverted repeats, and the tight association of some gene families with transposable elements. Significance This new genome annotation and analysis represents a more refined and accurate blueprint of the pathogen genome, and provides an upgraded tool as reference for the study of many important aspects of E. histolytica biology, such as genome evolution and pathogenesis. PMID:20559563

  8. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    PubMed Central

    Cherkasov, Artem; Ho Sui, Shannan J; Brunham, Robert C; Jones, Steven JM

    2004-01-01

    Background We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics. PMID:15274750
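
    For readers unfamiliar with the Weibull function mentioned in this abstract, the sketch below evaluates the standard two-parameter Weibull distribution used in reliability theory, with shape parameter beta (the β of the abstract) and a scale parameter here called eta. The exact parameterisation the authors fitted to fold occurrences is not given in the abstract, so this form and the parameter name eta are assumptions.

```python
import math


def weibull_cdf(x, beta, eta):
    """Cumulative distribution F(x) = 1 - exp(-(x / eta) ** beta) for x >= 0."""
    if beta <= 0 or eta <= 0:
        raise ValueError("beta and eta must be positive")
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / eta) ** beta))


def weibull_reliability(x, beta, eta):
    """Reliability (survival) function R(x) = 1 - F(x)."""
    return 1.0 - weibull_cdf(x, beta, eta)


# With beta = 1 the Weibull reduces to an exponential distribution;
# beta < 1 gives a heavier tail, beta > 1 a lighter one.
tail_heavy = weibull_reliability(10.0, beta=0.5, eta=2.0)
tail_light = weibull_reliability(10.0, beta=2.0, eta=2.0)
```

    Comparing fitted beta values between data sets is the kind of use the abstract describes, where β is reported to vary between genomes.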

  9. Genomic analysis of membrane protein families: abundance and conserved motifs

    PubMed Central

    Liu, Yang; Engelman, Donald M; Gerstein, Mark

    2002-01-01

    Background Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families. Results Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in 26 genomes studied. Caenorhabditis elegans is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of C. elegans, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. This was noted in earlier studies; we now find these motifs are particularly well conserved in families, however, especially those corresponding to transporters, symporters, and channels. Conclusions We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families. PMID:12372142
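
    The GxxxG and GxxxxxxG motifs mentioned in this abstract are two glycines separated by three or six arbitrary residues. A minimal sketch of counting them in a sequence is shown below; it uses a lookahead so that overlapping occurrences are counted. The function name and the example helix are hypothetical, and the authors' own motif statistics are not reproduced.

```python
import re


def count_spaced_glycines(sequence, spacing):
    """Count positions where G is followed by `spacing` residues and another G.

    spacing=3 corresponds to GxxxG (GG4), spacing=6 to GxxxxxxG (GG7).
    A zero-width lookahead is used so overlapping motifs are all counted.
    """
    pattern = re.compile(r"(?=G.{%d}G)" % spacing)
    return sum(1 for _ in pattern.finditer(sequence.upper()))


# Hypothetical transmembrane segment used only for illustration.
helix = "LIGAVLGVIAGLLLG"
gg4 = count_spaced_glycines(helix, 3)
gg7 = count_spaced_glycines(helix, 6)
```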

  10. Sequence analysis of the Choristoneura occidentalis granulovirus genome.

    PubMed

    Escasa, Shannon R; Lauzon, Hilary A M; Mathur, Amanda C; Krell, Peter J; Arif, Basil M

    2006-07-01

    The genome of the Choristoneura occidentalis granulovirus (ChocGV) isolated from the western spruce budworm, Choristoneura occidentalis, was sequenced completely. It was 104,710 bp long, with a 67.3% A+T content and contained 116 potential open reading frames (ORFs) covering 88.4% of the genome. Of these, 29 ORFs were conserved in all fully sequenced baculovirus genomes, 30 were GV-specific, 53 were present in some nucleopolyhedroviruses (NPVs) and/or GVs, three were common to ChocGV and Choristoneura fumiferana GV (ChfuGV) and one was so far unique. To date, ChocGV is the only GV identified that contains a homologue of the apoptosis inhibitor protein P35/P49, present in some group I NPVs. It is also the first GV without a Xestia c-nigrum GV ORF 26 homologue. Five homologous regions (hrs)/repeat regions, lacking typical NPV hr palindromes were identified. ChocGV hrs were similar to each other but not to other GV hrs. A 1.8 kb repeat region with a high A+T content (81%) and multiple repeats of 21-210 bp was found between choc36 and 37. This area resembled the non-homologous region origin of DNA replication (non-hr ori) identified in Cryptophlebia leucotreta GV (CrleGV) and Cydia pomonella GV (CpGV). Based on the mean amino acid identities of homologous proteins, ChocGV was closest to fully sequenced genomes CpGV (52.3%) and CrleGV (52.1%). The closest amino acid identity was to individual ORFs from the partially sequenced ChfuGV genome (97.2% in 38 ORFs). Phylogenetic analysis placed ChocGV in a clade with CrleGV and CpGV. PMID:16760394
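
    Base-composition figures such as the 67.3% A+T content reported in this abstract are simple counts over the sequence. The sketch below shows the calculation on a short made-up sequence; the function name is hypothetical and ambiguous bases (such as N) are simply counted as neither A nor T.

```python
def at_content(sequence):
    """Fraction of bases in `sequence` that are A or T (case-insensitive)."""
    bases = sequence.upper()
    if not bases:
        raise ValueError("empty sequence")
    return (bases.count("A") + bases.count("T")) / len(bases)


# Made-up fragment, not taken from the ChocGV genome.
fragment = "ATTAGCATAT"
percent_at = 100.0 * at_content(fragment)
```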

  11. Sequence Analysis of the Genome of the Neodiprion sertifer Nucleopolyhedrovirus

    PubMed Central

    Garcia-Maruniak, Alejandra; Maruniak, James E.; Zanotto, Paolo M. A.; Doumbouya, Aissa E.; Liu, Jaw-Ching; Merritt, Thomas M.; Lanoie, Jennifer S.

    2004-01-01

    The genome of the Neodiprion sertifer nucleopolyhedrovirus (NeseNPV), which infects the European pine sawfly, N. sertifer (Hymenoptera: Diprionidae), was sequenced and analyzed. The genome was 86,462 bp in size. The C+G content of 34% was lower than that of the majority of baculoviruses. A total of 90 methionine-initiated open reading frames (ORFs) with more than 50 amino acids and minimal overlapping were found. From those, 43 ORFs were homologous to other baculovirus ORFs, and 29 of these were from the 30 conserved core genes among all baculoviruses. A NeseNPV homolog to the ld130 gene, which is present in all other baculovirus genomes sequenced to date, could not be identified. Six NeseNPV ORFs were similar to non-baculovirus-related genes, one of which was a trypsin-like gene. Only one iap gene, containing a single BIR motif and a RING finger, was found in NeseNPV. Two NeseNPV ORFs (nese18 and nese19) were duplicates transcribed in opposite orientations from each other. NeseNPV did not have an AcMNPV ORF 2 homolog characterized as the baculovirus repeat ORF (bro). Six homologous regions (hrs) were located within the NeseNPV genome, each containing small palindromes embedded within direct repeats. A phylogenetic analysis was done to root the tree based upon the sequences of DNA polymerase genes of NeseNPV, 23 other baculoviruses, and other phyla. Baculovirus phylogeny was then constructed with 29 conserved genes from 24 baculovirus genomes. Culex nigripalpus nucleopolyhedrovirus (CuniNPV) was the most distantly related baculovirus, branching to the hymenopteran NeseNPV and the lepidopteran nucleopolyhedroviruses and granuloviruses. PMID:15194780

  12. A Comparative Analysis of Mitochondrial Genomes in Eustigmatophyte Algae

    PubMed Central

    Ševčíková, Tereza; Klimeš, Vladimír; Zbránková, Veronika; Strnad, Hynek; Hroudová, Miluše; Vlček, Čestmír; Eliáš, Marek

    2016-01-01

    Eustigmatophyceae (Ochrophyta, Stramenopiles) is a small algal group with species of the genus Nannochloropsis being its best studied representatives. Nuclear and organellar genomes have been recently sequenced for several Nannochloropsis spp., but phylogenetically wider genomic studies are missing for eustigmatophytes. We sequenced mitochondrial genomes (mitogenomes) of three species representing most major eustigmatophyte lineages, Monodopsis sp. MarTras21, Vischeria sp. CAUP Q 202 and Trachydiscus minutus, and carried out their comparative analysis in the context of available data from Nannochloropsis and other stramenopiles, revealing a number of noticeable findings. First, mitogenomes of most eustigmatophytes are highly collinear and similar in the gene content, but extensive rearrangements and loss of three otherwise ubiquitous genes happened in the Vischeria lineage; this correlates with an accelerated evolution of mitochondrial gene sequences in this lineage. Second, eustigmatophytes appear to be the only ochrophyte group with the Atp1 protein encoded by the mitogenome. Third, eustigmatophyte mitogenomes uniquely share a truncated nad11 gene encoding only the C-terminal part of the Nad11 protein, while the N-terminal part is encoded by a separate gene in the nuclear genome. Fourth, UGA as a termination codon and the cognate release factor mRF2 were lost from mitochondria independently by the Nannochloropsis and T. minutus lineages. Finally, the rps3 gene in the mitogenome of Vischeria sp. is interrupted by the UAG codon, but the genome includes a gene for an unusual tRNA with an extended anticodon loop that we speculate may serve as a suppressor tRNA to properly decode the rps3 gene. PMID:26872774

  13. A Comparative Analysis of Mitochondrial Genomes in Eustigmatophyte Algae.

    PubMed

    Ševčíková, Tereza; Klimeš, Vladimír; Zbránková, Veronika; Strnad, Hynek; Hroudová, Miluše; Vlček, Čestmír; Eliáš, Marek

    2016-03-01

    Eustigmatophyceae (Ochrophyta, Stramenopiles) is a small algal group with species of the genus Nannochloropsis being its best studied representatives. Nuclear and organellar genomes have been recently sequenced for several Nannochloropsis spp., but phylogenetically wider genomic studies are missing for eustigmatophytes. We sequenced mitochondrial genomes (mitogenomes) of three species representing most major eustigmatophyte lineages, Monodopsis sp. MarTras21, Vischeria sp. CAUP Q 202 and Trachydiscus minutus, and carried out their comparative analysis in the context of available data from Nannochloropsis and other stramenopiles, revealing a number of noticeable findings. First, mitogenomes of most eustigmatophytes are highly collinear and similar in the gene content, but extensive rearrangements and loss of three otherwise ubiquitous genes happened in the Vischeria lineage; this correlates with an accelerated evolution of mitochondrial gene sequences in this lineage. Second, eustigmatophytes appear to be the only ochrophyte group with the Atp1 protein encoded by the mitogenome. Third, eustigmatophyte mitogenomes uniquely share a truncated nad11 gene encoding only the C-terminal part of the Nad11 protein, while the N-terminal part is encoded by a separate gene in the nuclear genome. Fourth, UGA as a termination codon and the cognate release factor mRF2 were lost from mitochondria independently by the Nannochloropsis and T. minutus lineages. Finally, the rps3 gene in the mitogenome of Vischeria sp. is interrupted by the UAG codon, but the genome includes a gene for an unusual tRNA with an extended anticodon loop that we speculate may serve as a suppressor tRNA to properly decode the rps3 gene. PMID:26872774

  14. Genomic cluster and network analysis for predictive screening for hepatotoxicity.

    PubMed

    Fukushima, Tamio; Kikkawa, Rie; Hamada, Yoshimasa; Horii, Ikuo

    2006-12-01

    The present study was undertaken to estimate the usefulness of genomic approaches to predict hepatotoxicity. Male rats were treated with acetaminophen (APAP), carbon tetrachloride (CCL), amiodarone (AD) or tetracycline (TC) at toxic doses. Their livers were extracted 6 or 24 hr after the dosings and were used for subsequent examinations. At 6 hr there were no histological changes noted in any of the groups except for the CCL group, but at 24 hr, such changes were noted in all but the AD group. Regarding genomic analysis, we performed hierarchical cluster analysis using S-plus software. The individual microarray data were clearly classified into 5 treatment-related clusters at 24 hr as well as at 6 hr, even though no morphological changes were noted at 6 hr. In the gene expression analysis using GeneSpring, transcription factor and oxidative stress- and lipid metabolism-related genes were markedly affected in all treatment groups at both time points when compared with the corresponding control values. Finally, we investigated gene networks in the above-affected genes by using Ingenuity Pathway Analysis software. Down-regulation of lipid metabolism-related genes regulated by SREBP1 was observed in all treatment groups at both time points, and up-regulation of oxidative stress-related genes regulated by Nrf2 was observed in the APAP and CCL treatment groups. From the above findings, for the application of genomic approaches to predict hepatotoxicity, we considered that cluster analysis for classification and early prediction of hepatotoxicity and network analysis for investigation of toxicological biomarkers would be useful. PMID:17202758

  15. Genomic analysis of the native European Solanum species, S. dulcamara

    PubMed Central

    2013-01-01

    Background Solanum dulcamara (bittersweet, climbing nightshade) is one of the few species of the Solanaceae family native to Europe. As a common weed it is adapted to a wide range of ecological niches and it has long been recognized as one of the alternative hosts for pathogens and pests responsible for many important diseases in potato, such as Phytophthora. At the same time, it may represent an alternative source of resistance genes against these diseases. Despite its unique ecology and potential as a genetic resource, genomic research tools are lacking for S. dulcamara. We have taken advantage of next-generation sequencing to speed up research on and use of this non-model species. Results In this work, we present the first large-scale characterization of the S. dulcamara transcriptome. Through comparison of RNAseq reads from two different accessions, we were able to predict transcript-based SNP and SSR markers. Using the SNP markers in combination with genomic AFLP and CAPS markers, the first genome-wide genetic linkage map of bittersweet was generated. Based on gene orthology, the markers were anchored to the genome of related Solanum species (tomato, potato and eggplant), revealing both conserved and novel chromosomal rearrangements. This allowed a better estimation of the evolutionary moment of rearrangements in a number of cases and showed that chromosomal breakpoints are regularly re-used. Conclusion Knowledge and tools developed as part of this study pave the way for future genomic research and exploitation of this wild Solanum species. The transcriptome assembly represents a resource for functional analysis of genes underlying interesting biological and agronomical traits and, in the absence of the full genome, provides a reference for RNAseq gene expression profiling aimed at understanding the unique biology of S. dulcamara. Cross-species orthology-based marker selection is shown to be a powerful tool to quickly generate a comparative genetic map, which

  16. Increasing microscopy resolution with photobleaching and intensity cumulant analysis.

    PubMed

    Brutkowski, Wojtek; Dziob, Daniel; Bernas, Tytus

    2015-11-01

    Super-resolution fluorescence microscopy and its applications for analysis of biological structures are a rapidly evolving field. A number of approaches aimed at overcoming the fundamental limit imposed by diffraction have been proposed in recent years. Here we present a modification of super-resolution optical fluctuation imaging (SOFI), a technique based on spatio-temporal evaluation of the optical signal from independently fluctuating emitters. Instead of rapid, reversible photoswitching, photobleaching is used to produce irreversible transitions between emitting and nonemitting states of the fluorochrome molecules. Simulated images are used to demonstrate that, in the absence of noise, the proposed SOFI modification increases the efficiency of transfer of high spatial frequencies in a fluorescence microscope. Correspondingly, a decrease of the point spread function (PSF) width is obtained. Moreover, the modified SOFI algorithm is capable of resolving point emitters in the presence of simulated noise. Using real biological images we demonstrate that an increase of resolution is obtained in 2D optical sections through densely packed chromatin in cell nuclei and lamin layer at the nuclear envelope. Finally, the approach is extended to 3D wide-field microscopy, allowing reduction of out-of-focus image blurring. PMID:26278779
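
    The intensity cumulant analysis underlying SOFI can be illustrated with its simplest case: the second-order cumulant at zero time lag, which for each pixel is the variance of its intensity over the image series. The sketch below computes that quantity for a small stack. It is a generic illustration only and does not implement the photobleaching-based modification proposed by the authors; the function name and the toy frames are hypothetical.

```python
def second_order_cumulant(frames):
    """Per-pixel variance over time for a stack of equally sized 2D frames.

    `frames` is a list of images, each a list of rows of pixel intensities.
    Pixels whose signal fluctuates more strongly get larger output values.
    """
    frame_count = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    image = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            trace = [frame[r][c] for frame in frames]
            mean = sum(trace) / frame_count
            image[r][c] = sum((v - mean) ** 2 for v in trace) / frame_count
    return image


# Toy stack: the left pixel fluctuates, the right pixel is constant.
stack = [[[1.0, 5.0]], [[3.0, 5.0]], [[1.0, 5.0]], [[3.0, 5.0]]]
cumulant_image = second_order_cumulant(stack)
```

    A constant background pixel contributes zero to the cumulant image, which is why fluctuation-based methods suppress non-fluctuating signal.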

  17. Improved protocol for rapid identification of certain spa types using high resolution melting curve analysis.

    PubMed

    Mayerhofer, Benjamin; Stöger, Anna; Pietzka, Ariane T; Fernandez, Haizpea Lasa; Prewein, Bernhard; Sorschag, Sieglinde; Kunert, Renate; Allerberger, Franz; Ruppitsch, Werner

    2015-01-01

    Methicillin-resistant Staphylococcus aureus is one of the most significant pathogens associated with health care. For efficient surveillance, control and outbreak investigation, S. aureus typing is essential. A high resolution melting curve analysis was developed and evaluated for rapid identification of the most frequent spa types found in an Austrian hospital consortium covering 2,435 beds. Among 557 methicillin-resistant Staphylococcus aureus isolates 38 different spa types were identified by sequence analysis of the hypervariable region X of the protein A gene (spa). Identification of spa types through their characteristic high resolution melting curve profiles was considerably improved by double spiking with genomic DNA from spa type t030 and spa type t003 and allowed unambiguous and fast identification of the ten most frequent spa types t001 (58%), t003 (12%), t190 (9%), t041 (5%), t022 (2%), t032 (2%), t008 (2%), t002 (1%), t5712 (1%) and t2203 (1%), representing 93% of all isolates within this hospital consortium. The performance of the assay was evaluated by testing samples with unknown spa types from the daily routine and by testing three different high resolution melting curve analysis real-time PCR instruments. The ten most frequent spa types were identified from all samples and on all instruments with 100% specificity and 100% sensitivity. Compared to classical spa typing by sequence analysis, this gene scanning assay is faster, cheaper and can be performed in a single closed tube assay format. Therefore it is an optimal screening tool to detect the most frequent endemic spa types and to exclude non-endemic spa types within a hospital. PMID:25768007

  18. Improved Protocol for Rapid Identification of Certain Spa Types Using High Resolution Melting Curve Analysis

    PubMed Central

    Mayerhofer, Benjamin; Stöger, Anna; Pietzka, Ariane T.; Fernandez, Haizpea Lasa; Prewein, Bernhard; Sorschag, Sieglinde; Kunert, Renate; Allerberger, Franz; Ruppitsch, Werner

    2015-01-01

    Methicillin-resistant Staphylococcus aureus is one of the most significant pathogens associated with health care. For efficient surveillance, control and outbreak investigation, S. aureus typing is essential. A high resolution melting curve analysis was developed and evaluated for rapid identification of the most frequent spa types found in an Austrian hospital consortium covering 2,435 beds. Among 557 methicillin-resistant Staphylococcus aureus isolates 38 different spa types were identified by sequence analysis of the hypervariable region X of the protein A gene (spa). Identification of spa types through their characteristic high resolution melting curve profiles was considerably improved by double spiking with genomic DNA from spa type t030 and spa type t003 and allowed unambiguous and fast identification of the ten most frequent spa types t001 (58%), t003 (12%), t190 (9%), t041 (5%), t022 (2%), t032 (2%), t008 (2%), t002 (1%), t5712 (1%) and t2203 (1%), representing 93% of all isolates within this hospital consortium. The performance of the assay was evaluated by testing samples with unknown spa types from the daily routine and by testing three different high resolution melting curve analysis real-time PCR instruments. The ten most frequent spa types were identified from all samples and on all instruments with 100% specificity and 100% sensitivity. Compared to classical spa typing by sequence analysis, this gene scanning assay is faster, cheaper and can be performed in a single closed tube assay format. Therefore it is an optimal screening tool to detect the most frequent endemic spa types and to exclude non-endemic spa types within a hospital. PMID:25768007

  19. Geometric multi-resolution analysis and data-driven convolutions

    NASA Astrophysics Data System (ADS)

    Strawn, Nate

    2015-09-01

    We introduce a procedure for learning discrete convolutional operators for generic datasets which recovers the standard block convolutional operators when applied to sets of natural images. The key observation is that the standard block convolutional operators on images are intuitive because humans naturally understand the grid structure of the self-evident functions over image spaces (pixels). This procedure first constructs a Geometric Multi-Resolution Analysis (GMRA) on the set of variables giving rise to a dataset, and then leverages the details of this data structure to identify subsets of variables upon which convolutional operators are supported, as well as a space of functions that can be shared coherently amongst these supports.

  20. Genome-wide association analysis identifies three psoriasis susceptibility loci

    PubMed Central

    Stuart, Philip E.; Nair, Rajan P.; Ellinghaus, Eva; Ding, Jun; Tejasvi, Trilokraj; Gudjonsson, Johann E.; Li, Yun; Weidinger, Stephan; Eberlein, Bernadette; Gieger, Christian; Wichmann, H. Erich; Kunz, Manfred; Ike, Robert; Krueger, Gerald G.; Bowcock, Anne M.; Mroweitz, Ulrich; Lim, Henry W.; Voorhees, John J.; Abecasis, Goncalo R.; Weichenthal, Michael; Franke, Andre; Rahman, Proton; Gladman, Dafna D.; Elder, James T.

    2010-01-01

    To identify novel psoriasis susceptibility loci, we carried out a meta-analysis of two recent genome-wide association studies [1,2], yielding a discovery sample of 1,831 cases and 2,546 controls. 102 of the most promising loci in the discovery analysis were followed up in a three-stage replication study using 4,064 cases and 4,685 controls from Michigan, Toronto, Newfoundland, and Germany. Association at a genome-wide level of significance for the combined discovery and replication samples was found for three genomic regions. One contains NOS2 (rs4795067, p = 4 × 10(-11)), another contains FBXL19 (rs10782001, p = 9 × 10(-10)), and a third contains PSMA6 and NFKBIA (rs12586317, p = 2 × 10(-8)). All three loci were also strongly associated with the subphenotypes of psoriatic arthritis and purely cutaneous psoriasis. Finally, we confirmed a recently identified [3] association signal near RNF114. PMID:20953189

  1. Whole-genome sequence-based analysis of thyroid function

    PubMed Central

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H.; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D.; Hui, Jennie; Lim, Ee M.; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R.B.; Bell, Jordana T.; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L.; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M.; Naitza, Silvia; Walsh, John P.; Spector, Tim; Davey Smith, George; Durbin, Richard; Brent Richards, J.; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J.; Wilson, Scott G.; Turki, Saeed Al; Anderson, Carl; Anney, Richard; Antony, Dinu; Artigas, Maria Soler; Ayub, Muhammad; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebhattin; Clapham, Peter; Clement, Gail; Coates, Guy; Collier, David; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day-Williams, Aaron; Day, Ian N.M.; Down, Thomas; Du, Yuanping; Dunham, Ian; Edkins, Sarah; Ellis, Peter; Evans, David; Faroogi, Sadaf; Fatemifar, Ghazaleh; Fitzpatrick, David R.; Flicek, Paul; Flyod, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Geihs, Matthias; Geschwind, Daniel; Griffin, Heather; Grozeva, Detelina; Guo, Xueqin; Guo, Xiaosen; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey; Holmans, Peter; Howie, Bryan; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Jackson, David K.; Jamshidi, Yalda; Jing, Tian; Joyce, Chris; Kaye, Jane; Keane, Thomas; Keogh, Julia; Kemp, John; Kennedy, Karen; Kolb-Kokocinski, Anja; Lachance, Genevieve; Langford, Cordelia; Lawson, Daniel; Lee, Irene; Lek, Monkol; Liang, Jieqin; Lin, Hong; Li, Rui; Li, Yingrui; Liu, Ryan; Lönnqvist, Jouko; Lopes, Margarida; Lotchkova, Valentina; MacArthur, Daniel; Marchini, Jonathan; Maslen, John; Massimo, Mangino; Mathieson, Iain; Marenne, Gaëlle; McGuffin, Peter; McIntosh, Andrew; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Mitchison, Hannah; Moayyeri, Alireza; Morris, James; Muntoni, Francesco; Northstone, Kate; O'Donnovan, Michael; Onoufriadis, Alexandros; O'Rahilly, Stephen; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Pietilainen, Olli; Plagnol, Vincent; Quaye, Lydia; Quai, Michael A.; Raymond, Lucy; Rehnström, Karola; Richards, Brent; Ring, Susan; Ritchie, Graham R.S.; Roberts, Nicola; Savage, David B.; Scambler, Peter; Schiffels, Stephen; Schmidts, Miriam; Schoenmakers, Nadia; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shin, So-Youn; Skuse, David; Small, Kerrin; Southam, Lorraine; Spasic-Boskovic, Olivera; Clair, David St; Stalker, Jim; Stevens, Elizabeth; Pourcian, Beate St; Sun, Jianping; Suvisaari, Jaana; Tachmazidou, Ionna; Tobin, Martin D.; Valdes, Ana; Kogelenberg, Margriet Van; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T.R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Elanor; Whyte, Tamieka; Williams, Hywel; Williamson, Kathleen A.; Wilson, Crispian; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zhang, Fend; Zhang, Pingbo

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10(-9)) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10(-14)). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10(-9)) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10(-11)). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  2. Comparative analysis of essential genes in prokaryotic genomic islands

    PubMed Central

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-01-01

    Essential genes are thought to encode proteins that carry out the basic functions needed to sustain cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are not likely to be located in GIs, but a systematic analysis of essential genes in GIs has not been carried out before. Here, we have analyzed the essential genes in 28 prokaryotes by a statistical method and concluded that essential genes in GIs are significantly fewer than those outside GIs. The function of 362 essential genes found in GIs has been explored further by BLAST against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of the PHAge Search Tool (PHAST). Consequently, 64 and 60 eligible essential genes are found to share sequence similarity with virulence factors and phage/prophage-related genes, respectively. Meanwhile, we find several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of prediction algorithms for essential genes, but also give clues to the functionality of essential genes in genomic islands. PMID:26223387

  3. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10(-9)) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10(-14)). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10(-9)) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10(-11)). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  4. Functional genomic analysis of the Drosophila immune response.

    PubMed

    Valanne, Susanna

    2014-01-01

    Drosophila melanogaster has been widely used as a model organism for over a century now, and also as an immunological research model for over 20 years. With the emergence of RNA interference (RNAi) in Drosophila as a robust tool to silence genes of interest, large-scale or genome-wide functional analysis has become a popular way of studying the Drosophila immune response in cell culture. Drosophila immunity is composed of cellular and humoral immunity mechanisms, and especially the systemic, humoral response pathways have been extensively dissected using the functional genomic approach. Although most components of the main immune pathways had already been found using traditional genetic screening techniques, important findings including pathway components, positive and negative regulators and modifiers have been made with RNAi screening. Additionally, RNAi screening has produced new information on host-pathogen interactions related to the pathogenesis of many microbial species. PMID:23707784

  5. [Cancer Genome Atlas Pan-cancer Analysis Project].

    PubMed

    Zhang, Kun; Wang, Hong

    2015-04-01

    Cancer can take different forms depending on the site of origin, the cell type and the genetic mutations involved, all of which also affect the response to therapy. Although many genes have been shown to directly drive changes in phenotype, the complex molecular mechanisms of many cancer lineages are still not fully elucidated. The Cancer Genome Atlas (TCGA) Research Network therefore analyzed a large number of human tumors to find molecular changes at the DNA, RNA, protein and epigenetic levels. The resulting wealth of data provides an opportunity to describe the commonalities, the distinctive features and the emerging themes across cancer lineages as a whole. The Pan-Cancer program first compares 12 cancer types. Analysis of the molecular changes in different tumors and of their functions will show how an effective treatment can be applied to tumors with a similar phenotype. PMID:25936886

  6. Genomic analysis of the Hsp70 superfamily in Arabidopsis thaliana

    PubMed Central

    Lin, Bai-Ling; Wang, Jang-Shiun; Liu, Hung-Chi; Chen, Rung-Wu; Meyer, Yves; Barakat, Abdellalli; Delseny, Michel

    2001-01-01

    The Arabidopsis genome contains at least 18 genes encoding members of the 70-kilodalton heat shock protein (Hsp70) family, 14 in the DnaK subfamily and 4 in the Hsp110/SSE subfamily. While the Hsp70s are highly conserved, a phylogenetic analysis including all members of this family in Arabidopsis and in yeast indicates the homology of Hsp70s in the subgroups, such as those predicted to localize in the same subcellular compartment and those similar to the mammalian Hsp110 and Grp170. Gene structure and genome organization suggest duplication in the origin of some genes. The Arabidopsis hsp70s exhibit distinct expression profiles; representative genes of the subgroups are expressed at relatively high levels during specific developmental stages and under thermal stress. PMID:11599561

  7. Nanopatterned structures for biomolecular analysis toward genomic and proteomic applications

    NASA Astrophysics Data System (ADS)

    Chou, Chia-Fu; Gu, Jian; Wei, Qihuo; Liu, Yingjie; Gupta, Ravi; Nishio, Takeyoshi; Zenhausern, Frederic

    2005-01-01

    We report our fabrication of nanoscale devices using electron beam and nanoimprint lithography (NIL). We focus our study on the emerging fields of NIL, nanophotonics and nanobiotechnology and give a few examples of how these nanodevices may be applied toward genomic and proteomic applications for molecular analysis. The examples include reverse NIL-fabricated nanofluidic channels for DNA stretching, nanoscale molecular traps constructed from dielectric constrictions for DNA or protein focusing by dielectrophoresis, multi-layer nanoburger and nanoburger multiplets for optimized surface-plasma enhanced Raman scattering for protein detection, and biomolecular motor-based nanosystems. The development of advanced nanopatterning techniques promises reliable and high-throughput manufacturing of nanodevices which could impact significantly on the areas of genomics, proteomics, drug discovery and molecular clinical diagnostics.

  8. Genomic analysis of Skermanella stibiiresistens type strain SB22T

    PubMed Central

    Zhu, Wentao; Huang, Jing; Li, Mingshun; Li, Xiangyang; Wang, Gejiao

    2014-01-01

    Members of genus Skermanella were described as Gram-negative, motile, aerobic, rod-shaped, obligate-heterotrophic bacteria and unable to fix nitrogen. In this study, the genome sequence of Skermanella stibiiresistens SB22T is reported. Phylogenetic analysis using core proteins confirmed the phylogenetic assignment based on 16S rRNA gene sequences. Strain SB22T has all the proteins for complete glycolysis, tricarboxylic acid cycle and pentose phosphate pathway. The RuBisCO encoding genes cbbL1S1 and nitrogenase delta subunit gene anfG are absent, consistent with its inability to fix carbon and nitrogen, respectively. In addition, the genome possesses a series of flagellar assembly and chemotaxis genes to ensure its motility. PMID:25197493

  9. Genomic Analysis of the BMP Family in Glioblastomas

    PubMed Central

    Hover, Laura D; Abel, Ty W; Owens, Philip

    2015-01-01

    Glioblastoma multiforme (GBM) is a grade IV glioma with a median survival of 15 months. Recently, bone morphogenetic protein (BMP) signaling has been shown to promote survival in xenograft murine models. To gain a better understanding of the role of BMP signaling in human GBMs, we examined the genomic alterations of 90 genes associated with BMP signaling in GBM patient samples. We completed this analysis using publicly available datasets compiled through The Cancer Genome Atlas and the Glioma Molecular Diagnostic Initiative. Here we show how mRNA expression is altered in GBM samples and how that is associated with patient survival, highlighting both known and novel associations between BMP signaling and GBM biology. PMID:25987829

  10. Emergence of a New Epidemic Neisseria meningitidis Serogroup A Clone in the African Meningitis Belt: High-Resolution Picture of Genomic Changes That Mediate Immune Evasion

    PubMed Central

    Lamelas, Araceli; Harris, Simon R.; Röltgen, Katharina; Dangy, Jean-Pierre; Hauser, Julia; Kingsley, Robert A.; Connor, Thomas R.; Sie, Ali; Hodgson, Abraham; Dougan, Gordon; Parkhill, Julian; Bentley, Stephen D.

    2014-01-01

    ABSTRACT In the African “meningitis belt,” outbreaks of meningococcal meningitis occur in cycles, representing a model for the role of host-pathogen interactions in epidemic processes. The periodicity of the epidemics is not well understood, nor is it currently possible to predict them. In our longitudinal colonization and disease surveys, we have observed waves of clonal replacement with the same serogroup, suggesting that immunity to noncapsular antigens plays a significant role in natural herd immunity. Here, through comparative genomic analysis of 100 meningococcal isolates, we provide a high-resolution view of the evolutionary changes that occurred during clonal replacement of a hypervirulent meningococcal clone (ST-7) by a descendant clone (ST-2859). We show that the majority of genetic changes are due to homologous recombination of laterally acquired DNA, with more than 20% of these events involving acquisition of DNA from other species. Signals of adaptation to evade herd immunity were indicated by genomic hot spots of recombination. Most striking is the high frequency of changes involving the pgl locus, which determines the glycosylation patterns of major protein antigens. High-frequency changes were also observed for genes involved in the regulation of pilus expression and the synthesis of Maf3 adhesins, highlighting the importance of these surface features in host-pathogen interaction and immune evasion. PMID:25336458

  11. Genome-wide mapping of promoter-anchored interactions with close to single-enhancer resolution.

    PubMed

    Sahlén, Pelin; Abdullayev, Ilgar; Ramsköld, Daniel; Matskova, Liudmila; Rilakovic, Nemanja; Lötstedt, Britta; Albert, Thomas J; Lundeberg, Joakim; Sandberg, Rickard

    2015-01-01

    Although the locations of promoters and enhancers have been identified in several cell types, we still have limited information on their connectivity. We developed HiCap, which combines a 4-cutter restriction enzyme Hi-C with sequence capture of promoter regions. Applying the method to mouse embryonic stem cells, we identified promoter-anchored interactions involving 15,905 promoters and 71,984 distal regions. The distal regions were enriched for enhancer marks and transcription, and had a mean fragment size of only 699 bp--close to single-enhancer resolution. High-resolution maps of promoter-anchored interactions with HiCap will be important for detailed characterizations of chromatin interaction landscapes. PMID:26313521

  12. Rapid High Resolution Genotyping of Francisella tularensis by Whole Genome Sequence Comparison of Annotated Genes (“MLST+”)

    PubMed Central

    Mellmann, Alexander; Höppner, Sebastian; Splettstoesser, Wolf D.; Harmsen, Dag

    2015-01-01

    The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism’s highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks. PMID:25856198
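
    A gene-by-gene comparison of the kind this record describes can be pictured as counting the loci at which two isolates carry different alleles. The sketch below is only an assumption about how such a distance might be computed; the function name, the locus labels and the allele numbers are hypothetical and do not come from the MLST+ scheme itself.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci with differing allele numbers between two isolates.

    Each profile maps a locus name to an allele number, or to None when the
    locus could not be called. Only loci called in both isolates are compared.
    """
    shared = [
        locus
        for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)


# Hypothetical allele profiles for two isolates.
isolate_1 = {"locus1": 1, "locus2": 2, "locus3": None, "locus4": 4}
isolate_2 = {"locus1": 1, "locus2": 5, "locus3": 7, "locus4": 4}
distance = allelic_distance(isolate_1, isolate_2)
```

    Isolates from the same clone would be expected to show a distance at or near zero, while independent clones differ at more loci; any cut-off for calling a clone would have to come from the typing scheme, not from this sketch.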

  13. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea

    PubMed Central

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate technology, genotyping data for 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies to generate a high resolution linkage map. The map comprised 6,698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the densest linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721
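
    The reported marker density can be checked with a line of arithmetic. The snippet below assumes the average spacing is approximated as total map length divided by the number of markers; the variable names are illustrative only.

```python
# Figures quoted in the abstract above.
map_length_cm = 1083.93  # total map length in centimorgans
n_markers = 6698         # markers placed on the map

# Mean spacing approximated as total length divided by marker count.
mean_spacing_cm = map_length_cm / n_markers
print(f"{mean_spacing_cm:.2f} cM")  # prints "0.16 cM"
```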

  14. Refinement of the high-resolution physical and genetic map of Rhodobacter capsulatus and genome surveys using blots of the cosmid encyclopedia.

    PubMed Central

    Fonstein, M; Koshy, E G; Nikolskaya, T; Mourachov, P; Haselkorn, R

    1995-01-01

    Cosmids from a library containing Rhodobacter capsulatus DNA fragments were previously ordered in two contigs: one corresponding to the chromosome and one to a 134 kb plasmid. This map contained 40 regions connected only by colony hybridization. To confirm the linkage and correct the map, the actual sizes of the overlaps were determined by blot-hybridization with Rhodobacter chromosomal DNA and by mapping of additional cosmids. Several revisions of the earlier map include single cosmid shifts and inversions. One additional gap in a cosmid contig was also found, raising the possibility that the chromosome is not a contiguous circle. About 2500 additional EcoRI, BamHI and HindIII restriction sites were added to the 560 EcoRV sites previously mapped onto the Rhodobacter chromosome, increasing the resolution of the physical map to the size of individual genes. Twenty-five new markers were located on the genetic map. The 48 markers now mapped represent nearly 300 genes and ORFs cloned from different species of Rhodobacter. The orientation of transcription of the four rrn operons was established using 16S rRNA- and 23S rRNA-specific probes and digestion with the rare-cutting enzyme, CeuI. Gel blots of 192 cosmids of the miniset of R. capsulatus digested with EcoRV were prepared. Such a hybridization template represents the whole genome cut into 560 DNA fragments varying in size from 0.4 to 25 kb. This template was used for high-resolution mapping of single genes, analysis of total genomic DNAs from related Rhodobacter strains and differentially expressed RNAs. PMID:7737133

  15. Genome-wide Comparative Analysis of Annexin Superfamily in Plants

    PubMed Central

    Jami, Sravan Kumar; Clark, Greg B.; Ayele, Belay T.; Ashe, Paula; Kirti, Pulugurtha Bharadwaja

    2012-01-01

    Most annexins are calcium-dependent, phospholipid-binding proteins with suggested functions in response to environmental stresses and signaling during plant growth and development. They have previously been identified and characterized in Arabidopsis and rice, and constitute a multigene family in plants. In this study, we performed a comparative analysis of annexin gene families in the sequenced genomes of Viridiplantae ranging from unicellular green algae to multicellular plants, and identified 149 genes. Phylogenetic studies of these deduced annexins classified them into nine different arbitrary groups. The occurrence and distribution of bona fide type II calcium binding sites within the four annexin domains were found to be different in each of these groups. Analysis of chromosomal distribution of annexin genes in rice, Arabidopsis and poplar revealed their localization on various chromosomes with some members also found on duplicated chromosomal segments leading to gene family expansion. Analysis of gene structure suggests sequential or differential loss of introns during the evolution of land plant annexin genes. Intron positions and phases are well conserved in annexin genes from representative genomes ranging from Physcomitrella to higher plants. The occurrence of alternative motifs such as K/R/HGD was found to be overlapping or at the mutated regions of the type II calcium binding sites indicating potential functional divergence in certain plant annexins. This study provides a basis for further functional analysis and characterization of annexin multigene families in the plant lineage. PMID:23133603

  16. Genetic analysis of biological pathway data through genomic randomization

    PubMed Central

    Yaspan, Brian L.; Bush, William S.; Torstenson, Eric S.; Ma, Deqiong; Pericak-Vance, Margaret A.; Ritchie, Marylyn D.; Sutcliffe, James S.; Haines, Jonathan L.

    2011-01-01

    Genome Wide Association Studies (GWAS) are a standard approach for large-scale common variation characterization and for identification of single loci predisposing to disease. However, due to issues of moderate sample sizes and particularly multiple testing correction, many variants of smaller effect size are not detected within a single allele analysis framework. Thus, small main effects and potential epistatic effects are not consistently observed in GWAS using standard analytical approaches that consider only single SNP alleles. Here we propose unique methodology that aggregates variants of interest (for example, genes in a biological pathway) using GWAS results. Multiple testing and type I error concerns are minimized using empirical genomic randomization to estimate significance. Randomization corrects for common pathway-based analysis biases such as SNP coverage and density, linkage disequilibrium, gene size and pathway size. PARIS (Pathway Analysis by Randomization Incorporating Structure) applies this randomization and in doing so directly accounts for linkage disequilibrium effects. PARIS is independent of association analysis method and is thus applicable to GWAS datasets of all study designs. Using the KEGG database as an example, we apply PARIS to the publicly available Autism Genetic Resource Exchange (AGRE) GWA dataset, revealing pathways with a significant enrichment of positive association results. PMID:21279722
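
    The randomization idea in this record can be sketched as an empirical p-value: the number of significant features in a pathway is compared with the counts seen in random feature sets matched on some structural property. The code below is a simplified illustration under stated assumptions, not the PARIS implementation; the function name, the bin labels and the use of sampling with replacement are all hypothetical choices.

```python
import random


def empirical_p(pathway, significant, bins, n_perm=1000, seed=0):
    """Empirical p-value for enrichment of significant features in a pathway.

    pathway:     list of feature ids belonging to the pathway
    significant: set of feature ids with a positive association result
    bins:        dict mapping every feature id in the genome to a bin label
                 (for example a size class), used to match random draws
    """
    rng = random.Random(seed)
    # Group all genome features by bin so draws can be matched to the pathway.
    by_bin = {}
    for feature, label in bins.items():
        by_bin.setdefault(label, []).append(feature)

    observed = sum(f in significant for f in pathway)
    hits = 0
    for _ in range(n_perm):
        # For each pathway feature, draw a random feature from the same bin.
        drawn = [rng.choice(by_bin[bins[f]]) for f in pathway]
        if sum(f in significant for f in drawn) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy genome of 100 features in a single bin; the pathway holds the only
# three significant features, so random sets should rarely match it.
genome_bins = {f"f{i}": "bin0" for i in range(100)}
p_value = empirical_p(["f0", "f1", "f2"], {"f0", "f1", "f2"}, genome_bins, n_perm=200)
```

    Matching each draw to the bin of the feature it replaces is what lets a randomization scheme of this kind absorb biases such as feature size or SNP density.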

  17. Development and Validation of a Comparative Genomic Fingerprinting Method for High-Resolution Genotyping of Campylobacter jejuni

    PubMed Central

    Ross, Susan L.; Mutschall, Steven K.; MacKinnon, Joanne M.; Roberts, Michael J.; Buchanan, Cody J.; Kruczkiewicz, Peter; Jokinen, Cassandra C.; Thomas, James E.; Nash, John H. E.; Gannon, Victor P. J.; Marshall, Barbara; Pollari, Frank; Clark, Clifford G.

    2012-01-01

    Campylobacter spp. are a leading cause of bacterial gastroenteritis worldwide. The need for molecular subtyping methods with enhanced discrimination in the context of surveillance- and outbreak-based epidemiologic investigations of Campylobacter spp. is critical to our understanding of sources and routes of transmission and the development of mitigation strategies to reduce the incidence of campylobacteriosis. We describe the development and validation of a rapid and high-resolution comparative genomic fingerprinting (CGF) method for C. jejuni. A total of 412 isolates from agricultural, environmental, retail, and human clinical sources obtained from the Canadian national integrated enteric pathogen surveillance program (C-EnterNet) were analyzed using a 40-gene assay (CGF40) and multilocus sequence typing (MLST). The significantly higher Simpson's index of diversity (ID) obtained with CGF40 (ID = 0.994) suggests that it has a higher discriminatory power than MLST at both the level of clonal complex (ID = 0.873) and sequence type (ID = 0.935). High Wallace coefficients obtained when CGF40 was used as the primary typing method suggest that CGF and MLST are highly concordant, and we show that isolates with identical MLST profiles comprise isolates with distinct but highly similar CGF profiles. The high concordance with MLST coupled with the ability to discriminate between closely related isolates suggests that CGF40 is useful in differentiating highly prevalent sequence types, such as ST21 and ST45. CGF40 is a high-resolution comparative genomics-based method for C. jejuni subtyping with high discriminatory power that is also rapid, low cost, and easily deployable for routine epidemiologic surveillance and outbreak investigations. PMID:22170908
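
    The two statistics named above can be sketched in a few lines. The abstract does not say which formulation was used; this sketch assumes the Hunter and Gaston form of Simpson's index of diversity, ID = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), and the pairwise definition of the Wallace coefficient. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def simpsons_id(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form, assumed here)."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


def wallace(primary, secondary):
    """Wallace coefficient W(primary -> secondary).

    Of all isolate pairs sharing a type under the primary method, the
    fraction that also share a type under the secondary method.
    """
    same_primary = 0
    same_both = 0
    for i, j in combinations(range(len(primary)), 2):
        if primary[i] == primary[j]:
            same_primary += 1
            if secondary[i] == secondary[j]:
                same_both += 1
    if same_primary == 0:
        raise ValueError("no isolate pairs share a primary type")
    return same_both / same_primary


# Toy labels (hypothetical), one entry per isolate.
method_a = ["t1", "t1", "t2", "t2"]
method_b = ["s1", "s1", "s2", "s3"]
id_a = simpsons_id(method_a)
w_ab = wallace(method_a, method_b)
```

    For these toy labels, two of the six isolate pairs share a type under the first method, and one of those two pairs also shares a type under the second.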

  18. Transcription-coupled and global genome repair in the Saccharomyces cerevisiae RPB2 gene at nucleotide resolution.

    PubMed Central

    Tijsterman, M; Tasseron-de Jong, J G; van de Putte, P; Brouwer, J

    1996-01-01

    Repair of UV-induced cyclobutane pyrimidine dimers (CPDs) was examined at single nucleotide resolution in the yeast Saccharomyces cerevisiae, using an improved protocol for genomic end-labelling. To obtain the sensitivity required for adduct detection in yeast, an oligonucleotide-directed enrichment step was introduced into the current methodology developed for adduct detection in Escherichia coli. With this method, heterogeneous repair of CPDs within the RPB2 locus is observed. Individual CPDs positioned in the transcribed strand are removed very efficiently with identical kinetics. This fast repair starts within 23 bases downstream of the transcription initiation site. The non-transcribed strand of the active gene exhibits slow repair without detectable repair variations between individual lesions. In contrast, CPDs positioned in the promoter region show profound repair heterogeneity. Here, CPDs at specific sites are removed very quickly, with comparable rates to CPDs positioned in the transcribed strand, while at other positions lesions are not repaired at all during the period studied. Interestingly, the fast repair in the promoter region is dependent on the RAD7 and RAD16 genes, as are the slowly repaired CPDs in this region and in the non-transcribed strand. This indicates that the global genome repair pathway is not intrinsically slow and at specific positions can be as efficient as the transcription-coupled repair pathway. PMID:8836174

  19. Integrated genome-scale analysis of the transcriptional regulatory landscape in a blood stem/progenitor cell model.

    PubMed

    Wilson, Nicola K; Schoenfelder, Stefan; Hannah, Rebecca; Sánchez Castillo, Manuel; Schütte, Judith; Ladopoulos, Vasileios; Mitchelmore, Joanna; Goode, Debbie K; Calero-Nieto, Fernando J; Moignard, Victoria; Wilkinson, Adam C; Jimenez-Madrid, Isabel; Kinston, Sarah; Spivakov, Mikhail; Fraser, Peter; Göttgens, Berthold

    2016-03-31

    Comprehensive study of transcriptional control processes will be required to enhance our understanding of both normal and malignant hematopoiesis. Modern sequencing technologies have revolutionized our ability to generate genome-scale expression and histone modification profiles, transcription factor (TF)-binding maps, and also comprehensive chromatin-looping information. Many of these technologies, however, require large numbers of cells, and therefore cannot be applied to rare hematopoietic stem/progenitor cell (HSPC) populations. The stem cell factor-dependent multipotent progenitor cell line HPC-7 represents a well-recognized cell line model for HSPCs. Here we report genome-wide maps for 17 TFs, 3 histone modifications, DNase I hypersensitive sites, and high-resolution promoter-enhancer interactomes in HPC-7 cells. Integrated analysis of these complementary data sets revealed TF occupancy patterns of genomic regions involved in promoter-anchored loops. Moreover, preferential associations between pairs of TFs bound at either ends of chromatin loops led to the identification of 4 previously unrecognized protein-protein interactions between key blood stem cell regulators. All HPC-7 data sets are freely available both through standard repositories and a user-friendly Web interface. Together with previously generated genome-wide data sets, this study integrates HPC-7 data into a genomic resource on par with ENCODE tier 1 cell lines and, importantly, is the only current model with comprehensive genome-scale data that is relevant to HSPC biology. PMID:26809507

  20. Structural analysis of hepatitis C RNA genome using DNA microarrays

    PubMed Central

    Martell, María; Briones, Carlos; de Vicente, Aránzazu; Piron, María; Esteban, Juan I.; Esteban, Rafael; Guardia, Jaime; Gómez, Jordi

    2004-01-01

    Many studies have tried to identify specific nucleotide sequences in the quasispecies of hepatitis C virus (HCV) that determine resistance or sensitivity to interferon (IFN) therapy, unfortunately without conclusive results. Although viral proteins represent the most evident phenotype of the virus, genomic RNA sequences determine secondary and tertiary structures which are also part of the viral phenotype and can be involved in important biological roles. In this work, a method of RNA structure analysis has been developed based on the hybridization of labelled HCV transcripts to microarrays of complementary DNA oligonucleotides. Hybridizations were carried out at non-denaturing conditions, using appropriate temperature and buffer composition to allow binding to the immobilized probes of the RNA transcript without disturbing its secondary/tertiary structural motifs. Oligonucleotides printed onto the microarray covered the entire 5′ non-coding region (5′NCR), the first three-quarters of the core region, the E2–NS2 junction and the first 400 nt of the NS3 region. We document the use of this methodology to analyse the structural degree of a large region of HCV genomic RNA in two genotypes associated with different responses to IFN treatment. The results reported here show different structural degree along the genome regions analysed, and differential hybridization patterns for distinct genotypes in NS2 and NS3 HCV regions. PMID:15247323

  1. Population genomic analysis of outcrossing and recombination in yeast.

    PubMed

    Ruderfer, Douglas M; Pratt, Stephen C; Seidel, Hannah S; Kruglyak, Leonid

    2006-09-01

    The budding yeast Saccharomyces cerevisiae has been used by humans for millennia to make wine, beer and bread. More recently, it became a key model organism for studies of eukaryotic biology and for genomic analysis. However, relatively little is known about the natural lifestyle and population genetics of yeast. One major question is whether genetically diverse yeast strains mate and recombine in the wild. We developed a method to infer the evolutionary history of a species from genome sequences of multiple individuals and applied it to whole-genome sequence data from three strains of Saccharomyces cerevisiae and the sister species Saccharomyces paradoxus. We observed a pattern of sequence variation among yeast strains in which ancestral recombination events lead to a mosaic of segments with shared genealogy. Based on sequence divergence and the inferred median size of shared segments (approximately 2,000 bp), we estimated that although any two strains have undergone approximately 16 million cell divisions since their last common ancestor, only 314 outcrossing events have occurred during this time (roughly one every 50,000 divisions). Local correlations in polymorphism rates indicate that linkage disequilibrium in yeast should extend over kilobases. Our results provide the initial foundation for population studies of association between genotype and phenotype in S. cerevisiae. PMID:16892060

  2. Comparative genomic analysis of ten Streptococcus pneumoniae temperate bacteriophages.

    PubMed

    Romero, Patricia; Croucher, Nicholas J; Hiller, N Luisa; Hu, Fen Z; Ehrlich, Garth D; Bentley, Stephen D; García, Ernesto; Mitchell, Tim J

    2009-08-01

    Streptococcus pneumoniae is an important human pathogen that often carries temperate bacteriophages. As part of a program to characterize the genetic makeup of prophages associated with clinical strains and to assess the potential roles that they play in the biology and pathogenesis in their host, we performed comparative genomic analysis of 10 temperate pneumococcal phages. All of the genomes are organized into five major gene clusters: lysogeny, replication, packaging, morphogenesis, and lysis clusters. All of the phage particles observed showed a Siphoviridae morphology. The only genes that are well conserved in all the genomes studied are those involved in the integration and the lysis of the host in addition to two genes, of unknown function, within the replication module. We observed that a high percentage of the open reading frames contained no similarities to any sequences catalogued in public databases; however, genes that were homologous to known phage virulence genes, including the pblB gene of Streptococcus mitis and the vapE gene of Dichelobacter nodosus, were also identified. Interestingly, bioinformatic tools showed the presence of a toxin-antitoxin system in the phage phiSpn_6, and this represents the first time that an addiction system in a pneumophage has been identified. Collectively, the temperate pneumophages contain a diverse set of genes with various levels of similarity among them. PMID:19502408

  3. Applying Logic Analysis to Genomic Data and Phylogenetic Profiles

    NASA Astrophysics Data System (ADS)

    Yeates, Todd

    2005-03-01

    One of the main goals of comparative genomics is to understand how all the various proteins in a cell relate to each other in terms of pathways and interaction networks. Various computational ideas have been explored with this goal in mind. In the original phylogenetic profile method, `functional linkages' were inferred between pairs of proteins when the two proteins, A and B, showed identical (or statistically similar) patterns of presence vs. absence across a set of completely sequenced genomes. Here we describe a new generalization, logic analysis of phylogenetic profiles (LAPP), from which higher order relationships can be identified between three (or more) different proteins. For instance, in one type of triplet logic relation -- of which there are eight distinct types -- a protein C may be present in a genome iff proteins A and B are both present (C=AB). An application of the LAPP method identifies thousands of previously unidentified relationships between protein triplets. These higher order logic relationships offer insights -- not available from pairwise approaches -- into branching, competition, and alternate routes through cellular pathways and networks. The results also make it possible to assign tentative cellular functions to many novel proteins of unknown function. Co-authors: Peter Bowers, Shawn Cokus, Morgan Beeby, and David Eisenberg
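
    The triplet relation C=AB described above can be illustrated with a minimal sketch that tests it directly on presence/absence profiles. This is a simplified illustration, not the LAPP method itself: how LAPP scores relations statistically is not given in the abstract, so the sketch only reports exact agreement and the fraction of agreeing genomes. The function names and the toy profiles are hypothetical.

```python
def satisfies_and(a, b, c):
    """True if C is present in exactly those genomes where A and B both are."""
    return all(ci == (ai and bi) for ai, bi, ci in zip(a, b, c))


def and_agreement(a, b, c):
    """Fraction of genomes in which the profile of C matches (A and B)."""
    matches = sum(ci == (ai and bi) for ai, bi, ci in zip(a, b, c))
    return matches / len(c)


# Toy phylogenetic profiles (hypothetical): one boolean per genome.
prof_a = [True, True, False, False, True]
prof_b = [True, False, True, False, True]
prof_c = [True, False, False, False, True]   # equals A and B in every genome
prof_d = [True, True, False, False, True]    # differs from A and B in genome 2

exact = satisfies_and(prof_a, prof_b, prof_c)
partial = and_agreement(prof_a, prof_b, prof_d)
```

    Neither A nor B alone predicts C in these profiles, which is the kind of relationship a pairwise comparison would miss.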

  4. Genome analysis of enterovirus 71 strains differing in mouse pathogenicity.

    PubMed

    Li, Peng; Yue, Yingying; Song, Nannan; Li, Bingqing; Meng, Hong; Yang, Guiwen; Li, Zhihui; An, Liguo; Qin, Lizeng

    2016-04-01

    Enterovirus 71 (EV71) is a major causative agent of hand, foot, and mouth disease (HFMD) and is occasionally associated with severe neurological diseases. The investigation of virulence determinants of EV71 is rudimentary. Therefore, it is important to understand the relationship between EV71 virulence and genomic information. In this study, a series of analyses of full-length genomic sequences was performed on six EV71 strains isolated from HFMD patients with either severe or mild clinical symptoms. A one-day-old BALB/c mouse model was used to study the infection characteristics. Results showed all six strains were of the subgenogroup C4a. Full-length genomic sequence analysis revealed a total of 40 nucleotide differences between strains of high and low virulence. Among all mutations, three nucleotide mutations were found in the untranslated region. One mutation, nt115, in the internal ribosome entry site (IRES) caused a change in RNA secondary structure. The other 37 mutations were all located in the open reading frame, resulting in 8 amino acid mutations. Importantly, we discovered that an amino acid mutation (Asn1617 → Asp1617) in the 3C proteinase (3C(pro)) between highly and low pathogenic strains could lead to a conformational change at the active center, suggesting that this site may be a virulence determinant of EV71. PMID:26781949

  5. A Chromosome 7 Pericentric Inversion Defined at Single-Nucleotide Resolution Using Diagnostic Whole Genome Sequencing in a Patient with Hand-Foot-Genital Syndrome.

    PubMed

    Watson, Christopher M; Crinnion, Laura A; Harrison, Sally M; Lascelles, Carolina; Antanaviciute, Agne; Carr, Ian M; Bonthron, David T; Sheridan, Eamonn

    2016-01-01

    Next generation sequencing methodologies are facilitating the rapid characterisation of novel structural variants at nucleotide resolution. These approaches are particularly applicable to variants initially identified using alternative molecular methods. We report a child born with bilateral postaxial syndactyly of the feet and bilateral fifth finger clinodactyly. This was presumed to be an autosomal recessive syndrome, due to the family history of consanguinity. Karyotype analysis revealed a homozygous pericentric inversion of chromosome 7 (46,XX,inv(7)(p15q21)x2) which was confirmed to be heterozygous in both unaffected parents. Since the resolution of the karyotype was insufficient to identify any putatively causative gene, we undertook medium-coverage whole genome sequencing using paired-end reads, in order to elucidate the molecular breakpoints. In a two-step analysis, we first narrowed down the region by identifying discordant read-pairs, and then determined the precise molecular breakpoint by analysing the mapping locations of "soft-clipped" breakpoint-spanning reads. PCR and Sanger sequencing confirmed the identified breakpoints, both of which were located in intergenic regions. Significantly, the 7p15 breakpoint was located 523 kb upstream of HOXA13, the locus for hand-foot-genital syndrome. By inference from studies of HOXA locus control in the mouse, we suggest that the inversion has delocalised a HOXA13 enhancer to produce the phenotype observed in our patient. This study demonstrates how a modern genetic diagnostic approach can characterise structural variants at nucleotide resolution and provide potential insights into functional regulation. PMID:27272187
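
    The second step of the analysis, locating a breakpoint from soft-clipped reads, can be sketched as follows. This is an illustration only and not the authors' pipeline: it assumes reads are described by the SAM fields POS (1-based leftmost aligned position) and CIGAR, and the function name and the `min_clip` threshold are hypothetical.

```python
import re

# One CIGAR element: a length followed by an operation code (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_breakpoints(pos, cigar, min_clip=10):
    """Candidate breakpoint coordinates implied by a soft-clipped read.

    Returns (side, coordinate) tuples: for a left clip, the first aligned
    reference base; for a right clip, the last aligned reference base.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Hard clips carry no sequence; ignore them when looking at read ends.
    core = [(n, op) for n, op in ops if op != "H"]
    # Operations that consume reference positions.
    ref_len = sum(n for n, op in core if op in "MDN=X")
    found = []
    if core and core[0][1] == "S" and core[0][0] >= min_clip:
        found.append(("left", pos))
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_clip:
        found.append(("right", pos + ref_len - 1))
    return found


# Toy reads (hypothetical coordinates).
left = soft_clip_breakpoints(1000, "30S70M")
right = soft_clip_breakpoints(1000, "70M30S")
both = soft_clip_breakpoints(1000, "20S50M2D30M25S")
```

    In practice, many reads clipped at the same coordinate would be required before calling a breakpoint; that aggregation step is omitted here.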

  6. A Chromosome 7 Pericentric Inversion Defined at Single-Nucleotide Resolution Using Diagnostic Whole Genome Sequencing in a Patient with Hand-Foot-Genital Syndrome

    PubMed Central

    Crinnion, Laura A.; Harrison, Sally M.; Lascelles, Carolina; Antanaviciute, Agne; Carr, Ian M.; Bonthron, David T.; Sheridan, Eamonn

    2016-01-01

    Next generation sequencing methodologies are facilitating the rapid characterisation of novel structural variants at nucleotide resolution. These approaches are particularly applicable to variants initially identified using alternative molecular methods. We report a child born with bilateral postaxial syndactyly of the feet and bilateral fifth finger clinodactyly. This was presumed to be an autosomal recessive syndrome, due to the family history of consanguinity. Karyotype analysis revealed a homozygous pericentric inversion of chromosome 7 (46,XX,inv(7)(p15q21)x2) which was confirmed to be heterozygous in both unaffected parents. Since the resolution of the karyotype was insufficient to identify any putatively causative gene, we undertook medium-coverage whole genome sequencing using paired-end reads, in order to elucidate the molecular breakpoints. In a two-step analysis, we first narrowed down the region by identifying discordant read-pairs, and then determined the precise molecular breakpoint by analysing the mapping locations of “soft-clipped” breakpoint-spanning reads. PCR and Sanger sequencing confirmed the identified breakpoints, both of which were located in intergenic regions. Significantly, the 7p15 breakpoint was located 523 kb upstream of HOXA13, the locus for hand-foot-genital syndrome. By inference from studies of HOXA locus control in the mouse, we suggest that the inversion has delocalised a HOXA13 enhancer to produce the phenotype observed in our patient. This study demonstrates how a modern genetic diagnostic approach can characterise structural variants at nucleotide resolution and provide potential insights into functional regulation. PMID:27272187

  7. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources.

    PubMed

    Klima, Cassidy L; Cook, Shaun R; Zaheer, Rahat; Laing, Chad; Gannon, Vick P; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W; McAllister, Tim A

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2-8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. 
    The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  8. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources

    PubMed Central

    Klima, Cassidy L.; Cook, Shaun R.; Zaheer, Rahat; Laing, Chad; Gannon, Vick P.; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W.; McAllister, Tim A.

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2–8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. 
    The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  9. Radiation induced genome instability: multiscale modelling and data analysis

    NASA Astrophysics Data System (ADS)

    Andreev, Sergey; Eidelman, Yuri

    2012-07-01

    Genome instability (GI) is thought to be an important step in cancer induction and progression. Radiation induced GI is usually defined as genome alterations in the progeny of irradiated cells. The aim of this report is to demonstrate an opportunity for integrative analysis of radiation induced GI on the basis of multiscale modelling. Integrative, systems level modelling is necessary to assess different pathways resulting in GI in which a variety of genetic and epigenetic processes are involved. The multilevel modelling includes the Monte Carlo based simulation of several key processes involved in GI: DNA double strand breaks (DSBs) generation in cells initially irradiated as well as in descendants of irradiated cells, damage transmission through mitosis. Taking the cell-cycle-dependent generation of DNA/chromosome breakage into account ensures an advantage in estimating the contribution of different DNA damage response pathways to GI, as to nonhomologous vs homologous recombination repair mechanisms, the role of DSBs at telomeres or interstitial chromosomal sites, etc. The preliminary estimates show that both telomeric and non-telomeric DSB interactions are involved in delayed effects of radiation although differentially for different cell types. The computational experiments provide the data on the wide spectrum of GI endpoints (dicentrics, micronuclei, nonclonal translocations, chromatid exchanges, chromosome fragments) similar to those obtained experimentally for various cell lines under various experimental conditions. The modelling based analysis of experimental data demonstrates that radiation induced GI may be viewed as processes of delayed DSB induction/interaction/transmission being a key for quantification of GI. On the other hand, this conclusion is not sufficient to understand GI as a whole because factors of DNA non-damaging origin can also induce GI. 
    Additionally, new data on induced pluripotent stem cells reveal that GI is acquired in normal mature

  10. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis

    PubMed Central

    Bengelsdorf, Frank R.; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood–Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) of C. ljungdahlii and C. autoethanogenum (99.3%) and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of thlA promoter (PthlA) from C. acetobutylicum or native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2

  11. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis.

    PubMed

    Bengelsdorf, Frank R; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood-Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) of C. ljungdahlii and C. autoethanogenum (99.3%) and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of thlA promoter (PthlA) from C. acetobutylicum or native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2

  12. Paired-end genomic signature tags: a method for the functional analysis of genomes and epigenomes.

    PubMed

    Dunn, John J; McCorkle, Sean R; Everett, Logan; Anderson, Carl W

    2007-01-01

    Because paired-end genomic signature tags are sequence-based, they have the potential to become an alternate tool to tiled microarray hybridization as a method for genome-wide localization of transcription factors and other sequence-specific DNA binding proteins. As outlined here the method also can be used for global analysis of DNA methylation. One advantage of this approach is the ability to easily switch between different genome types without having to fabricate a new microarray for each and every DNA type. However, the method does have some disadvantages. Among the most rate-limiting steps of our PE-GST protocol are the need to concatemerize the diTAGs, size fractionate them and then clone them prior to sequencing. This is usually followed by additional steps to amplify and size select for long (≥500) concatemer inserts prior to sequencing. These time-consuming steps are important for standard DNA sequencing as they increase efficiency approximately 20-30-fold since each amplified concatemer can now provide information on multiple tags; the limitation on data acquisition is read length during sequencing. However, the development of new sequencing methods such as Life Sciences' 454 new nanotechnology-based sequencing instrument (41) could increase tag sequencing efficiency by several orders of magnitude (≥100,000 diTAG reads/run), which is sufficient to provide in-depth global analysis of all ChIP PE-GSTs in a single run. This is because the lengths of our paired-end diTAGs (approximately 60 bp) fall well within the region of high accuracy for read lengths on this instrument. In principle, sequence analysis of diTAGs could begin as soon as they are generated, thereby completely bypassing the need for the concatemerization, sizing, downstream cloning steps and sequencing template purification.
    In addition, our protocol places any one of several unique four-base long nucleotide sequences, such as GATC, between each and every diTAG pair, which could

  13. The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions

    SciTech Connect

    Markowitz, Victor M.; Szeto, Ernest; Palaniappan, Krishna; Grechkin, Yuri; Chu, Ken; Chen, I-Min A.; Dubchak, Inna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2007-08-01

    The Integrated Microbial Genomes (IMG) system is a data management, analysis and annotation platform for all publicly available genomes. IMG contains both draft and complete JGI microbial genomes integrated with all other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and annotating genomes, genes and functions, individually or in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through quarterly releases. IMG is provided by the DOE-Joint Genome Institute (JGI) and is available from http://img.jgi.doe.gov.

  14. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  15. Orchestrating high-throughput genomic analysis with Bioconductor.

    PubMed

    Huber, Wolfgang; Carey, Vincent J; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D; Irizarry, Rafael A; Lawrence, Michael; Love, Michael I; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

    2015-02-01

    Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors. PMID:25633503

  16. Orchestrating high-throughput genomic analysis with Bioconductor

    PubMed Central

    Huber, Wolfgang; Carey, Vincent J.; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S.; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D.; Irizarry, Rafael A.; Lawrence, Michael; Love, Michael I.; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K.; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K.; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

    2015-01-01

    Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors. PMID:25633503

  17. High resolution melting analysis for the differentiation of Mycobacterium species.

    PubMed

    Issa, Rahizan; Abdul, Hatijah; Hashim, Siti Hasmah; Seradja, Valentinus H; Shaili, Nurul 'Aishah; Hassan, Nurul Akma Mohd

    2014-10-01

    A quantitative real-time PCR (qPCR) assay followed by high resolution melting (HRM) analysis was developed for the differentiation of Mycobacterium species. Rapid differentiation of Mycobacterium species is necessary for the effective diagnosis and management of tuberculosis. In this study, the 16S rRNA gene was tested as the target since it has been identified as a suitable target for the identification of mycobacterial species. During the temperature gradient and primer optimization process, the melting peak (Tm) analysis was performed at a concentration of 50 ng DNA template and 0.3, 0.4 and 0.5 µM primer. The qPCR assay for the detection of other mycobacterial species was done at a Tm of 62 °C and a primer concentration of 0.4 µM. The HRM analysis generated cluster patterns that were specific and sensitive enough to distinguish small sequence differences among the Mycobacterium species. This study suggests that 16S rRNA-based real-time PCR followed by HRM analysis produces unique cluster patterns for species of Mycobacterium and can differentiate closely related mycobacterial species. PMID:25038139

  18. Genomic analysis of primordial dwarfism reveals novel disease genes.

    PubMed

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis. PMID:24389050

  19. Comparative analysis of genomic signal processing for microarray data clustering.

    PubMed

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods. PMID:22157075

  20. Analysis of the impact of spatial resolution on land/water classifications using high-resolution aerial imagery

    USGS Publications Warehouse

    Enwright, Nicholas M.; Jones, William R.; Garber, Adrienne L.; Keller, Matthew J.

    2014-01-01

    Long-term monitoring efforts often use remote sensing to track trends in habitat or landscape conditions over time. To most appropriately compare observations over time, long-term monitoring efforts strive for consistency in methods. Thus, advances and changes in technology over time can present a challenge. For instance, modern camera technology has led to an increasing availability of very high-resolution imagery (i.e. submetre and metre) and a shift from analogue to digital photography. While numerous studies have shown that image resolution can impact the accuracy of classifications, most of these studies have focused on the impacts of comparing spatial resolution changes greater than 2 m. Thus, a knowledge gap exists on the impacts of minor changes in spatial resolution (i.e. submetre to about 1.5 m) in very high-resolution aerial imagery (i.e. 2 m resolution or less). This study compared the impact of spatial resolution on land/water classifications of an area dominated by coastal marsh vegetation in Louisiana, USA, using 1:12,000 scale colour-infrared analogue aerial photography (AAP) scanned at four different dot-per-inch resolutions simulating ground sample distances (GSDs) of 0.33, 0.54, 1, and 2 m. Analysis of the impact of spatial resolution on land/water classifications was conducted by exploring various spatial aspects of the classifications including density of waterbodies and frequency distributions in waterbody sizes. This study found that a small-magnitude change (1–1.5 m) in spatial resolution had little to no impact on the amount of water classified (i.e. percentage mapped was less than 1.5%), but had a significant impact on the mapping of very small waterbodies (i.e. waterbodies ≤ 250 m²). These findings should interest those using temporal image classifications derived from very high-resolution aerial photography as a component of long-term monitoring programs.

  1. Microbial Genome Analysis and Comparisons: Web-based Protocols and Resources

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Fully annotated genome sequences of many microorganisms are publicly available as a resource. However, in-depth analysis of these genomes using specialized tools is required to derive meaningful information. We describe here the utility of three powerful publicly available genome databases and ana...

  2. IMG 4 version of the integrated microbial genomes comparative analysis system.

    PubMed

    Markowitz, Victor M; Chen, I-Min A; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2014-01-01

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu). PMID:24165883

  3. IMG 4 version of the integrated microbial genomes comparative analysis system

    SciTech Connect

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2013-10-27

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

  4. Analysis of Automated Aircraft Conflict Resolution and Weather Avoidance

    NASA Technical Reports Server (NTRS)

    Love, John F.; Chan, William N.; Lee, Chu Han

    2009-01-01

    This paper describes an analysis of using trajectory-based automation to resolve both aircraft and weather constraints for near-term air traffic management decision making. The auto resolution algorithm developed and tested at NASA-Ames to resolve aircraft to aircraft conflicts has been modified to mitigate convective weather constraints. Modifications include adding information about the size of a gap between weather constraints to the routing solution. Routes that traverse gaps that are smaller than a specific size are not used. An evaluation of the performance of the modified autoresolver to resolve both conflicts with aircraft and weather was performed. Integration with the Center-TRACON Traffic Management System was completed to evaluate the effect of weather routing on schedule delays.

  5. Intact MicroRNA Analysis Using High Resolution Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Kullolli, Majlinda; Knouf, Emily; Arampatzidou, Maria; Tewari, Muneesh; Pitteri, Sharon J.

    2014-01-01

    MicroRNAs (miRNAs) are small single-stranded non-coding RNAs that post-transcriptionally regulate gene expression, and play key roles in the regulation of a variety of cellular processes and in disease. New tools to analyze miRNAs will add understanding of the physiological origins and biological functions of this class of molecules. In this study, we investigate the utility of high resolution mass spectrometry for the analysis of miRNAs through proof-of-concept experiments. We demonstrate the ability of mass spectrometry to resolve and separate miRNAs and corresponding 3' variants in mixtures. The mass accuracy of the monoisotopic deprotonated peaks from various miRNAs is in the low ppm range. We compare fragmentation of miRNA by collision-induced dissociation (CID) and by higher-energy collisional dissociation (HCD) which yields similar sequence coverage from both methods but additional fragmentation by HCD versus CID. We measure the linear dynamic range, limit of detection, and limit of quantitation of miRNA loaded onto a C18 column. Lastly, we explore the use of data-dependent acquisition of MS/MS spectra of miRNA during online LC-MS and demonstrate that multiple charge states can be fragmented, yielding nearly full sequence coverage of miRNA on a chromatographic time scale. We conclude that high resolution mass spectrometry allows the separation and measurement of miRNAs in mixtures and a standard LC-MS setup can be adapted for online analysis of these molecules.

  6. High Resolution Methylome Analysis Reveals Widespread Functional Hypomethylation during Adult Human Erythropoiesis*

    PubMed Central

    Yu, Yiting; Mo, Yongkai; Ebenezer, David; Bhattacharyya, Sanchari; Liu, Hui; Sundaravel, Sriram; Giricz, Orsolya; Wontakal, Sandeep; Cartier, Jessy; Caces, Bennett; Artz, Andrew; Nischal, Sangeeta; Bhagat, Tushar; Bathon, Kathleen; Maqbool, Shahina; Gligich, Oleg; Suzuki, Masako; Steidl, Ulrich; Godley, Lucy; Skoultchi, Art; Greally, John; Wickrema, Amittha; Verma, Amit

    2013-01-01

    Differentiation of hematopoietic stem cells to red cells requires coordinated expression of numerous erythroid genes and is characterized by nuclear condensation and extrusion during terminal development. To understand the regulatory mechanisms governing these widespread phenotypic changes, we conducted a high resolution methylomic and transcriptomic analysis of six major stages of human erythroid differentiation. We observed widespread epigenetic differences between early and late stages of erythropoiesis with progressive loss of methylation being the dominant change during differentiation. Gene bodies, intergenic regions, and CpG shores were preferentially demethylated during erythropoiesis. Epigenetic changes at transcription factor binding sites correlated significantly with changes in gene expression and were enriched for binding motifs for SCL, MYB, GATA, and other factors not previously implicated in erythropoiesis. Demethylation at gene promoters was associated with increased expression of genes, whereas epigenetic changes at gene bodies correlated inversely with gene expression. Important gene networks encoding erythrocyte membrane proteins, surface receptors, and heme synthesis proteins were found to be regulated by DNA methylation. Furthermore, integrative analysis enabled us to identify novel, potential regulatory areas of the genome as evident by epigenetic changes in a predicted PU.1 binding site in intron 1 of the GATA1 gene. This intronic site was found to be conserved across species and was validated to be a novel PU.1 binding site by quantitative ChIP in erythroid cells. Altogether, our study provides a comprehensive analysis of methylomic and transcriptomic changes during erythroid differentiation and demonstrates that human terminal erythropoiesis is surprisingly associated with hypomethylation of the genome. PMID:23306203

  7. High resolution methylome analysis reveals widespread functional hypomethylation during adult human erythropoiesis.

    PubMed

    Yu, Yiting; Mo, Yongkai; Ebenezer, David; Bhattacharyya, Sanchari; Liu, Hui; Sundaravel, Sriram; Giricz, Orsolya; Wontakal, Sandeep; Cartier, Jessy; Caces, Bennett; Artz, Andrew; Nischal, Sangeeta; Bhagat, Tushar; Bathon, Kathleen; Maqbool, Shahina; Gligich, Oleg; Suzuki, Masako; Steidl, Ulrich; Godley, Lucy; Skoultchi, Art; Greally, John; Wickrema, Amittha; Verma, Amit

    2013-03-29

    Differentiation of hematopoietic stem cells to red cells requires coordinated expression of numerous erythroid genes and is characterized by nuclear condensation and extrusion during terminal development. To understand the regulatory mechanisms governing these widespread phenotypic changes, we conducted a high resolution methylomic and transcriptomic analysis of six major stages of human erythroid differentiation. We observed widespread epigenetic differences between early and late stages of erythropoiesis with progressive loss of methylation being the dominant change during differentiation. Gene bodies, intergenic regions, and CpG shores were preferentially demethylated during erythropoiesis. Epigenetic changes at transcription factor binding sites correlated significantly with changes in gene expression and were enriched for binding motifs for SCL, MYB, GATA, and other factors not previously implicated in erythropoiesis. Demethylation at gene promoters was associated with increased expression of genes, whereas epigenetic changes at gene bodies correlated inversely with gene expression. Important gene networks encoding erythrocyte membrane proteins, surface receptors, and heme synthesis proteins were found to be regulated by DNA methylation. Furthermore, integrative analysis enabled us to identify novel, potential regulatory areas of the genome as evident by epigenetic changes in a predicted PU.1 binding site in intron 1 of the GATA1 gene. This intronic site was found to be conserved across species and was validated to be a novel PU.1 binding site by quantitative ChIP in erythroid cells. Altogether, our study provides a comprehensive analysis of methylomic and transcriptomic changes during erythroid differentiation and demonstrates that human terminal erythropoiesis is surprisingly associated with hypomethylation of the genome. PMID:23306203

  8. High-resolution genetic map for understanding the effect of genome-wide recombination rate, selection sweep and linkage disequilibrium on nucleotide diversity in watermelon

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping by sequencing (GBS) technology was used to identify a set of 9,933 single nucleotide polymorphism (SNP) markers for constructing a high-resolution genetic map of 1,087 cM for watermelon. The genome-wide variation of recombination rate (GWRR) across the map was evaluated and a positive co...

  9. Implementation of High Resolution Whole Genome Array CGH in the Prenatal Clinical Setting: Advantages, Challenges, and Review of the Literature

    PubMed Central

    Evangelidou, Paola; Alexandrou, Angelos; Moutafi, Maria; Ioannides, Marios; Antoniou, Pavlos; Koumbaris, George; Kallikas, Ioannis; Velissariou, Voula; Sismani, Carolina; Patsalis, Philippos C.

    2013-01-01

    Array Comparative Genomic Hybridization (array CGH) analysis is replacing postnatal chromosomal analysis in cases of intellectual disabilities, and it has been postulated that it might also become the first-tier test in prenatal diagnosis. In this study, array CGH was applied to 64 prenatal samples using whole genome oligonucleotide arrays (BlueGnome, Ltd.) on DNA extracted from chorionic villi, amniotic fluid, foetal blood, and skin samples. Results were confirmed with Fluorescence In Situ Hybridization or Real-Time PCR. Fifty-three cases had normal karyotype and abnormal ultrasound findings, and seven samples had balanced rearrangements, five of which also had ultrasound findings. The value of array CGH in the characterization of previously known aberrations in five samples is also presented. Seventeen out of 64 samples carried copy number alterations, giving a detection rate of 26.5%. Ten of these represent benign variants or variants of unknown significance, giving the method a diagnostic capacity of 10.9%. If karyotyping is performed, the additional diagnostic capacity of the method is 5.1% (3/59). This study indicates the ability of array CGH to identify chromosomal abnormalities which cannot be detected during routine prenatal cytogenetic analysis, thereby increasing the overall detection rate. In addition, a thorough review of the literature is presented. PMID:23555083
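
    The percentages quoted in this abstract can be rechecked from the counts it gives. The following is a minimal sketch; the helper name `percent` is hypothetical, and the interpretation that the 10.9% figure corresponds to the 17 - 10 = 7 remaining samples is an assumption drawn from the abstract's wording rather than something it states outright.

```python
def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

total_samples = 64          # prenatal samples analysed
with_alterations = 17       # samples carrying copy number alterations
benign_or_vous = 10         # benign or of unknown significance
# Assumption: the remaining alterations are the diagnostically relevant ones.
clinically_relevant = with_alterations - benign_or_vous

detection_rate = percent(with_alterations, total_samples)        # 26.6
diagnostic_capacity = percent(clinically_relevant, total_samples)  # 10.9
additional_capacity = percent(3, 59)                             # 5.1

print(detection_rate, diagnostic_capacity, additional_capacity)
```

    Note that 17/64 rounds to 26.6%, whereas the abstract reports 26.5%, which suggests the published value was truncated rather than rounded.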

  10. Genome-wide analysis of DNA methylation in hepatoblastoma tissues

    PubMed Central

    Cui, Ximao; Liu, Baihui; Zheng, Shan; Dong, Kuiran; Dong, Rui

    2016-01-01

    DNA methylation has a crucial role in cancer biology. In the present study, a genome-wide analysis of DNA methylation in hepatoblastoma (HB) tissues was performed to verify differential methylation levels between HB and normal tissues. As alpha-fetoprotein (AFP) has a critical role in HB, AFP methylation levels were also detected using pyrosequencing. Normal and HB liver tissue samples (frozen tissue) were obtained from patients with HB. Genome-wide analysis of DNA methylation in these tissues was performed using an Infinium HumanMethylation450 BeadChip, and the results were confirmed with reverse transcription-quantitative polymerase chain reaction. The Infinium HumanMethylation450 BeadChip demonstrated distinctively less methylation in HB tissues than in non-tumor tissues. In addition, methylation enrichment was observed in positions near the transcription start site of AFP, which exhibited lower methylation levels in HB tissues than in non-tumor liver tissues. Lastly, a significant negative correlation was observed between AFP messenger RNA expression and DNA methylation percentage, using linear Pearson's R correlation coefficients. The present results demonstrate differential methylation levels between HB and normal tissues, and imply that aberrant methylation of AFP in HB could reflect HB development. Expansion of these findings could provide useful insight into HB biology. PMID:27446465

  11. Monochromosomal hybrids for the analysis of the human genome

    SciTech Connect

    Athwal, R.S.

    1992-01-01

    We have already produced monochromosomal hybrids for 2/3 of the human genome and we have generated sufficient biological materials to complete the proposed panels of hybrid cell lines. We have developed experimental procedures to identify marked chromosomes in human cell lines prior to their transfer to rodent cells. This would eliminate redundancy in the production of monochromosomal hybrids and therefore help expedite completion of the hybrid cell panels. We have also developed a highly sensitive method to identify human chromosomes in hybrid cells. Monochromosomal hybrids produced in our lab are used in a number of laboratories for experiments on gene mapping, gene isolation, chromosome fractionation and genetic analysis for complementation of cellular phenotypes such as DNA repair and regulation of cell growth. Monochromosomal hybrid cell lines are freely available to the scientific community for experiments on gene mapping and analysis of the human genome. We are preparing large quantities of DNA from each hybrid cell line, which will be available to the research community for various experiments.

  12. Advances in genome-wide DNA methylation analysis

    PubMed Central

    Gupta, Romi; Nagarajan, Arvindhan; Wajapeyee, Narendra

    2013-01-01

    The covalent DNA modification of cytosine at position 5 (5-methylcytosine; 5mC) has emerged as an important epigenetic mark most commonly present in the context of CpG dinucleotides in mammalian cells. In pluripotent stem cells and plants, it is also found in non-CpG and CpNpG contexts, respectively. 5mC has important implications in a diverse set of biological processes, including transcriptional regulation. Aberrant DNA methylation has been shown to be associated with a wide variety of human ailments and thus is the focus of active investigation. Methods used for detecting DNA methylation have revolutionized our understanding of this epigenetic mark and provided new insights into its role in diverse biological functions. Here we describe recent technological advances in genome-wide DNA methylation analysis and discuss their relative utility and drawbacks, providing specific examples from studies that have used these technologies for genome-wide DNA methylation analysis to address important biological questions. Finally, we discuss a newly identified covalent DNA modification, 5-hydroxymethylcytosine (5hmC), and speculate on its possible biological function, as well as describe a new methodology that can distinguish 5hmC from 5mC. PMID:20964631

  13. Flux Coupling Analysis of Genome-Scale Metabolic Network Reconstructions

    PubMed Central

    Burgard, Anthony P.; Nikolaev, Evgeni V.; Schilling, Christophe H.; Maranas, Costas D.

    2004-01-01

    In this paper, we introduce the Flux Coupling Finder (FCF) framework for elucidating the topological and flux connectivity features of genome-scale metabolic networks. The framework is demonstrated on genome-scale metabolic reconstructions of Helicobacter pylori, Escherichia coli, and Saccharomyces cerevisiae. The analysis allows one to determine whether any two metabolic fluxes, v1 and v2, are (1) directionally coupled, if a non-zero flux for v1 implies a non-zero flux for v2 but not necessarily the reverse; (2) partially coupled, if a non-zero flux for v1 implies a non-zero, though variable, flux for v2 and vice versa; or (3) fully coupled, if a non-zero flux for v1 implies not only a non-zero but also a fixed flux for v2 and vice versa. Flux coupling analysis also enables the global identification of blocked reactions, which are all reactions incapable of carrying flux under a certain condition; equivalent knockouts, defined as the set of all possible reactions whose deletion forces the flux through a particular reaction to zero; and sets of affected reactions denoting all reactions whose fluxes are forced to zero if a particular reaction is deleted. The FCF approach thus provides a novel and versatile tool for aiding metabolic reconstructions and guiding genetic manipulations. PMID:14718379
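
    The three coupling definitions above can be illustrated with a small sketch. It is not the FCF procedure itself (the abstract does not describe how FCF computes couplings); it simply applies the stated definitions to a finite list of feasible flux pairs (v1, v2). The function name `classify_coupling`, the tolerance `tol`, and the sample-based approach are all assumptions made for illustration, and a finite sample can only suggest, not prove, a coupling.

```python
from typing import Sequence, Tuple

def classify_coupling(samples: Sequence[Tuple[float, float]],
                      tol: float = 1e-9) -> str:
    """Classify the coupling of v1 and v2 over sampled feasible flux pairs."""
    # Does a non-zero v1 always come with a non-zero v2 (and vice versa)?
    v1_implies_v2 = all(abs(v2) > tol for v1, v2 in samples if abs(v1) > tol)
    v2_implies_v1 = all(abs(v1) > tol for v1, v2 in samples if abs(v2) > tol)

    if v1_implies_v2 and v2_implies_v1:
        ratios = [v1 / v2 for v1, v2 in samples if abs(v2) > tol]
        if not ratios:
            return "undetermined"  # both fluxes are zero in every sample
        # A fixed ratio means fully coupled; a variable one, partially coupled.
        if max(ratios) - min(ratios) <= tol:
            return "fully coupled"
        return "partially coupled"
    if v1_implies_v2:
        return "directionally coupled (v1 -> v2)"
    if v2_implies_v1:
        return "directionally coupled (v2 -> v1)"
    return "uncoupled"

print(classify_coupling([(1.0, 2.0), (2.0, 4.0), (0.0, 0.0)]))  # fully coupled
print(classify_coupling([(1.0, 2.0), (0.0, 3.0)]))  # directionally coupled (v1 -> v2)
```

    In the second call, v2 carries flux while v1 is zero, so only the implication from v1 to v2 holds, which matches the abstract's definition of directional coupling.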

  14. Genomic Analysis of Natural Variation for Seed and Plant Size in Maize (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    SciTech Connect

    Kaeppler, Shawn

    2012-03-21

    Shawn Kaeppler from the University of Wisconsin-Madison on "Genomic Analysis of Biofuel Traits in Maize and Switchgrass" at the 7th Annual Genomics of Energy & Environment Meeting on March 21, 2012 in Walnut Creek, Calif

  15. Genomic Analysis of Natural Variation for Seed and Plant Size in Maize (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Kaeppler, Shawn [University of Wisconsin, Madison]

    2013-01-15

    Shawn Kaeppler from the University of Wisconsin-Madison on "Genomic Analysis of Biofuel Traits in Maize and Switchgrass" at the 7th Annual Genomics of Energy & Environment Meeting on March 21, 2012 in Walnut Creek, Calif

  16. Whole-Genome Sequence Analysis and Genome-Wide Virulence Gene Identification of Riemerella anatipestifer Strain Yb2

    PubMed Central

    Wang, Xiaolan; Ding, Chan; Wang, Shaohui; Han, Xiangan

    2015-01-01

    Riemerella anatipestifer is a well-described pathogen of waterfowl and other avian species that can cause septicemic and exudative diseases. In this study, we sequenced the complete genome of R. anatipestifer strain Yb2 and analyzed it against the published genomic sequences of R. anatipestifer strains DSM15868, RA-GD, RA-CH-1, and RA-CH-2. The Yb2 genome contains one circular chromosome of 2,184,066 bp with a 35.73% GC content and no plasmid. The genome has 2,021 open reading frames that occupy 90.88% of the genome. A comparative genomic analysis revealed that genome organization is highly conserved among R. anatipestifer strains, except for four inversions of a sequence segment in Yb2. A phylogenetic analysis found that the closest neighbor of Yb2 is RA-GD. Furthermore, we constructed a library of 3,175 mutants by random transposon mutagenesis, and 100 mutants exhibiting more than 100-fold-attenuated virulence were obtained by animal screening experiments. Southern blot analysis and genetic characterization of the mutants led to the identification of 49 virulence genes. Of these, 25 encode cytoplasmic proteins, 6 encode cytoplasmic membrane proteins, 4 encode outer membrane proteins, and the subcellular localization of the remaining 14 gene products is unknown. The functional classification of orthologous-group clusters revealed that 16 genes are associated with metabolism, 6 are associated with cellular processing and signaling, and 4 are associated with information storage and processing. The functions of the other 23 genes are poorly characterized or unknown. This genome-wide study identified genes important to the virulence of R. anatipestifer. PMID:26002892
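
    The gene tallies reported in this abstract can be cross-checked against the stated total of 49 virulence genes. The sketch below uses only counts quoted in the abstract; the dictionary keys and the helper name `tallies_match` are hypothetical labels chosen for readability.

```python
virulence_genes = 49

# Counts by predicted subcellular localization, as quoted in the abstract.
by_localization = {
    "cytoplasmic": 25,
    "cytoplasmic membrane": 6,
    "outer membrane": 4,
    "unknown": 14,
}

# Counts by functional class, as quoted in the abstract.
by_function = {
    "metabolism": 16,
    "cellular processing and signaling": 6,
    "information storage and processing": 4,
    "poorly characterized or unknown": 23,
}

def tallies_match(total: int, breakdown: dict) -> bool:
    """Return True when the category counts add up to the stated total."""
    return sum(breakdown.values()) == total

print(tallies_match(virulence_genes, by_localization))  # True
print(tallies_match(virulence_genes, by_function))      # True

# Share of the 3,175-mutant library that showed strongly attenuated virulence.
attenuated_percent = round(100 * 100 / 3175, 2)
print(attenuated_percent)  # 3.15
```

    Both breakdowns sum to 49, so the two classifications in the abstract are internally consistent.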

  17. Comparative genomic analysis of novel Acinetobacter symbionts: A combined systems biology and genomics approach.

    PubMed

    Gupta, Vipin; Haider, Shazia; Sood, Utkarsh; Gilbert, Jack A; Ramjee, Meenakshi; Forbes, Ken; Singh, Yogendra; Lopes, Bruno S; Lal, Rup

    2016-01-01

    The increasing trend of antibiotic resistance in Acinetobacter drastically limits the range of therapeutic agents required to treat multidrug resistant (MDR) infections. This study focused on analysis of novel Acinetobacter strains using a genomics and systems biology approach. Here we used a network theory method for pathogenic and non-pathogenic Acinetobacter spp. to identify the key regulatory proteins (hubs) in each strain. We identified nine key regulatory proteins, guaA, guaB, rpsB, rpsI, rpsL, rpsE, rpsC, rplM and trmD, which have functional roles as hubs in a hierarchical scale-free fractal protein-protein interaction network. Two key hubs (guaA and guaB) were important for insect-associated strains, and comparative analysis identified guaA as more important than guaB due to its role in effective module regulation. rpsI played a significant role in all the novel strains, while rplM was unique to sheep-associated strains. rpsM, rpsB and rpsI were involved in the regulation of overall network topology across all Acinetobacter strains analyzed in this study. Future analysis will investigate whether these hubs are useful as drug targets for treating Acinetobacter infections. PMID:27378055
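
    The abstract does not say which criterion defines a hub, so the following is only a minimal sketch that assumes the simplest one: node degree in an undirected interaction network. The function name, the toy edge list, and the `top_k` parameter are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def degree_hubs(edges, top_k=3):
    """Return the top_k (node, degree) pairs of an undirected network.

    Assumption: a "hub" is simply a high-degree node. The study may have
    used a different or richer network measure.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Sort by descending degree, then by name so ties are deterministic.
    ranked = sorted(neighbors, key=lambda n: (-len(neighbors[n]), n))
    return [(n, len(neighbors[n])) for n in ranked[:top_k]]


# Hypothetical toy edges for illustration only; not data from the study.
toy_edges = [
    ("guaA", "guaB"),
    ("guaA", "rpsB"),
    ("guaA", "rpsI"),
    ("guaB", "rpsI"),
    ("rplM", "rpsI"),
]
print(degree_hubs(toy_edges, top_k=2))
```

    Ranking by degree is the most basic hub definition; centrality measures such as betweenness would slot into the same sort key.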

  18. Comparative genomic analysis of novel Acinetobacter symbionts: A combined systems biology and genomics approach

    PubMed Central

    Gupta, Vipin; Haider, Shazia; Sood, Utkarsh; Gilbert, Jack A.; Ramjee, Meenakshi; Forbes, Ken; Singh, Yogendra; Lopes, Bruno S.; Lal, Rup

    2016-01-01

    The increasing trend of antibiotic resistance in Acinetobacter drastically limits the range of therapeutic agents required to treat multidrug resistant (MDR) infections. This study focused on analysis of novel Acinetobacter strains using a genomics and systems biology approach. Here we used a network theory method for pathogenic and non-pathogenic Acinetobacter spp. to identify the key regulatory proteins (hubs) in each strain. We identified nine key regulatory proteins, guaA, guaB, rpsB, rpsI, rpsL, rpsE, rpsC, rplM and trmD, which have functional roles as hubs in a hierarchical scale-free fractal protein-protein interaction network. Two key hubs (guaA and guaB) were important for insect-associated strains, and comparative analysis identified guaA as more important than guaB due to its role in effective module regulation. rpsI played a significant role in all the novel strains, while rplM was unique to sheep-associated strains. rpsM, rpsB and rpsI were involved in the regulation of overall network topology across all Acinetobacter strains analyzed in this study. Future analysis will investigate whether these hubs are useful as drug targets for treating Acinetobacter infections. PMID:27378055

  19. High-resolution physical mapping of a 250-kb region of human chromosome 11q24 by genomic sequence sampling (GSS)

    SciTech Connect

    Selleri, L.; Smith, M.W.; Holmsen, A.L.

    1995-04-10

    A physical map of the region of human chromosome 11q24 containing the FLI1 gene, disrupted by the t(11;22) translocation in Ewing sarcoma and primitive neuroectodermal tumors, was analyzed by genomic sequence sampling. Using a 4- to 5-fold coverage chromosome 11-specific library, 22 region-specific cosmid clones were identified by phenol emulsion reassociation hybridization, with a 245-kb yeast artificial chromosome clone containing the FLI1 gene, and by directed "walking" techniques. Cosmid contigs were constructed by individual clone fingerprinting using restriction enzyme digestion and assembly with the Genome Reconstruction and AsseMbly (GRAM) computer algorithm. The relative orientation and spacing of cosmid contigs with respect to the chromosome were determined by the structural analysis of cosmid clones and by direct visual in situ hybridization mapping. Each cosmid clone in the contig was subjected to "one-pass" end sequencing, and the resulting ordered sequence fragments represent approximately 5% of the complete DNA sequence, making the entire region accessible by PCR amplification. The sequence samples were analyzed for putative exons, repetitive DNAs, and simple sequence repeats using a variety of computer algorithms. Based upon the computer predictions, Southern and Northern blot experiments led to the independent identification and localization of the FLI1 gene as well as a previously unknown gene located in this region of chromosome 11q24. This approach to high-resolution physical analysis of human chromosomes allows the assembly of detailed sequence-based maps. 62 refs., 7 figs.

  20. Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity

    PubMed Central

    Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

    2016-01-01

    Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. The A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than to Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira species, some involving genes previously shown to be laterally transferable among different species, such as genes encoding restriction-modification systems. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of order Oscillatoriales, suggesting that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provide in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as a valuable resource for functional genomics studies. PMID:27330141

  1. Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity.

    PubMed

    Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

    2016-08-01

    Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. The A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than to Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira species, some involving genes previously shown to be laterally transferable among different species, such as genes encoding restriction-modification systems. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of order Oscillatoriales, suggesting that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provide in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as a valuable resource for functional genomics studies. PMID:27330141

  2. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignments. The framework integrates the information derived from a multiple sequence alignment and a phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation-related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
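
    The idea of an alignment as an abstract data type with a "constrained column" query can be illustrated in a few lines. This is a sketch in Python rather than the Prolog toolkit the record describes, and it omits the phylogenetic-tree step entirely. The class, the function, and the `HYDROPHOBIC` residue set are hypothetical names chosen for illustration.

```python
class Alignment:
    """A multiple sequence alignment exposed only through a few operations."""

    def __init__(self, rows):
        rows = list(rows)
        if len({len(r) for r in rows}) > 1:
            raise ValueError("all aligned rows must have the same length")
        self._rows = rows

    def n_columns(self):
        return len(self._rows[0]) if self._rows else 0

    def column(self, j):
        """Return the residues found in column j, one per sequence."""
        return [row[j] for row in self._rows]


# Assumed example property class; a real analysis would choose its own.
HYDROPHOBIC = set("AVILMFWC")


def constrained_columns(alignment, residue_class):
    """Indices of columns that mutate yet stay inside residue_class."""
    hits = []
    for j in range(alignment.n_columns()):
        col = alignment.column(j)
        varies = len(set(col)) > 1
        if varies and all(c in residue_class for c in col):
            hits.append(j)
    return hits


toy = Alignment(["MAVK", "MLIK", "MVLR"])  # hypothetical toy alignment
print(constrained_columns(toy, HYDROPHOBIC))
```

    Column 0 is excluded because it never mutates, and column 3 because its residues fall outside the chosen class; gap characters would also disqualify a column here.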

  3. A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination.

    PubMed

    Li, Gang; Hillier, LaDeana W; Grahn, Robert A; Zimin, Aleksey V; David, Victor A; Menotti-Raymond, Marilyn; Middleton, Rondo; Hannah, Steven; Hendrickson, Sher; Makunin, Alex; O'Brien, Stephen J; Minx, Pat; Wilson, Richard K; Lyons, Leslie A; Warren, Wesley C; Murphy, William J

    2016-01-01

    High-resolution genetic and physical maps are invaluable tools for building accurate genome assemblies, and interpreting results of genome-wide association studies (GWAS). Previous genetic and physical maps anchored good quality draft assemblies of the domestic cat genome, enabling the discovery of numerous genes underlying hereditary disease and phenotypes of interest to the biomedical science and breeding communities. However, these maps lacked sufficient marker density to order thousands of shorter scaffolds in earlier assemblies, which instead relied heavily on comparative mapping with related species. A high-resolution map would aid in validating and ordering chromosome scaffolds from existing and new genome assemblies. Here, we describe a high-resolution genetic linkage map of the domestic cat genome based on genotyping 453 domestic cats from several multi-generational pedigrees on the Illumina 63K SNP array. The final maps include 58,055 SNP markers placed relative to 6637 markers with unique positions, distributed across all autosomes and the X chromosome. Our final sex-averaged maps span a total autosomal length of 4464 cM, the longest described linkage map for any mammal, confirming length estimates from a previous microsatellite-based map. The linkage map was used to order and orient the scaffolds from a substantially more contiguous domestic cat genome assembly (Felis catus v8.0), which incorporated ∼20 × coverage of Illumina fragment reads. The new genome assembly shows substantial improvements in contiguity, with a nearly fourfold increase in N50 scaffold size to 18 Mb. We use this map to report probable structural errors in previous maps and assemblies, and to describe features of the recombination landscape, including a massive (∼50 Mb) recombination desert (of virtually zero recombination) on the X chromosome that parallels a similar desert on the porcine X chromosome in both size and physical location. PMID:27172201
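
    The N50 scaffold size quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative and are not values from the cat assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 needs at least one length")


# Hypothetical toy lengths: total 20, and 6 + 5 = 11 covers half.
print(n50([2, 3, 4, 5, 6]))
```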

  4. A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination

    PubMed Central

    Li, Gang; Hillier, LaDeana W.; Grahn, Robert A.; Zimin, Aleksey V.; David, Victor A.; Menotti-Raymond, Marilyn; Middleton, Rondo; Hannah, Steven; Hendrickson, Sher; Makunin, Alex; O’Brien, Stephen J.; Minx, Pat; Wilson, Richard K.; Lyons, Leslie A.; Warren, Wesley C.; Murphy, William J.

    2016-01-01

    High-resolution genetic and physical maps are invaluable tools for building accurate genome assemblies, and interpreting results of genome-wide association studies (GWAS). Previous genetic and physical maps anchored good quality draft assemblies of the domestic cat genome, enabling the discovery of numerous genes underlying hereditary disease and phenotypes of interest to the biomedical science and breeding communities. However, these maps lacked sufficient marker density to order thousands of shorter scaffolds in earlier assemblies, which instead relied heavily on comparative mapping with related species. A high-resolution map would aid in validating and ordering chromosome scaffolds from existing and new genome assemblies. Here, we describe a high-resolution genetic linkage map of the domestic cat genome based on genotyping 453 domestic cats from several multi-generational pedigrees on the Illumina 63K SNP array. The final maps include 58,055 SNP markers placed relative to 6637 markers with unique positions, distributed across all autosomes and the X chromosome. Our final sex-averaged maps span a total autosomal length of 4464 cM, the longest described linkage map for any mammal, confirming length estimates from a previous microsatellite-based map. The linkage map was used to order and orient the scaffolds from a substantially more contiguous domestic cat genome assembly (Felis catus v8.0), which incorporated ∼20 × coverage of Illumina fragment reads. The new genome assembly shows substantial improvements in contiguity, with a nearly fourfold increase in N50 scaffold size to 18 Mb. We use this map to report probable structural errors in previous maps and assemblies, and to describe features of the recombination landscape, including a massive (∼50 Mb) recombination desert (of virtually zero recombination) on the X chromosome that parallels a similar desert on the porcine X chromosome in both size and physical location. PMID:27172201

  5. Device for high spatial resolution chemical analysis of a sample and method of high spatial resolution chemical analysis

    SciTech Connect

    Van Berkel, Gary J.

    2015-10-06

    A system and method for analyzing a chemical composition of a specimen are described. The system can include at least one pin; a sampling device configured to contact a liquid with a specimen on the at least one pin to form a testing solution; and a stepper mechanism configured to move the at least one pin and the sampling device relative to one another. The system can also include an analytical instrument for determining a chemical composition of the specimen from the testing solution. In particular, the systems and methods described herein enable chemical analysis of specimens, such as tissue, to be evaluated in a manner that the spatial-resolution is limited by the size of the pins used to obtain tissue samples, not the size of the sampling device used to solubilize the samples coupled to the pins.

  6. Chromosome microdissection and cloning in human genome and genetic disease analysis

    SciTech Connect

    Kao, Faten (Eleanor Roosevelt Inst. for Cancer Research, Denver, CO); Yu, Jingwei

    1991-03-01

    A procedure has been described for microdissection and microcloning of human chromosomal DNA sequences in which universal amplification of the dissected fragments by Mbo I linker adaptor and polymerase chain reaction is used. A very large library comprising 700,000 recombinant plasmid microclones from 30 dissected chromosomes of human chromosome 21 was constructed. Colony hybridization showed that 42% of the clones contained repetitive sequences and 58% contained single or low-copy sequences. The insert sizes generated by complete Mbo I cleavage ranged from 50 to 1,100 base pairs with a mean of 416 base pairs. Southern blot analysis of microclones from the library confirmed their human origin and chromosome 21 specificity. Some of these clones have also been regionally mapped to specific sites of chromosome 21 by using a regional mapping panel of cell hybrids. This chromosome microtechnology can generate large numbers of microclones with unique sequences from defined chromosomal regions and can be used for processes such as (i) isolating corresponding yeast artificial chromosome clones with large inserts, (ii) screening various cDNA libraries for isolating expressed sequences, and (iii) constructing region-specific libraries of the entire human genome. The studies described here demonstrate the power of this technology for high-resolution genome analysis and explicate its use in an efficient search for disease-associated genes localized to specific chromosomal regions.
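
    As a rough illustration of what "complete Mbo I cleavage" means for insert sizes, the sketch below digests a linear sequence in silico. It relies on the known Mbo I recognition site GATC, with the cut placed immediately 5' of the G. The function name and the toy sequence are hypothetical, and methylation sensitivity and the linker-adaptor step are ignored.

```python
def digest(sequence, site="GATC", cut_offset=0):
    """Return fragment lengths from a complete digest of a linear sequence.

    cut_offset is the cut position relative to the start of the site;
    0 places the cut just before the G of GATC.
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces, e.g. when the sequence begins with the site.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


fragments = digest("AAGATCTTTTGATCCC")  # hypothetical toy sequence
print(fragments, sum(fragments) / len(fragments))
```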

  7. Combined analysis of genome-wide expression and copy number profiles to identify key altered genomic regions in cancer

    PubMed Central

    2012-01-01

    Background: Analyses of DNA copy number alterations and gene expression changes in human samples have been used to find potential target genes in complex diseases. Recent studies have combined these two types of data using different strategies, but have focused on finding gene-based relationships. However, it has been proposed that these data can be used to identify key genomic regions, which may enclose causal genes under the assumption that disease-associated gene expression changes are caused by genomic alterations. Results: Following this proposal, we undertake a new integrative analysis of genome-wide expression and copy number datasets. The analysis is based on the combined location of both types of signals along the genome. Our approach takes into account the genomic location in the copy number (CN) analysis and also in the gene expression (GE) analysis. To achieve this, we apply a segmentation algorithm to both types of data using paired samples. Then, we perform a correlation analysis and a frequency analysis of the gene loci in the segmented CN regions and the segmented GE regions, selecting in both cases the statistically significant loci. In this way, we find CN alterations that show strong correspondence with GE changes. We applied our method to a human dataset of 64 Glioblastoma Multiforme samples, finding key loci and hotspots that correspond to major alterations previously described for this type of tumor. Conclusions: Identification of key altered genomic loci constitutes a first step toward finding the genes that drive the alteration in a malignant state. These driver genes can be found in regions that show high correlation in copy number alterations and expression changes. PMID:23095915
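
    The correlation step can be sketched as a per-locus Pearson correlation between copy number and expression across paired samples. This is an illustration under stated assumptions, not the authors' method: segmentation is assumed to have been done already, the fixed `threshold` stands in for the statistical significance test the abstract mentions, and all names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no correlation information
    return sxy / sqrt(sxx * syy)


def correlated_loci(cn, ge, threshold=0.8):
    """Loci whose CN and GE profiles correlate at or above threshold."""
    shared = (locus for locus in cn if locus in ge)
    return sorted(l for l in shared if pearson(cn[l], ge[l]) >= threshold)


# Hypothetical per-locus values across four paired samples.
cn = {"locusA": [2, 3, 4, 5], "locusB": [2, 2, 3, 3]}
ge = {"locusA": [1.0, 1.5, 2.0, 2.5], "locusB": [3.0, 1.0, 2.0, 0.5]}
print(correlated_loci(cn, ge))
```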

  8. Microsporidian genome analysis reveals evolutionary strategies for obligate intracellular growth

    PubMed Central

    Cuomo, Christina A.; Desjardins, Christopher A.; Bakowski, Malina A.; Goldberg, Jonathan; Ma, Amy T.; Becnel, James J.; Didier, Elizabeth S.; Fan, Lin; Heiman, David I.; Levin, Joshua Z.; Young, Sarah; Zeng, Qiandong; Troemel, Emily R.

    2012-01-01

    Microsporidia comprise a large phylum of obligate intracellular eukaryotes that are fungal-related parasites responsible for widespread disease, and here we address questions about microsporidia biology and evolution. We sequenced three microsporidian genomes from two species, Nematocida parisii and Nematocida sp1, which are natural pathogens of Caenorhabditis nematodes and provide model systems for studying microsporidian pathogenesis. We performed deep sequencing of transcripts from a time course of N. parisii infection. Examination of pathogen gene expression revealed compact transcripts and a dramatic takeover of host cells by Nematocida. We also performed phylogenomic analyses of Nematocida and other microsporidian genomes to refine microsporidian phylogeny and identify evolutionary events of gene loss, acquisition, and modification. In particular, we found that all microsporidia lost the tumor-suppressor gene retinoblastoma, which we speculate could accelerate the parasite cell cycle and increase the mutation rate. We also found that microsporidia acquired transporters that could import nucleosides to fuel rapid growth. In addition, microsporidian hexokinases gained secretion signal sequences, and in a functional assay these were sufficient to export proteins out of the cell; thus hexokinase may be targeted into the host cell to reprogram it toward biosynthesis. Similar molecular changes appear during formation of cancer cells and may be evolutionary strategies adopted independently by microsporidia to proliferate rapidly within host cells. Finally, analysis of genome polymorphisms revealed evidence for a sexual cycle that may provide genetic diversity to alleviate problems caused by clonal growth. Together these events may explain the emergence and success of these diverse intracellular parasites. PMID:22813931

  9. Microsporidian genome analysis reveals evolutionary strategies for obligate intracellular growth.

    PubMed

    Cuomo, Christina A; Desjardins, Christopher A; Bakowski, Malina A; Goldberg, Jonathan; Ma, Amy T; Becnel, James J; Didier, Elizabeth S; Fan, Lin; Heiman, David I; Levin, Joshua Z; Young, Sarah; Zeng, Qiandong; Troemel, Emily R

    2012-12-01

    Microsporidia comprise a large phylum of obligate intracellular eukaryotes that are fungal-related parasites responsible for widespread disease, and here we address questions about microsporidia biology and evolution. We sequenced three microsporidian genomes from two species, Nematocida parisii and Nematocida sp1, which are natural pathogens of Caenorhabditis nematodes and provide model systems for studying microsporidian pathogenesis. We performed deep sequencing of transcripts from a time course of N. parisii infection. Examination of pathogen gene expression revealed compact transcripts and a dramatic takeover of host cells by Nematocida. We also performed phylogenomic analyses of Nematocida and other microsporidian genomes to refine microsporidian phylogeny and identify evolutionary events of gene loss, acquisition, and modification. In particular, we found that all microsporidia lost the tumor-suppressor gene retinoblastoma, which we speculate could accelerate the parasite cell cycle and increase the mutation rate. We also found that microsporidia acquired transporters that could import nucleosides to fuel rapid growth. In addition, microsporidian hexokinases gained secretion signal sequences, and in a functional assay these were sufficient to export proteins out of the cell; thus hexokinase may be targeted into the host cell to reprogram it toward biosynthesis. Similar molecular changes appear during formation of cancer cells and may be evolutionary strategies adopted independently by microsporidia to proliferate rapidly within host cells. Finally, analysis of genome polymorphisms revealed evidence for a sexual cycle that may provide genetic diversity to alleviate problems caused by clonal growth. Together these events may explain the emergence and success of these diverse intracellular parasites. PMID:22813931

  10. Identification of Uvaria sp by barcoding coupled with high-resolution melting analysis (Bar-HRM).

    PubMed

    Osathanunkul, M; Madesis, P; Ounjai, S; Pumiputavon, K; Somboonchai, R; Lithanatudom, P; Chaowasku, T; Wipasa, J; Suwannapoom, C

    2016-01-01

    DNA barcoding, which was developed about a decade ago, relies on short, standardized regions of the genome to identify plant and animal species. This method can be used not only to identify known species but also to discover novel ones. Numerous sequences are stored in online databases worldwide. One of the ways to save cost and time (by omitting the sequencing step) in species identification is to use available barcode data to design optimized primers for further analysis, such as high-resolution melting analysis (HRM). This study aimed to determine the effectiveness of the hybrid method Bar-HRM (DNA barcoding combined with HRM) to identify species that share similar external morphological features, rather than conduct traditional taxonomic identification that requires major parts (leaf, flower, fruit) of the specimens. The specimens used for testing were those that could not be identified at the species level and could be either Uvaria longipes or Uvaria wrayi, as indicated by morphological identification. Primer pairs derived from chloroplast regions (matK, psbA-trnH, rbcL, and trnL) were used in the Bar-HRM. The results obtained from psbA-trnH primers were good enough to help in identifying the specimen while the rest were not. Bar-HRM analysis was proven to be a fast and cost-effective method for plant species identification. PMID:26909907

  11. Spatial resolution attainable in germanium detectors by pulse shape analysis

    SciTech Connect

    Blair, J., Bechtel, NV; Beckedahl, D.; Kammeraad, J.; Schmid, G., LLNL

    1998-05-01

    There are several applications for which it is desirable to calculate the locations and energies of individual gamma-ray interactions within a high purity germanium (HPGe) detector. These include gamma-ray imaging and Compton suppression. With a segmented detector this can be accomplished by analyzing the pulse shapes of the signals from the various segments. We examine the fundamental limits to the spatial resolution attainable with this approach. The primary source of error is the series noise of the field effect transistors (FETs) at the inputs of the charge amplifiers. We show how to calculate the noise spectral density at the output of the charge amplifiers due to an optimally selected FET. This calculation is based only on the detector capacitance and a noise constant for the FET technology. We show how to use this spectral density to calculate the uncertainties in parameters, such as interaction locations and energies, that are derived from pulse shape analysis using maximum likelihood estimation (MLE) applied to filtered and digitized recordings of the charge signals. Example calculations are given to illustrate our approach. Experimental results are given that demonstrate that one can construct complete systems, from detector through data analysis, that come near the theoretical limits.

  12. Genome evolution in diploid and tetraploid Coffea species as revealed by comparative analysis of orthologous genome segments.

    PubMed

    Cenci, Alberto; Combes, Marie-Christine; Lashermes, Philippe

    2012-01-01

    Sequence comparison of orthologous regions enables estimation of the divergence between genomes, analysis of their evolution and detection of particular features of the genomes, such as sequence rearrangements and transposable elements. Despite the economic importance of Coffea species, little genomic information is currently available. Coffea is a relatively young genus that includes more than one hundred diploid species and a single tetraploid species. Three Coffea orthologous regions of 470-900 kb were analyzed and compared: both subgenomes of allotetraploid Coffea arabica (contributed by the diploid species Coffea eugenioides and Coffea canephora) and the genome of diploid C. canephora. Sequence divergence was calculated on global alignments or on coding and non-coding sequences separately. A search for transposable elements detected 43 retrotransposons and 198 transposons in the sequences analyzed. Comparative insertion analysis made it possible to locate 165 TE insertions in the phylogenetic tree of the three genomes/subgenomes. In the tetraploid C. arabica, a homoeologous non-reciprocal transposition (HNRT) was detected and characterized: a 50 kb region of the C. eugenioides derived subgenome replaced the C. canephora derived counterpart. Comparative sequence analysis on three Coffea genomes/subgenomes revealed almost perfect gene synteny, low sequence divergence and a high number of shared transposable elements. Compared to the results of similar analysis in other genera (Aegilops/Triticum and Oryza), Coffea genomes/subgenomes appeared to be dramatically less diverged, which is consistent with the relatively recent radiation of the Coffea genus. Based on nucleotide substitution frequency, the HNRT was dated at 10,000-50,000 years BP, which is also the most recent estimation of the origin of C. arabica. PMID:22086332
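
    Sequence divergence on a pairwise alignment is commonly summarized as the proportion of differing sites (the p-distance). The abstract does not state which divergence measure was used, so the sketch below assumes the uncorrected p-distance and skips gapped columns. The function name and the toy sequences are hypothetical.

```python
def p_distance(a, b, gap="-"):
    """Proportion of mismatched sites between two aligned sequences.

    Columns containing a gap in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


# Hypothetical toy alignment: 9 ungapped columns, 1 mismatch.
print(p_distance("ACGT-ACGTA", "ACGTTACGAA"))
```

    Running the same function separately on coding and non-coding columns would mirror the split described in the abstract.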

  13. Complete mitochondrial genome of Cervus elaphus songaricus (Cetartiodactyla: Cervinae) and a phylogenetic analysis with related species.

    PubMed

    Li, Yiqing; Ba, Hengxing; Yang, Fuhe

    2016-01-01

    The complete mitochondrial genome of Tianshan wapiti, Cervus elaphus songaricus, is 16,419 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and 1 control region. The phylogenetic trees were reconstructed with the concatenated nucleotide sequences of the 13 protein-coding genes using maximum parsimony (MP) and Bayesian inference (BI) methods. The MP and BI phylogenetic trees showed an identical tree topology. The monophyly of red deer, wapiti and sika deer was well supported, and wapiti was found to share a closer relationship with sika deer. Tianshan wapiti shared a closer relationship with xanthopygus than with yarkandensis. Rusa unicolor and Rucervus eldi were given a basal phylogenetic position. Our phylogenetic analysis provided a robust phylogenetic resolution spanning the entire evolutionary relationship of the subfamily Cervinae. PMID:24725059

  14. Population genomic analysis reveals highly conserved mitochondrial genomes in the yeast species Lachancea thermotolerans.

    PubMed

    Freel, Kelle C; Friedrich, Anne; Hou, Jing; Schacherer, Joseph

    2014-10-01

    The increasing availability of mitochondrial (mt) sequence data from various yeasts provides a tool to study genomic evolution within and between different species. While the genomes from a range of lineages are available, there is a lack of information concerning intraspecific mtDNA diversity. Here, we analyzed the mt genomes of 50 strains from Lachancea thermotolerans, a protoploid yeast species that has been isolated from several locations (Europe, Asia, Australia, South Africa, and North / South America) and ecological sources (fruit, tree exudate, plant material, and grape and agave fermentations). Protein-coding genes from the mtDNA were used to construct a phylogeny, which reflected a similar, yet less resolved topology than the phylogenetic tree of 50 nuclear genes. In comparison to its sister species Lachancea kluyveri, L. thermotolerans has a smaller mt genome. This is due to shorter intergenic regions and fewer introns, of which the latter are only found in COX1. We revealed that L. kluyveri and L. thermotolerans share similar levels of intraspecific divergence concerning the nuclear genomes. However, L. thermotolerans has a more highly conserved mt genome with the coding regions characterized by low rates of nonsynonymous substitution. Thus, in the mt genomes of L. thermotolerans, stronger purifying selection and lower mutation rates potentially shape genome diversity in contrast to what was found for L. kluyveri, demonstrating that the factors driving mt genome evolution are different even between closely related species. PMID:25212859

  15. Breeding nursery tissue collection for possible genomic analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Phenotyping is considered a major bottleneck in breeding programs. With new genomic technologies, high throughput genotype schemes are constantly being developed. However, every genomic technology requires phenotypic data to inform prediction models generated from the technology. Forage breeders con...

  16. Spliceosomal introns as tools for genomic and evolutionary analysis

    PubMed Central

    Irimia, Manuel; Roy, Scott William

    2008-01-01

    Over the past 5 years, the availability of dozens of whole genomic sequences from a wide variety of eukaryotic lineages has revealed a very large amount of information about the dynamics of intron loss and gain through eukaryotic history, as well as the evolution of intron sequences. Implicit in these advances is a great deal of information about the structure and evolution of surrounding sequences. Here, we review the wealth of ways in which structures of spliceosomal introns as well as their conservation and change through evolution may be harnessed for evolutionary and genomic analysis. First, we discuss uses of intron length distributions and positions in sequence assembly and annotation, and for improving alignment of homologous regions. Second, we review uses of introns in evolutionary studies, including the utility of introns as indicators of rates of sequence evolution, for inferences about molecular evolution, as signatures of orthology and paralogy, and for estimating rates of nucleotide substitution. We conclude with a discussion of phylogenetic methods utilizing intron sequences and positions. PMID:18263615

  17. Delineation of Steroid-Degrading Microorganisms through Comparative Genomic Analysis

    PubMed Central

    Bergstrand, Lee H.; Cardenas, Erick; Holert, Johannes; Van Hamme, Jonathan D.

    2016-01-01

    Steroids are ubiquitous in natural environments and are a significant growth substrate for microorganisms. Microbial steroid metabolism is also important for some pathogens and for biotechnical applications. This study delineated the distribution of aerobic steroid catabolism pathways among over 8,000 microorganisms whose genomes are available in the NCBI RefSeq database. Combined analysis of bacterial, archaeal, and fungal genomes with both hidden Markov models and reciprocal BLAST identified 265 putative steroid degraders within only Actinobacteria and Proteobacteria, which mainly originated from soil, eukaryotic host, and aquatic environments. These bacteria include members of 17 genera not previously known to contain steroid degraders. A pathway for cholesterol degradation was conserved in many actinobacterial genera, particularly in members of the Corynebacterineae, and a pathway for cholate degradation was conserved in members of the genus Rhodococcus. A pathway for testosterone and, sometimes, cholate degradation had a patchy distribution among Proteobacteria. The steroid degradation genes tended to occur within large gene clusters. Growth experiments confirmed bioinformatic predictions of steroid metabolism capacity in nine bacterial strains. The results indicate there was a single ancestral 9,10-seco-steroid degradation pathway. Gene duplication, likely in a progenitor of Rhodococcus, later gave rise to a cholate degradation pathway. Proteobacteria and additional Actinobacteria subsequently obtained a cholate degradation pathway via horizontal gene transfer, in some cases facilitated by plasmids. Catabolism of steroids appears to be an important component of the ecological niches of broad groups of Actinobacteria and individual species of Proteobacteria. PMID:26956583
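
    The reciprocal BLAST step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the BLAST results have already been parsed into (query, subject, bit score) tuples; the function names, gene identifiers, and scores below are hypothetical and are not taken from the study, which also combined this step with hidden Markov models.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples, for example
    parsed from tabular BLAST output (parsing is omitted in this sketch).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which a and b are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up example: reference genes searched against a target genome and back.
forward = [
    ("refA", "tgt1", 250.0),
    ("refA", "tgt2", 90.0),
    ("refB", "tgt1", 120.0),
    ("refB", "tgt3", 110.0),
]
reverse = [
    ("tgt1", "refA", 245.0),
    ("tgt2", "refA", 88.0),
    ("tgt3", "refC", 70.0),
]
pairs = reciprocal_best_hits(forward, reverse)
```

    In this toy input only refA and tgt1 select each other, so refB is not reported even though its best hit is also tgt1.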

  18. Genome-Wide Analysis of Human Metapneumovirus Evolution

    PubMed Central

    Kim, Jin Il; Park, Sehee; Lee, Ilseob; Park, Kwang Sook; Kwak, Eun Jung; Moon, Kwang Mee; Lee, Chang Kyu; Bae, Joon-Yong; Park, Man-Seong; Song, Ki-Joon

    2016-01-01

    Human metapneumovirus (HMPV) has been described as an important etiologic agent of upper and lower respiratory tract infections, especially in young children and the elderly. Most of school-aged children might be introduced to HMPVs, and exacerbation with other viral or bacterial super-infection is common. However, our understanding of the molecular evolution of HMPVs remains limited. To address the comprehensive evolutionary dynamics of HMPVs, we report a genome-wide analysis of the eight genes (N, P, M, F, M2, SH, G, and L) using 103 complete genome sequences. Phylogenetic reconstruction revealed that the eight genes from one HMPV strain grouped into the same genetic group among the five distinct lineages (A1, A2a, A2b, B1, and B2). A few exceptions of phylogenetic incongruence might suggest past recombination events, and we detected possible recombination breakpoints in the F, SH, and G coding regions. The five genetic lineages of HMPVs shared quite remote common ancestors ranging more than 220 to 470 years of age with the most recent origins for the A2b sublineage. Purifying selection was common, but most protein genes except the F and M2-2 coding regions also appeared to experience episodic diversifying selection. Taken together, these suggest that the five lineages of HMPVs maintain their individual evolutionary dynamics and that recombination and selection forces might work on shaping the genetic diversity of HMPVs. PMID:27046055

  19. Analysis of singleton ORFans in fully sequenced microbial genomes.

    PubMed

    Siew, Naomi; Fischer, Daniel

    2003-11-01

    Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans. PMID:14517975

  20. Genomic Analysis of Stress Response against Arsenic in Caenorhabditis elegans

    PubMed Central

    Sahu, Surasri N.; Lewis, Jada; Patel, Isha; Bozdag, Serdar; Lee, Jeong H.; Sprando, Robert; Cinar, Hediye Nese

    2013-01-01

    Arsenic, a known human carcinogen, is widely distributed around the world and found in particularly high concentrations in certain regions including Southwestern US, Eastern Europe, India, China, Taiwan and Mexico. Chronic arsenic poisoning affects millions of people worldwide and is associated with increased risk of many diseases including arthrosclerosis, diabetes and cancer. In this study, we explored genome-level global responses to high and low levels of arsenic exposure in Caenorhabditis elegans using Affymetrix expression microarrays. This experimental design allows us to do microarray analysis of dose-response relationships of global gene expression patterns. High dose (0.03%) exposure caused stronger global gene expression changes in comparison with low dose (0.003%) exposure, suggesting a positive dose-response correlation. Biological processes such as oxidative stress and iron metabolism, which were previously reported to be involved in arsenic toxicity studies using cultured cells, experimental animals, and humans, were found to be affected in C. elegans. We performed genome-wide gene expression comparisons between our microarray data and publicly available C. elegans microarray datasets of cadmium and sediment exposure samples of German rivers Rhine and Elbe. Bioinformatics analysis of arsenic-responsive regulatory networks was done using the FastMEDUSA program. FastMEDUSA analysis identified cancer-related genes, particularly genes associated with leukemia, such as dnj-11, which encodes a protein orthologous to the mammalian ZRF1/MIDA1/MPP11/DNAJC2 family of ribosome-associated molecular chaperones. We analyzed the protective functions of several of the identified genes using RNAi. Our study indicates that C. elegans could be a substitute model to study the mechanism of metal toxicity using high-throughput expression data and bioinformatics tools such as FastMEDUSA. PMID:23894281

  1. Systems-Level Analysis of Genome-Wide Association Data

    PubMed Central

    Farber, Charles R.

    2013-01-01

    Genome-wide association studies (GWAS) have emerged as the method of choice for identifying common variants affecting complex disease. In a GWAS, particular attention is placed, for obvious reasons, on single-nucleotide polymorphisms (SNPs) that exceed stringent genome-wide significance thresholds. However, it is expected that many SNPs with only nominal evidence of association (e.g., P < 0.05) truly influence disease. Efforts to extract additional biological information from entire GWAS datasets have primarily focused on pathway-enrichment analyses. However, these methods suffer from a number of limitations and typically fail to lead to testable hypotheses. To evaluate alternative approaches, we performed a systems-level analysis of GWAS data using weighted gene coexpression network analysis. A weighted gene coexpression network was generated for 1918 genes harboring SNPs that displayed nominal evidence of association (P ≤ 0.05) from a GWAS of bone mineral density (BMD) using microarray data on circulating monocytes isolated from individuals with extremely low or high BMD. Thirteen distinct gene modules were identified, each comprising coexpressed and highly interconnected GWAS genes. Through the characterization of module content and topology, we illustrate how network analysis can be used to discover disease-associated subnetworks and characterize novel interactions for genes with a known role in the regulation of BMD. In addition, we provide evidence that network metrics can be used as a prioritizing tool when selecting genes and SNPs for replication studies. Our results highlight the advantages of using systems-level strategies to add value to and inform GWAS. PMID:23316444
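
    As a rough illustration of the weighted gene coexpression network named in this abstract, the sketch below builds an unsigned weighted adjacency by raising the absolute Pearson correlation between expression profiles to a soft-threshold power. The power beta = 6, the gene names, and the expression values are assumptions for illustration only; the abstract does not state the settings the author used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def weighted_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for all gene pairs."""
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


def connectivity(adj, gene):
    """Sum of a gene's adjacencies, a simple measure of network centrality."""
    return sum(weight for (g, _), weight in adj.items() if g == gene)


# Hypothetical expression values for three genes across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
adjacency = weighted_adjacency(expression)
```

    Raising the correlation to a power suppresses weak correlations while keeping the network weighted rather than thresholded; connectivity values of this kind are one example of a network metric that could be used to rank genes.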

  2. Genomic analysis of stress response against arsenic in Caenorhabditis elegans.

    PubMed

    Sahu, Surasri N; Lewis, Jada; Patel, Isha; Bozdag, Serdar; Lee, Jeong H; Sprando, Robert; Cinar, Hediye Nese

    2013-01-01

    Arsenic, a known human carcinogen, is widely distributed around the world and found in particularly high concentrations in certain regions including Southwestern US, Eastern Europe, India, China, Taiwan and Mexico. Chronic arsenic poisoning affects millions of people worldwide and is associated with increased risk of many diseases including arthrosclerosis, diabetes and cancer. In this study, we explored genome-level global responses to high and low levels of arsenic exposure in Caenorhabditis elegans using Affymetrix expression microarrays. This experimental design allows us to do microarray analysis of dose-response relationships of global gene expression patterns. High dose (0.03%) exposure caused stronger global gene expression changes in comparison with low dose (0.003%) exposure, suggesting a positive dose-response correlation. Biological processes such as oxidative stress and iron metabolism, which were previously reported to be involved in arsenic toxicity studies using cultured cells, experimental animals, and humans, were found to be affected in C. elegans. We performed genome-wide gene expression comparisons between our microarray data and publicly available C. elegans microarray datasets of cadmium and sediment exposure samples of German rivers Rhine and Elbe. Bioinformatics analysis of arsenic-responsive regulatory networks was done using the FastMEDUSA program. FastMEDUSA analysis identified cancer-related genes, particularly genes associated with leukemia, such as dnj-11, which encodes a protein orthologous to the mammalian ZRF1/MIDA1/MPP11/DNAJC2 family of ribosome-associated molecular chaperones. We analyzed the protective functions of several of the identified genes using RNAi. Our study indicates that C. elegans could be a substitute model to study the mechanism of metal toxicity using high-throughput expression data and bioinformatics tools such as FastMEDUSA. PMID:23894281

  3. 13C metabolic flux analysis at a genome-scale.

    PubMed

    Gopalakrishnan, Saratram; Maranas, Costas D

    2015-11-01

    Metabolic models used in 13C metabolic flux analysis generally include a limited number of reactions primarily from central metabolism. They typically omit degradation pathways, complete cofactor balances, and atom transition contributions for reactions outside central metabolism. This study addresses the impact on prediction fidelity of scaling up mapping models to genome scale. The core mapping model employed in this study accounts for 75 reactions and 65 metabolites, primarily from central metabolism. The genome-scale metabolic mapping model (GSMM) (697 reactions and 595 metabolites) is constructed using as a basis the iAF1260 model upon eliminating reactions guaranteed not to carry flux based on growth and fermentation data for a minimal glucose growth medium. Labeling data for 17 amino acid fragments obtained from cells fed with glucose labeled at the second carbon was used to obtain fluxes and ranges. Metabolic fluxes and confidence intervals are estimated, for both core and genome-scale mapping models, by minimizing the sum of squared differences between predicted and experimentally measured labeling patterns using the EMU decomposition algorithm. Overall, we find that both topology and estimated values of the metabolic fluxes remain largely consistent between the core and GSM models. Stepping up to a genome-scale mapping model leads to wider flux inference ranges for 20 key reactions present in the core model. The glycolysis flux range doubles due to the possibility of active gluconeogenesis, the TCA flux range expanded by 80% due to the availability of a bypass through arginine consistent with labeling data, and the transhydrogenase reaction flux was essentially unresolved due to the presence of as many as five routes for the inter-conversion of NADPH to NADH afforded by the genome-scale model. By globally accounting for ATP demands in the GSMM model the unused ATP decreased drastically with the lower bound matching the maintenance ATP requirement. A non

  4. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  5. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    SciTech Connect

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops

  6. The tiger genome and comparative analysis with lion and snow leopard genomes.

    PubMed

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-Uk; Luo, Shu-Jin; Johnson, Warren E; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A; Marker, Laurie; Harper, Cindy; Miller, Susan M; Jacobs, Wilhelm; Bertola, Laura D; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O'Brien, Stephen J; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world's most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats' hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  7. The tiger genome and comparative analysis with lion and snow leopard genomes

    PubMed Central

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-uk; Luo, Shu-Jin; Johnson, Warren E.; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A.; Marker, Laurie; Harper, Cindy; Miller, Susan M.; Jacobs, Wilhelm; Bertola, Laura D.; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O’Brien, Stephen J.; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world’s most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats’ hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  8. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level

    PubMed Central

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 NCBI downloaded streptomycete strains. Thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and multiple source subgroup-based phylogenetic tree. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea’s genetic data sources. PMID:27446038

  9. Exploring a Nonmodel Teleost Genome Through RAD Sequencing—Linkage Mapping in Common Pandora, Pagellus erythrinus and Comparative Genomic Analysis

    PubMed Central

    Manousaki, Tereza; Tsakogiannis, Alexandros; Taggart, John B.; Palaiokostas, Christos; Tsaparis, Dimitris; Lagnel, Jacques; Chatziplis, Dimitrios; Magoulas, Antonios; Papandroulakis, Nikos; Mylonas, Constantinos C.; Tsigenopoulos, Costas S.

    2015-01-01

    Common pandora (Pagellus erythrinus) is a benthopelagic marine fish belonging to the teleost family Sparidae, and a newly recruited species in Mediterranean aquaculture. The paucity of genetic information relating to sparids, despite their growing economic value for aquaculture, provides the impetus for exploring the genomics of this fish group. Genomic tool development, such as genetic linkage maps provision, lays the groundwork for linking genotype to phenotype, allowing fine-mapping of loci responsible for beneficial traits. In this study, we applied ddRAD methodology to identify polymorphic markers in a full-sib family of common pandora. Employing the Illumina MiSeq platform, we sampled and sequenced a size-selected genomic fraction of 99 individuals, which led to the identification of 920 polymorphic loci. Downstream mapping analysis resulted in the construction of 24 robust linkage groups, corresponding to the karyotype of the species. The common pandora linkage map showed varying degrees of conserved synteny with four other teleost genomes, namely the European seabass (Dicentrarchus labrax), Nile tilapia (Oreochromis niloticus), stickleback (Gasterosteus aculeatus), and medaka (Oryzias latipes), suggesting a conserved genomic evolution in Sparidae. Our work exploits the possibilities of genotyping by sequencing to gain novel insights into genome structure and evolution. Such information will boost the study of cultured species and will set the foundation for a deeper understanding of the complex evolutionary history of teleosts. PMID:26715088

  10. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level.

    PubMed

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 NCBI downloaded streptomycete strains. Thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and multiple source subgroup-based phylogenetic tree. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea's genetic data sources. PMID:27446038

  11. Genetic analysis of type 1 diabetes using whole genome approaches.

    PubMed Central

    Todd, J A

    1995-01-01

    Whole genome linkage analysis of type 1 diabetes using affected sib pair families and semi-automated genotyping and data capture procedures has shown how type 1 diabetes is inherited. A major proportion of clustering of the disease in families can be accounted for by sharing of alleles at susceptibility loci in the major histocompatibility complex on chromosome 6 (IDDM1) and at a minimum of 11 other loci on nine chromosomes. Primary etiological components of IDDM1, the HLA-DQB1 and -DRB1 class II immune response genes, and of IDDM2, the minisatellite repeat sequence in the 5' regulatory region of the insulin gene on chromosome 11p15, have been identified. Identification of the other loci will involve linkage disequilibrium mapping and sequencing of candidate genes in regions of linkage. PMID:7567975

  12. Complete mitochondrial genome of Paracobitis variegates and its phylogenetic analysis.

    PubMed

    Liu, Chang Zhong; Wei, Guang Hui; Hu, Jian He; Liu, Xing You

    2016-07-01

    In this study, the complete mitochondrial genome sequence of Paracobitis variegates is reported for the first time. The mitogenome is 16,571 bp long, with an A + T content of 55.6%. It has the typical structure, including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one D-loop region. The protein-coding genes start with the typical ATG codon, except the COI gene, which uses GTG as the initiation codon. Most tRNA genes could form typical secondary structures except tRNA(ser), which lacks the DHU arm. The 12S rRNA contains 43 helices, and the 16S rRNA contains six domains and 53 helices. According to the phylogenetic analysis, Paracobitis variegates has a closer relationship with Barbatula toni. PMID:25922960
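
    The A + T content quoted in this abstract is a simple base-composition percentage. A minimal sketch of the calculation follows; the function name and the test sequence are illustrative, and skipping ambiguous bases such as N is an assumption rather than the authors' stated procedure.

```python
def at_content(sequence):
    """Return the percentage of A and T among unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for b in bases if b in "AT")
    return 100.0 * at_count / len(bases)


# Short made-up fragment, not taken from the reported mitogenome.
fragment_at = at_content("ATGCATAT")
```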

  13. STINGRAY: system for integrated genomic resources and analysis

    PubMed Central

    2014-01-01

    Background: The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. Findings: STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. Conclusion: STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/. PMID:24606808

  14. Detangling the Web of Sulfur Metabolisms in Santa Barbara Basin with High-Resolution δ34S and Genomic Profiles

    NASA Astrophysics Data System (ADS)

    Raven, M. R.; Adkins, J. F.; Sessions, A. L.; Dawson, K.; Connon, S. A.; Orphan, V. J.

    2014-12-01

    Sulfur metabolisms are major drivers of organic matter remineralization and microbial growth in marine sediments. Sulfur-isotope systematics are particularly powerful for interrogating metabolic processes in these systems due to the large sulfur-isotope fractionations associated with bacterial sulfate reduction (BSR) and some other metabolic reactions. Recent analytical advancements have made it possible to measure δ34S values of very small samples (>50 nmol), including aqueous sulfate and sulfide as well as pyrite, elemental sulfur, and multiple fractions of sedimentary organic matter. We have generated comprehensive 2.5 cm-resolution depth profiles of these sulfur pools over a 2-m core from Santa Barbara Basin, a sub-oxic environment off the California coast. We find that the porewater sulfide δ34S values appear to be strongly influenced by anaerobic sulfide oxidation and sulfur disproportionation in addition to BSR. These sulfur-isotope signals can be tracked over the course of several thousand years of sediment diagenesis, moving from the oxic-anoxic transition at the sediment-water interface to the sulfate-methane transition zone in deeper sediments. Shifts in δ34S relationships among sulfur pools correlate with changes in microbial community composition as shown in TAG genomic data, which supports the existence of the metabolisms indicated by δ34S profiles. Our results suggest that the existence and activity of multiple microbial communities and coexisting sulfur metabolisms have the potential to be recorded in sedimentary δ34S records.

  15. IslandViewer 3: more flexible, interactive genomic island discovery, visualization and analysis.

    PubMed

    Dhillon, Bhavjinder K; Laird, Matthew R; Shay, Julie A; Winsor, Geoffrey L; Lo, Raymond; Nizam, Fazmin; Pereira, Sheldon K; Waglechner, Nicholas; McArthur, Andrew G; Langille, Morgan G I; Brinkman, Fiona S L

    2015-07-01

    IslandViewer (http://pathogenomics.sfu.ca/islandviewer) is a widely used web-based resource for the prediction and analysis of genomic islands (GIs) in bacterial and archaeal genomes. GIs are clusters of genes of probable horizontal origin, and are of high interest since they disproportionately encode genes involved in medically and environmentally important adaptations, including antimicrobial resistance and virulence. We now report a major new release of IslandViewer, since the last release in 2013. IslandViewer 3 incorporates a completely new genome visualization tool, IslandPlot, enabling for the first time interactive genome analysis and gene search capabilities using synchronized circular, horizontal and vertical genome views. In addition, more curated virulence factors and antimicrobial resistance genes have been incorporated, and homologs of these genes identified in closely related genomes using strict filters. Pathogen-associated genes have been re-calculated for all pre-computed complete genomes. For user-uploaded genomes to be analysed, IslandViewer 3 can also now handle incomplete genomes, with an improved queuing system on compute nodes to handle user demand. Overall, IslandViewer 3 represents a significant new version of this GI analysis software, with features that may make it more broadly useful for general microbial genome analysis and visualization. PMID:25916842

  16. Resonant nuclear reaction analysis with high depth resolution

    NASA Astrophysics Data System (ADS)

    Kul'ment'ev, A. I.; Storizhko, V. E.; Zabashta, O. I.

    1994-03-01

    This paper discusses the potential of the resonant NRA technique for measuring impurity depth profiles. An integrated program package is developed for analysis of the experimental data with high depth resolution. The input information for the package consists of the experimental yield from the impurity reaction selected. The impurity profile can be obtained by solving either a direct or an inverse problem. In the former case the simulated yield from the assigned profile is fitted to the experimental yield. In the latter case the depth profile is obtained by solving the theoretical yield equation. Since the latter procedure is an ill-posed problem, we used Tikhonov's regularization method. This approach allows a correct inclusion of the experimental yield errors as well as of the assumptions made in the model describing the incident ion beam interaction with the material. The equation for the yield is derived taking into account the energy distribution of the initial beam, energy loss straggling, resonance width and Doppler effect. The package of programs is menu-driven with a friendly user interface and possibilities of visual representation, which makes the spectrum processing procedure simple and easy even for an inexperienced user. The computational system permits convenient selection of a certain reaction with an arbitrary shape of the resonance, selection of the beam-material interaction and energy loss straggling model and allows the processing of the spectra from a single or several simultaneously excited resonances.
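
    The inverse-problem route described in this abstract can be illustrated with a minimal sketch. It is not the authors' package: the Gaussian smoothing kernel, the test profile, the noise level and the regularization weight `alpha` are all assumptions chosen for illustration. The sketch only shows why a regularized solve is more stable than a direct inversion when the measured yield is noisy.

```python
import numpy as np

def tikhonov_solve(K, y, alpha):
    """Return the profile c minimizing ||K c - y||^2 + alpha * ||c||^2."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

def gaussian_kernel(n, width):
    """Hypothetical depth-resolution kernel: each row is a normalized Gaussian."""
    x = np.arange(n)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
    return K / K.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n = 60
depth = np.arange(n)

# Assumed "true" impurity profile and the blurred, noisy yield it would produce.
true_profile = np.exp(-0.5 * ((depth - 25) / 4.0) ** 2)
K = gaussian_kernel(n, width=3.0)
measured = K @ true_profile + 1e-3 * rng.standard_normal(n)

# Direct least-squares inversion amplifies the noise; the regularized solve damps it.
naive = np.linalg.lstsq(K, measured, rcond=None)[0]
regularized = tikhonov_solve(K, measured, alpha=1e-3)

err_naive = np.linalg.norm(naive - true_profile)
err_reg = np.linalg.norm(regularized - true_profile)
print(f"naive error: {err_naive:.3g}, regularized error: {err_reg:.3g}")
```

    In practice `alpha` would be chosen from the known yield errors rather than fixed by hand; the value above is only a placeholder.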

  17. Approach and analysis of contention resolution in optical switching network

    NASA Astrophysics Data System (ADS)

    Yang, Xiaolong; Dang, Mingrui; Mao, Youju; Zhang, Min; Li, Lemin

    2002-09-01

    As Internet traffic grows exponentially, the next-generation IP network is forced to scale far beyond its present performance. Increasingly mature optical switching technology, such as optical burst switching, is expected to provide an ideal infrastructure for meeting these demands. However, optical switching has one critical issue, namely contention, which arises from multiple optical data requesting the same output port. How contention is resolved in the optical domain has a significant effect on the performance (in terms of burst-loss rate, average delay time and network throughput) of an optical switching network. The paper proposes a contention resolution scheme based on FDL, AWG and TWC. Here the FDL serves two functions, forwarding and feedback, for shorter and longer buffering time requirements respectively. We incorporate the scheme into the design of an optical switch and describe the optical data buffering strategy used when contention occurs. We also study the performance of the scheme in a Markov process model under the assumption of uniform Bernoulli traffic, and validate the analysis through numerical simulation. The computer simulation results show that the scheme uses the FDL buffering and AWG switching capacities efficiently and hence can markedly reduce contention.

  18. Accuracy Enhancement of Inertial Sensors Utilizing High Resolution Spectral Analysis

    PubMed Central

    Noureldin, Aboelmagd; Armstrong, Justin; El-Shafie, Ahmed; Karamat, Tashfeen; McGaughey, Don; Korenberg, Michael; Hussain, Aini

    2012-01-01

    In both military and civilian applications, the inertial navigation system (INS) and the global positioning system (GPS) are two complementary technologies that can be integrated to provide reliable positioning and navigation information for land vehicles. The accuracy enhancement of INS sensors and the integration of INS with GPS are the subjects of widespread research. Wavelet de-noising of INS sensors has had limited success in removing the long-term (low-frequency) inertial sensor errors. The primary objective of this research is to develop a novel inertial sensor accuracy enhancement technique that can remove both short-term and long-term error components from inertial sensor measurements prior to INS mechanization and INS/GPS integration. A high resolution spectral analysis technique called the fast orthogonal search (FOS) algorithm is used to accurately model the low frequency range of the spectrum, which includes the vehicle motion dynamics and inertial sensor errors. FOS models the spectral components with the most energy first and uses an adaptive threshold to stop adding frequency terms when fitting a term does not reduce the mean squared error more than fitting white noise. The proposed method was developed, tested and validated through road test experiments involving both low-end tactical grade and low cost MEMS-based inertial systems. The results demonstrate that in most cases the position accuracy during GPS outages using FOS de-noised data is superior to the position accuracy using wavelet de-noising.
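
    The greedy term selection described in this abstract can be sketched as follows. This is a simplified stand-in, not the FOS algorithm itself: it refits by ordinary least squares instead of using FOS's orthogonalization, and it replaces the adaptive white-noise threshold with a fixed relative threshold `min_rel_gain`, which is an assumption. The function name, the candidate frequency grid and the test signal are likewise hypothetical.

```python
import numpy as np

def greedy_sinusoid_fit(y, candidate_freqs, min_rel_gain=0.2, max_terms=10):
    """Greedily add sine/cosine pairs while each one cuts the MSE enough.

    min_rel_gain is an assumed fixed threshold: a new frequency is kept only
    if it lowers the current mean squared error by at least this fraction.
    """
    t = np.arange(len(y))
    basis = [np.ones(len(y))]   # start from a constant term
    chosen = []                 # indices into candidate_freqs

    def mse_of(columns):
        A = np.column_stack(columns)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return float(np.mean((y - A @ coef) ** 2))

    mse = mse_of(basis)
    while len(chosen) < max_terms:
        best_mse, best_idx, best_pair = mse, None, None
        for i, f in enumerate(candidate_freqs):
            if i in chosen:
                continue
            pair = [np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)]
            trial = mse_of(basis + pair)
            if trial < best_mse:
                best_mse, best_idx, best_pair = trial, i, pair
        # Stop when the best remaining term no longer pays for itself.
        if best_idx is None or (mse - best_mse) < min_rel_gain * mse:
            break
        basis += best_pair
        chosen.append(best_idx)
        mse = best_mse
    return [float(candidate_freqs[i]) for i in chosen], mse

# Synthetic test signal: two low-frequency components plus white noise.
rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
signal = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.12 * t)
noisy = signal + 0.1 * rng.standard_normal(n)

freqs, residual_mse = greedy_sinusoid_fit(noisy, np.arange(1, 50) / n)
print(freqs, residual_mse)
```

    The strongest component is selected first, matching the abstract's description that the terms with the most energy are modeled first; the loop ends once further terms would mostly fit noise.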

  19. A parallel solution for high resolution histological image analysis.

    PubMed

    Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

    2012-10-01

    This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massive parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The parallel framework proposed is a flexible, high-performance solution, and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. PMID:22522064

  20. Functional Analysis of Shewanella, a cross genome comparison.

    SciTech Connect

    Serres, Margrethe H.

    2009-05-15

    The bacterial genus Shewanella includes a group of highly versatile organisms that have successfully adapted to life in many environments ranging from aquatic (fresh and marine) to sedimentary (lake and marine sediments, subsurface sediments, sea vent). A unique respiratory capability of the Shewanellas, initially observed for Shewanella oneidensis MR-1, is the ability to use metals and metalloids, including radioactive compounds, as electron acceptors. Members of the Shewanella genus have also been shown to degrade environmental pollutants i.e. halogenated compounds, making this group highly applicable for the DOE mission. S. oneidensis MR-1 has in addition been found to utilize a diverse set of nutrients and to have a large set of genes dedicated to regulation and to sensing of the environment. The sequencing of the S. oneidensis MR-1 genome facilitated experimental and bioinformatics analyses by a group of collaborating researchers, the Shewanella Federation. Through the joint effort and with support from Department of Energy S. oneidensis MR-1 has become a model organism of study. Our work has been a functional analysis of S. oneidensis MR-1, both by itself and as part of a comparative study. We have improved the annotation of gene products, assigned metabolic functions, and analyzed protein families present in S. oneidensis MR-1. The data has been applied to analysis of experimental data (i.e. gene expression, proteome) generated for S. oneidensis MR-1. Further, this work has formed the basis for a comparative study of over 20 members of the Shewanella genus. The species and strains selected for genome sequencing represented an evolutionary gradient of DNA relatedness, ranging from close to intermediate, and to distant. The organisms selected have also adapted to a variety of ecological niches. Through our work we have been able to detect and interpret genome similarities and differences between members of the genus. We have in this way contributed to the

  1. Single exon-resolution targeted chromosomal microarray analysis of known and candidate intellectual disability genes.

    PubMed

    Tucker, Tracy; Zahir, Farah R; Griffith, Malachi; Delaney, Allen; Chai, David; Tsang, Erica; Lemyre, Emmanuelle; Dobrzeniecka, Sylvia; Marra, Marco; Eydoux, Patrice; Langlois, Sylvie; Hamdan, Fadi F; Michaud, Jacques L; Friedman, Jan M

    2014-06-01

    Intellectual disability affects about 3% of individuals globally, with ∼50% idiopathic. We designed an exonic-resolution array targeting all known submicroscopic chromosomal intellectual disability syndrome loci, causative genes for intellectual disability, and potential candidate genes, all genes encoding glutamate receptors and epigenetic regulators. Using this platform, we performed chromosomal microarray analysis on 165 intellectual disability trios (affected child and both normal parents). We identified and independently validated 36 de novo copy-number changes in 32 trios. In all, 67% of the validated events were intragenic, involving only exon 1 (which includes the promoter sequence according to our design), exon 1 and adjacent exons, or one or more exons excluding exon 1. Seventeen of the 36 copy-number variants involve genes known to cause intellectual disability. Eleven of these, including seven intragenic variants, are clearly pathogenic (involving STXBP1, SHANK3 (3 patients), IL1RAPL1, UBE2A, NRXN1, MEF2C, CHD7, 15q24 and 9p24 microdeletion), two are likely pathogenic (PI4KA, DCX), two are unlikely to be pathogenic (GRIK2, FREM2), and two are unclear (ARID1B, 15q22 microdeletion). Twelve individuals with genomic imbalances identified by our array were tested with a clinical microarray, and six had a normal result. We identified de novo copy-number variants within genes not previously implicated in intellectual disability and uncovered pathogenic variation of known intellectual disability genes below the detection limit of standard clinical diagnostic chromosomal microarray analysis. PMID:24253858

  2. Reference genome-directed resolution of homologous and homeologous relationships within and between different oat linkage maps

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genome research on oat (Avena sativa) has received less attention than wheat and barley because it is a less prominent component of the food chain. To assess the potential of the model grass Brachypodium as a surrogate for oat genome research, the whole genome sequence (WGS) of Brachypodium was empl...

  3. 75 FR 27464 - Special Reporting, Analysis and Contingent Resolution Plans at Certain Large Insured Depository...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-17

    ... CORPORATION 12 CFR Part 360 RIN 3064-AD59 Special Reporting, Analysis and Contingent Resolution Plans at... complex financial parent companies to submit to the FDIC analysis, information, and contingent resolution... be wound down or resolved in an orderly fashion. The IDI's plan would include a gap analysis...

  4. Pattern Analysis and Decision Support for Cancer through Clinico-Genomic Profiles

    NASA Astrophysics Data System (ADS)

    Exarchos, Themis P.; Giannakeas, Nikolaos; Goletsis, Yorgos; Papaloukas, Costas; Fotiadis, Dimitrios I.

    Advances in genome technology are playing a growing role in medicine and healthcare. With the development of new technologies and opportunities for large-scale analysis of the genome, genomic data have a clear impact on medicine. Cancer prognostics and therapeutics are among the first major test cases for genomic medicine, given that all types of cancer are related with genomic instability. In this paper we present a novel system for pattern analysis and decision support in cancer. The system integrates clinical data from electronic health records and genomic data. Pattern analysis and data mining methods are applied to these integrated data and the discovered knowledge is used for cancer decision support. Through this integration, conclusions can be drawn for early diagnosis, staging and cancer treatment.

  5. The nuclear genome of Brachypodium distachyon: analysis of BAC end sequences.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Due in part to its small genome (~350 Mb), Brachypodium distachyon is emerging as a model system for temperate grasses, including important crops like wheat and barley. We present the analysis of 10.9% of the Brachypodium genome based on 64,696 BAC end sequences (BES). Analysis of repeat DNA content...

  6. Surface ligation-based resonance light scattering analysis of methylated genomic DNA on a microarray platform.

    PubMed

    Ma, Lan; Lei, Zhen; Liu, Xia; Liu, Dianjun; Wang, Zhenxin

    2016-05-10

    DNA methylation is a crucial epigenetic modification and is closely related to tumorigenesis. Herein, a surface ligation-based high throughput method combined with bisulfite treatment is developed for analysis of methylated genomic DNA. In this method, a DNA microarray is employed as a reaction platform, and resonance light scattering (RLS) of nanoparticles is used as the detection principle. The specificity stems from allele-specific ligation of Taq DNA ligase, which is further enhanced by improving the fidelity of Taq DNA ligase in a heterogeneous reaction. Two amplification techniques, rolling circle amplification (RCA) and silver enhancement, are employed after the ligation reaction and a gold nanoparticle (GNP) labeling procedure is used to amplify the signal. As little as 0.01% methylated DNA (i.e. 2 pmol L⁻¹) can be distinguished from the cocktail of methylated and unmethylated DNA by the proposed method. More importantly, this method shows good accuracy and sensitivity in profiling the methylation level of genomic DNA of three selected colonic cancer cell lines. This strategy provides a high throughput alternative with reasonable sensitivity and resolution for cancer study and diagnosis. PMID:27093298

  7. [Complete genomic analysis of a novel infectious bronchitis virus isolate].

    PubMed

    Hu, Bei-Xia; Yang, Shao-Hua; Zhang, Xiu-Mei; Zhang, Wei; Cao, San-Jie; Xu, Chuan-Tian; Huang, Qing-Hua; Zhang, Lin; Huang, Yan-Yan; Wen, Xin-Tian

    2014-07-01

    The genome of CK/CH/SD09/005, an isolate of infectious bronchitis virus (IBV), was characterized to enable the further understanding of the epidemiology and evolution of IBV in China. Twenty-five pairs of primers were designed to amplify the full-length genome of CK/CH/SD09/005. The nucleotide sequence of CK/CH/SD09/005 was compared with reference IBV strains retrieved from GenBank. The phylogenic relationship between CK/CH/SD09/005 and the reference strains was analyzed based on S1 gene sequences. The complete genome of CK/CH/SD09/005 consisted of 27691 nucleotides (nt), excluding the 5' cap and 3' poly A tail. The whole-genome of CK/CH/SD09/005 shared 97 - 99% nucleotide sequence homology with the GX-NN09032 strain, which was the only complete genome that was closely related to CK/CH/SD09/005. When compared with all reference strains except GX-NN09032, CK/CH/SD09/005 showed the highest similarity to ck/CH/LDL/091022 and SDIB821/2012 (QX-like) in the replicase gene (Gene 1) and 3'UTR, with a sequence identity rate of 97% and 98%, respectively. However, CK/CH/SD09/005 exhibited lower levels of similarity with ck/CH/LDL/091022 and SDIB821/2012 in S-3a-3b-3c/ E-M-5a-5b-N with a sequence identity of 72% - 90%. CK/CH/SD09/005 showed the highest level of nucleotide identity with Korean strain 1011, and Chinese strains CK/CH/LXJ/02I, DK/CH/HN/ZZ2004 and YX10, in ORF 3c/E (97%), 5a (96%), 5b (99%) and N (96%), respectively. ORFs 3a, 3b and M of CK/CH/SD09/005 exhibited no more than 90% homology with the reference strains, excluding GX-NN09032. The phylogenic analysis based on the S1 gene revealed that CK/CH/SD09/005 and 39 published strains were classified into seven clades (genotypes). CK/CH/SD09/005 was distributed in clade IV with several isolates collected between 2007 and 2012. CK/CH/SD09/005 showed 66% - 69% and 72% - 81% nucleotide identities with the IBV strains of other six clades in the S1 and S2 subunits, respectively. Moreover, multiple substitutions were

  8. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    PubMed

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. PMID:23463597

  9. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information

    PubMed Central

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have acquired much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies etc. Early realisation of proteins in the whole human genome that retain a tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids. PMID:27467780
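
    For readers unfamiliar with the performance measures quoted in this abstract, the sketch below shows how sensitivity, specificity and the Matthews correlation coefficient (MCC) are computed from a confusion matrix. The counts are hypothetical: they assume a balanced test set of 1,000 positives and 1,000 negatives, which the abstract does not state, and are chosen only so that sensitivity and specificity match the quoted 85.6% and 87.5%.

```python
from math import sqrt

def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical balanced test set (class sizes are an assumption, not from the source).
sens, spec, mcc = classification_metrics(tp=856, fn=144, tn=875, fp=125)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} MCC={mcc:.3f}")
```

    Under this balanced-class assumption the MCC comes out near 0.73; a different class balance would shift it, so the figure should not be read as a reconstruction of the study's test set.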

  10. Sequencing and Comparative Genome Analysis of Two Pathogenic Streptococcus gallolyticus Subspecies: Genome Plasticity, Adaptation and Virulence

    PubMed Central

    Teng, Yu-Ting; Wu, Hui-Lun; Liu, Yen-Ming; Wu, Keh-Ming; Chang, Chuan-Hsiung; Hsu, Ming-Ta

    2011-01-01

    Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations are different depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosomes of ATCC 43143 and ATCC 43144 are 2.36 and 2.10 Mb in length and encode 2246 and 1869 CDS respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34 respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating the Streptococcus genus has a small core-genome (constituting around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144 respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that S. gallolyticus still possesses genes making it suitable in a rumen environment, whereas the ability for S. pasteurianus to live in rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially membrane and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops. PMID:21633709

  11. Genome-wide DNA methylation analysis in hepatocellular carcinoma.

    PubMed

    Yamada, Nobuhisa; Yasui, Kohichiroh; Dohi, Osamu; Gen, Yasuyuki; Tomie, Akira; Kitaichi, Tomoko; Iwai, Naoto; Mitsuyoshi, Hironori; Sumida, Yoshio; Moriguchi, Michihisa; Yamaguchi, Kanji; Nishikawa, Taichiro; Umemura, Atsushi; Naito, Yuji; Tanaka, Shinji; Arii, Shigeki; Itoh, Yoshito

    2016-04-01

    Epigenetic changes as well as genetic changes are mechanisms of tumorigenesis. We aimed to identify novel genes that are silenced by DNA hypermethylation in hepatocellular carcinoma (HCC). We screened for genes with promoter DNA hypermethylation using a genome-wide methylation microarray analysis in primary HCC (the discovery set). The microarray analysis revealed that there were 2,670 CpG sites that significantly differed in regards to the methylation level between the tumor and non-tumor liver tissues; 875 were significantly hypermethylated and 1,795 were significantly hypomethylated in the HCC tumors compared to the non‑tumor tissues. Further analyses using methylation-specific PCR, combined with expression analysis, in the validation set of primary HCC showed that, in addition to three known tumor-suppressor genes (APC, CDKN2A, and GSTP1), eight genes (AKR1B1, GRASP, MAP9, NXPE3, RSPH9, SPINT2, STEAP4, and ZNF154) were significantly hypermethylated and downregulated in the HCC tumors compared to the non-tumor liver tissues. Our results suggest that epigenetic silencing of these genes may be associated with HCC. PMID:26883180

  12. Comparative genomic analysis of hyperthermophilic archaeal fuselloviridae viruses

    SciTech Connect

    B. Wiedenheft; K. Stedman; F. Roberto; D. Willits; A. K. Gleske; L. Zoeller; J. Snyder; T. Douglas; M. Young

    2004-02-01

    The complete genome sequences of two Sulfolobus spindle-shaped viruses (SSVs) from acidic hot springs in Kamchatka (Russia) and Yellowstone National Park (United States) have been determined. These nonlytic temperate viruses were isolated from hyperthermophilic Sulfolobus hosts, and both viruses share the spindle-shaped morphology characteristic of the Fuselloviridae family. These two genomes, in combination with the previously determined SSV1 genome from Japan and the SSV2 genome from Iceland, have allowed us to carry out a phylogenetic comparison of these geographically distributed hyperthermal viruses. Each virus contains a circular double-stranded DNA genome of ~15 kbp with approximately 34 open reading frames (ORFs). These Fusellovirus ORFs show little or no similarity to genes in the public databases. In contrast, 18 ORFs are common to all four isolates and may represent the minimal gene set defining this viral group. In general, ORFs on one half of the genome are colinear and highly conserved, while ORFs on the other half are not. One shared ORF among all four genomes is an integrase of the tyrosine recombinase family. All four viral genomes integrate into their host tRNA genes. The specific tRNA gene used for integration varies, and one genome integrates into multiple loci. Several unique ORFs are found in the genome of each isolate.

  13. Comparative genomic analysis of hyperthermophilic archaeal Fuselloviridae viruses.

    PubMed

    Wiedenheft, Blake; Stedman, Kenneth; Roberto, Francisco; Willits, Deborah; Gleske, Anne-Kathrin; Zoeller, Luisa; Snyder, Jamie; Douglas, Trevor; Young, Mark

    2004-02-01

    The complete genome sequences of two Sulfolobus spindle-shaped viruses (SSVs) from acidic hot springs in Kamchatka (Russia) and Yellowstone National Park (United States) have been determined. These nonlytic temperate viruses were isolated from hyperthermophilic Sulfolobus hosts, and both viruses share the spindle-shaped morphology characteristic of the Fuselloviridae family. These two genomes, in combination with the previously determined SSV1 genome from Japan and the SSV2 genome from Iceland, have allowed us to carry out a phylogenetic comparison of these geographically distributed hyperthermal viruses. Each virus contains a circular double-stranded DNA genome of approximately 15 kbp with approximately 34 open reading frames (ORFs). These Fusellovirus ORFs show little or no similarity to genes in the public databases. In contrast, 18 ORFs are common to all four isolates and may represent the minimal gene set defining this viral group. In general, ORFs on one half of the genome are colinear and highly conserved, while ORFs on the other half are not. One shared ORF among all four genomes is an integrase of the tyrosine recombinase family. All four viral genomes integrate into their host tRNA genes. The specific tRNA gene used for integration varies, and one genome integrates into multiple loci. Several unique ORFs are found in the genome of each isolate. PMID:14747560

  14. Genomic Analysis and Comparison of Two Gonorrhea Outbreaks

    PubMed Central

    Dordel, Janina; Whittles, Lilith K.; Collins, Caitlin; Bilek, Nicole; Bishop, Cynthia J.; White, Peter J.; Aanensen, David M.; Bentley, Stephen D.; Spratt, Brian G.

    2016-01-01

    ABSTRACT Gonorrhea is a sexually transmitted disease causing growing concern, with a substantial increase in reported incidence over the past few years in the United Kingdom and rising levels of resistance to a wide range of antibiotics. Understanding its epidemiology is therefore of major biomedical importance, not only on a population scale but also at the level of direct transmission. However, the molecular typing techniques traditionally used for gonorrhea infections do not provide sufficient resolution to investigate such fine-scale patterns. Here we sequenced the genomes of 237 isolates from two local collections of isolates from Sheffield and London, each of which was resolved into a single type using traditional methods. The two data sets were selected to have different epidemiological properties: the Sheffield data were collected over 6 years from a predominantly heterosexual population, whereas the London data were gathered within half a year and strongly associated with men who have sex with men. Based on contact tracing information between individuals in Sheffield, we found that transmission is associated with a median time to most recent common ancestor of 3.4 months, with an upper bound of 8 months, which we used as a criterion to identify likely transmission links in both data sets. In London, we found that transmission happened predominantly between individuals of similar age, sexual orientation, and location and also with the same HIV serostatus, which may reflect serosorting and associated risk behaviors. Comparison of the two data sets suggests that the London epidemic involved about ten times more cases than the Sheffield outbreak. PMID:27353752

  15. LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor

    PubMed Central

    Sheffield, Nathan C.; Bock, Christoph

    2016-01-01

    Summary: Genomic datasets are often interpreted in the context of large-scale reference databases. One approach is to identify significantly overlapping gene sets, which works well for gene-centric data. However, many types of high-throughput data are based on genomic regions. Locus Overlap Analysis (LOLA) provides easy and automatable enrichment analysis for genomic region sets, thus facilitating the interpretation of functional genomics and epigenomics data. Availability and Implementation: R package available in Bioconductor and on the following website: http://lola.computational-epigenetics.org. Contact: nsheffield@cemm.oeaw.ac.at or cbock@cemm.oeaw.ac.at PMID:26508757
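
    The idea of region-set enrichment described in this abstract can be illustrated with a small sketch. It is written in Python rather than R and is not the LOLA package's code or API: the interval representation, the function names and the toy regions are all assumptions. It counts how many query regions overlap a reference set, compares that with the rest of a background "universe", and scores the resulting 2x2 table with a one-sided Fisher's exact test.

```python
from math import comb

def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def hits(region, reference):
    """True if the region overlaps any region in the reference set."""
    return any(overlaps(region, r) for r in reference)

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

# Toy data (hypothetical): ten background regions, the first four are the query set.
universe = [("chr1", i * 100, i * 100 + 50) for i in range(10)]
query = universe[:4]
reference = [("chr1", 10, 20), ("chr1", 110, 120), ("chr1", 210, 220), ("chr1", 900, 910)]

a = sum(hits(r, reference) for r in query)            # query regions with overlap
b = len(query) - a                                    # query regions without overlap
c = sum(hits(r, reference) for r in universe[4:])     # other regions with overlap
d = len(universe) - len(query) - c                    # other regions without overlap

p_value = fisher_greater(a, b, c, d)
print((a, b, c, d), p_value)
```

    The linear scan over the reference set keeps the sketch short; real region-set tools use indexed interval structures to handle genome-scale inputs.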

  16. Analysis of the ABCA4 genomic locus in Stargardt disease.

    PubMed

    Zernant, Jana; Xie, Yajing Angela; Ayuso, Carmen; Riveiro-Alvarez, Rosa; Lopez-Martinez, Miguel-Angel; Simonelli, Francesca; Testa, Francesco; Gorin, Michael B; Strom, Samuel P; Bertelsen, Mette; Rosenberg, Thomas; Boone, Philip M; Yuan, Bo; Ayyagari, Radha; Nagy, Peter L; Tsang, Stephen H; Gouras, Peter; Collison, Frederick T; Lupski, James R; Fishman, Gerald A; Allikmets, Rando

    2014-12-20

    Autosomal recessive Stargardt disease (STGD1, MIM 248200) is caused by mutations in the ABCA4 gene. Complete sequencing of ABCA4 in STGD patients identifies compound heterozygous or homozygous disease-associated alleles in 65-70% of patients and only one mutation in 15-20% of patients. This study was designed to find the missing disease-causing ABCA4 variation by a combination of next-generation sequencing (NGS), array-Comparative Genome Hybridization (aCGH) screening, familial segregation and in silico analyses. The entire 140 kb ABCA4 genomic locus was sequenced in 114 STGD patients with one known ABCA4 exonic mutation revealing, on average, 200 intronic variants per sample. Filtering of these data resulted in 141 candidates for new mutations. Two variants were detected in four samples, two in three samples, and 20 variants in two samples; the remaining 117 new variants were detected only once. Multimodal analysis suggested 12 new likely pathogenic intronic ABCA4 variants, some of which were specific to (isolated) ethnic groups. No copy number variation (large deletions and insertions) was detected in any patient suggesting that it is a very rare event in the ABCA4 locus. Many variants were excluded since they were not conserved in non-human primates, were frequent in African populations and, therefore, represented ancestral, and not disease-associated, variants. The sequence variability in the ABCA4 locus is extensive and the non-coding sequences do not harbor frequent mutations in STGD patients of European-American descent. Defining disease-associated alleles in the ABCA4 locus requires exceptionally well characterized large cohorts and extensive analyses by a combination of various approaches. PMID:25082829

  17. Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes.

    PubMed

    Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

    2014-01-01

    Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution. PMID:25523484

  18. Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes

    PubMed Central

    Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

    2014-01-01

    Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution. PMID:25523484

  19. Genome-wide survey and analysis of microsatellites in the Pacific oyster genome: abundance, distribution, and potential for marker development

    NASA Astrophysics Data System (ADS)

    Wang, Jiafeng; Qi, Haigang; Li, Li; Zhang, Guofan

    2014-01-01

    Microsatellites are a ubiquitous component of the eukaryote genome and constitute one of the most popular sources of molecular markers for genetic studies. However, no data are currently available regarding microsatellites across the entire genome in oysters, despite their importance to the aquaculture industry. We present the first genome-wide investigation of microsatellites in the Pacific oyster Crassostrea gigas by analysis of the complete genome, resequencing, and expression data. The Pacific oyster genome is rich in microsatellites. A total of 604 653 repeats were identified, an average of one locus per 815 base pairs (bp). A total of 12 836 genes had coding repeats, and 7 332 were expressed normally, including genes with a wide range of molecular functions. Compared with 20 different species of animals, microsatellites in the oyster genome typically exhibited 1) an intermediate overall frequency; 2) relatively uniform contents of (A)n and (C)n repeats and abundant long (C)n repeats (≥24 bp); 3) large average length of (AG)n repeats; and 4) scarcity of trinucleotide repeats. The microsatellite-flanking regions exhibited a high degree of polymorphism with a heterozygosity rate of around 2.0%, but there was no correlation between heterozygosity and microsatellite abundance. A total of 19 462 polymorphic microsatellites were discovered, and dinucleotide repeats were the most active, with over 26% of loci found to harbor allelic variations. In all, 7 451 loci with high potential for marker development were identified. Better knowledge of the microsatellites in the oyster genome will provide information for the future design of a wide range of molecular markers and contribute to further advancements in the field of oyster genetics, particularly for molecular-based selection and breeding.
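
    As an illustration of what a genome-wide microsatellite survey involves, the following is a minimal sketch of a perfect-repeat scan. It is not the pipeline used in the study: the function name, the 1-6 bp motif range, and the 12 bp minimum tract length are all assumptions chosen for the example.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_length=12):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    Hypothetical helper for illustration only; the thresholds are assumptions,
    not the criteria used in the oyster study.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A motif of length k followed by one or more exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter motif repeated (e.g. "AA", "AGAG").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            tract = m.group(0)
            if len(tract) >= min_length:
                hits.append((m.start(), motif, len(tract) // k))
    return sorted(hits)

# Toy sequence containing an (AG)8 tract and an (A)14 tract.
example = "CTGC" + "AG" * 8 + "TCGT" + "A" * 14 + "CGT"
loci = find_microsatellites(example)
```

    Dividing the sequence length by the number of loci found gives a density figure of the same kind as the "one locus per N bp" value quoted in the abstract.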

  20. On the analysis of large-scale genomic structures.

    PubMed

    Oiwa, Nestor Norio; Goldman, Carla

    2005-01-01

    We apply methods from statistical physics (histograms, correlation functions, fractal dimensions, and singularity spectra) to characterize large-scale structure of the distribution of nucleotides along genomic sequences. We discuss the role of the extension of noncoding segments ("junk DNA") for the genomic organization, and the connection between the coding segment distribution and the high-eukaryotic chromatin condensation. The following sequences taken from GenBank were analyzed: complete genome of Xanthomonas campestris, complete genome of yeast, chromosome V of Caenorhabditis elegans, and human chromosome XVII around gene BRCA1. The results are compared with the random and periodic sequences and those generated by simple and generalized fractal Cantor sets. PMID:15858230
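
    One of the listed measures, the correlation function, can be sketched as follows. This is a generic indicator-sequence autocorrelation, C(d) = <u_i u_(i+d)> - <u>^2, and not necessarily the exact estimator used by the authors; the function name and the purine (A/G) indicator are assumptions made for the example.

```python
def indicator_autocorrelation(seq, symbols="AG", max_lag=5):
    """Autocorrelation of a 0/1 indicator sequence for lags 1..max_lag.

    u_i is 1 when base i is in `symbols` (purines by default, an assumption)
    and 0 otherwise. Returns [C(1), ..., C(max_lag)].
    """
    u = [1.0 if base in symbols else 0.0 for base in seq.upper()]
    n = len(u)
    mean = sum(u) / n
    result = []
    for d in range(1, max_lag + 1):
        pairs = n - d  # number of (i, i+d) pairs available at this lag
        avg_product = sum(u[i] * u[i + d] for i in range(pairs)) / pairs
        result.append(avg_product - mean * mean)
    return result

# A strictly periodic sequence alternates between anti-correlation and correlation.
periodic = indicator_autocorrelation("AC" * 50, max_lag=2)
```

    For a random sequence the values fluctuate around zero at every lag, which is why random and periodic sequences serve as natural reference cases.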

  1. CoCoNUT: an efficient system for the comparison and analysis of genomes

    PubMed Central

    2008-01-01

    Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477

  2. Genome wide survey and analysis of small repetitive sequences in caulimoviruses.

    PubMed

    George, Biju; Gnanasekaran, Prabu; Jain, S K; Chakraborty, Supriya

    2014-10-01

    Microsatellites are known to exhibit ubiquitous presence across all kingdoms of life including viruses. Members of the Caulimoviridae family severely affect growth of vegetable and fruit plants and reduce economic yield in diverse cropping systems worldwide. Here, we analyzed the nature and distribution of both simple and complex microsatellites present in the complete genomes of 44 species of Caulimoviridae. Our results showed, in all analyzed genomes, genome size and GC content had a weak influence on number, relative abundance and relative density of microsatellites, respectively. For each genome, mono- and dinucleotide repeats were found to be highly predominant and are overrepresented in the genomes of the majority of caulimoviruses. AT/TA and GAA/AAG/AGA were the most abundant di- and trinucleotide repeat motifs, respectively. Repeats larger than trinucleotide were rarely found in these genomes. Comparative study of occurrence, abundance and density of microsatellites among available RNA and DNA viral genomes indicated that simple repeats were least abundant in genomes of caulimoviruses. Polymorphic repeats, even though rare, were observed in the large intergenic region of the genome, indicating strand slippage and/or unequal recombination processes do occur in caulimoviruses. To our knowledge, this is the first analysis of microsatellites occurring in any dsDNA viral genome. Characterization of such variations in repeat sequences would be important in deciphering the origin, mutational processes, and role of repeat sequences in viral genomes. PMID:24999243

  3. Whole Genome Analysis Informs Breast Cancer Response to Aromatase Inhibition

    PubMed Central

    Shen, Dong; Luo, Jingqin; Suman, Vera J.; Wallis, John W.; Van Tine, Brian A.; Hoog, Jeremy; Goiffon, Reece J.; Goldstein, Theodore C.; Ng, Sam; Lin, Li; Crowder, Robert; Snider, Jacqueline; Ballman, Karla; Weber, Jason; Chen, Ken; Koboldt, Daniel C.; Kandoth, Cyriac; Schierding, William S.; McMichael, Joshua F.; Miller, Christopher A.; Lu, Charles; Harris, Christopher C.; McLellan, Michael D.; Wendl, Michael C.; DeSchryver, Katherine; Allred, D. Craig; Esserman, Laura; Unzeitig, Gary; Margenthaler, Julie; Babiera, G.V.; Marcom, P. Kelly; Guenther, J.M.; Leitch, Marilyn; Hunt, Kelly; Olson, John; Tao, Yu; Maher, Christopher A.; Fulton, Lucinda L.; Fulton, Robert S.; Harrison, Michelle; Oberkfell, Ben; Du, Feiyu; Demeter, Ryan; Vickery, Tammi L.; Elhammali, Adnan; Piwnica-Worms, Helen; McDonald, Sandra; Watson, Mark; Dooling, David J.; Ota, David; Chang, Li-Wei; Bose, Ron; Ley, Timothy J.; Piwnica-Worms, David; Stuart, Joshua M.; Wilson, Richard K.

    2012-01-01

    Summary To correlate the variable clinical features of estrogen receptor positive (ER+) breast cancer with somatic alterations, we studied pre-treatment tumour biopsies accrued from patients in a study of neoadjuvant aromatase inhibitor (AI) therapy by massively parallel sequencing and analysis. Eighteen significantly mutated genes were identified, including five genes (RUNX1, CBFB, MYH9, MLL3 and SF3B1) previously linked to hematopoietic disorders. Mutant MAP3K1 was associated with Luminal A status, low grade histology and low proliferation rates whereas mutant TP53 associated with the opposite pattern. Moreover, mutant GATA3 correlated with suppression of proliferation upon AI treatment. Pathway analysis demonstrated mutations in MAP2K4, a MAP3K1 substrate, produced similar perturbations as MAP3K1 loss. Distinct phenotypes in ER+ breast cancer are associated with specific patterns of somatic mutations that map into cellular pathways linked to tumor biology but most recurrent mutations are relatively infrequent. Prospective clinical trials based on these findings will require comprehensive genome sequencing. PMID:22722193

  4. Genomic analysis of regulatory network dynamics reveals large topological changes

    NASA Astrophysics Data System (ADS)

    Luscombe, Nicholas M.; Madan Babu, M.; Yu, Haiyuan; Snyder, Michael; Teichmann, Sarah A.; Gerstein, Mark

    2004-09-01

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here, particularly the large-scale topological changes and hub transience, will apply to other biological networks, including complex sub-systems in higher eukaryotes.

  5. Sequence and comparative genomic analysis of actin-related proteins.

    PubMed

    Muller, Jean; Oma, Yukako; Vallar, Laurent; Friederich, Evelyne; Poch, Olivier; Winsor, Barbara

    2005-12-01

    Actin-related proteins (ARPs) are key players in cytoskeleton activities and nuclear functions. Two complexes, ARP2/3 and ARP1/11, also known as dynactin, are implicated in actin dynamics and in microtubule-based trafficking, respectively. ARP4 to ARP9 are components of many chromatin-modulating complexes. Conventional actins and ARPs codefine a large family of homologous proteins, the actin superfamily, with a tertiary structure known as the actin fold. Because ARPs and actin share high sequence conservation, clear family definition requires distinct features to easily and systematically identify each subfamily. In this study we performed an in depth sequence and comparative genomic analysis of ARP subfamilies. A high-quality multiple alignment of approximately 700 complete protein sequences homologous to actin, including 148 ARP sequences, allowed us to extend the ARP classification to new organisms. Sequence alignments revealed conserved residues, motifs, and inserted sequence signatures to define each ARP subfamily. These discriminative characteristics allowed us to develop ARPAnno (http://bips.u-strasbg.fr/ARPAnno), a new web server dedicated to the annotation of ARP sequences. Analyses of sequence conservation among actins and ARPs highlight part of the actin fold and suggest interactions between ARPs and actin-binding proteins. Finally, analysis of ARP distribution across eukaryotic phyla emphasizes the central importance of nuclear ARPs, particularly the multifunctional ARP4. PMID:16195354

  6. Genomic Analysis of ATP Efflux in Saccharomyces cerevisiae

    PubMed Central

    Peters, Theodore W.; Miller, Aaron W.; Tourette, Cendrine; Agren, Hannah; Hubbard, Alan; Hughes, Robert E.

    2015-01-01

    Adenosine triphosphate (ATP) plays an important role as a primary molecule for the transfer of chemical energy to drive biological processes. ATP also functions as an extracellular signaling molecule in a diverse array of eukaryotic taxa in a conserved process known as purinergic signaling. Given the important roles of extracellular ATP in cell signaling, we sought to comprehensively elucidate the pathways and mechanisms governing ATP efflux from eukaryotic cells. Here, we present results of a genomic analysis of ATP efflux from Saccharomyces cerevisiae by measuring extracellular ATP levels in cultures of 4609 deletion mutants. This screen revealed key cellular processes that regulate extracellular ATP levels, including mitochondrial translation and vesicle sorting in the late endosome, indicating that ATP production and transport through vesicles are required for efflux. We also observed evidence for altered ATP efflux in strains deleted for genes involved in amino acid signaling, and mitochondrial retrograde signaling. Based on these results, we propose a model in which the retrograde signaling pathway potentiates amino acid signaling to promote mitochondrial respiration. This study advances our understanding of the mechanism of ATP secretion in eukaryotes and implicates TOR complex 1 (TORC1) and nutrient signaling pathways in the regulation of ATP efflux. These results will facilitate analysis of ATP efflux mechanisms in higher eukaryotes. PMID:26585826

  7. Improved Statistics for Genome-Wide Interaction Analysis

    PubMed Central

    Ueki, Masao; Cordell, Heather J.

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al

  8. High-resolution abundance analysis of HD 140283

    NASA Astrophysics Data System (ADS)

    Siqueira-Mello, C.; Andrievsky, S. M.; Barbuy, B.; Spite, M.; Spite, F.; Korotin, S. A.

    2015-12-01

    Context. HD 140283 is a reference subgiant that is metal poor and confirmed to be a very old star. The element abundances of this type of old star can constrain the nature and nucleosynthesis processes that occurred in its (even older) progenitors. The present study may shed light on nucleosynthesis processes yielding heavy elements early in the Galaxy. Aims: A detailed analysis of a high-quality spectrum is carried out, with the intent of providing a reference on stellar lines and abundances of a very old, metal-poor subgiant. We aim to derive abundances from most available and measurable spectral lines. Methods: The analysis is carried out using high-resolution (R = 81 000) and high signal-to-noise ratio (800 analysis in non-LTE (NLTE) is based on the MULTI code. We present LTE abundances for 26 elements, and NLTE calculations for the species C i, O i, Na i, Mg i, Al i, K i, Ca i, Sr ii, and Ba ii lines. Results: The abundance analysis provided an extensive line list suitable for metal-poor subgiant stars. The results for Li, CNO, α-, and iron peak elements are in good agreement with the literature. The new NLTE Ba abundance, along with an NLTE Eu correction and a 3D Ba correction from the literature, leads to [Eu/Ba] = + 0.59 ± 0.18. This result confirms a dominant r-process contribution, possibly together with a very small contribution from the main s-process, to the neutron-capture elements in HD 140283. Overabundances of the lighter heavy elements and the high abundances derived for Ba, La, and Ce favour the operation of the weak r-process in HD 140283

  9. Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms

    PubMed Central

    Sinha, Amit U; Meller, Jaroslaw

    2007-01-01

    Background Identifying syntenic regions, i.e., blocks of genes or other markers with evolutionary conserved order, and quantifying evolutionary relatedness between genomes in terms of chromosomal rearrangements is one of the central goals in comparative genomics. However, the analysis of synteny and the resulting assessment of genome rearrangements are sensitive to the choice of a number of arbitrary parameters that affect the detection of synteny blocks. In particular, the choice of a set of markers and the effect of different aggregation strategies, which enable coarse graining of synteny blocks and exclusion of micro-rearrangements, need to be assessed. Therefore, existing tools and resources that facilitate identification, visualization and analysis of synteny need to be further improved to provide a flexible platform for such analysis, especially in the context of multiple genomes. Results We present a new tool, Cinteny, for fast identification and analysis of synteny with different sets of markers and various levels of coarse graining of syntenic blocks. Using Hannenhalli-Pevzner approach and its extensions, Cinteny also enables interactive determination of evolutionary relationships between genomes in terms of the number of rearrangements (the reversal distance). In particular, Cinteny provides: i) integration of synteny browsing with assessment of evolutionary distances for multiple genomes; ii) flexibility to adjust the parameters and re-compute the results on-the-fly; iii) ability to work with user provided data, such as orthologous genes, sequence tags or other conserved markers. In addition, Cinteny provides many annotated mammalian, invertebrate and fungal genomes that are pre-loaded and available for analysis at . Conclusion Cinteny allows one to automatically compare multiple genomes and perform sensitivity analysis for synteny block detection and for the subsequent computation of reversal distances. Cinteny can also be used to interactively browse

  10. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalycinus, from a Comparative Genomics Perspective

    PubMed Central

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genomes of Dianthus and Spinacia have two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus. PMID:26513163

  11. Comparative Genome Analysis Provides Insights into the Pathogenicity of Flavobacterium psychrophilum.

    PubMed

    Castillo, Daniel; Christiansen, Rói Hammershaimb; Dalsgaard, Inger; Madsen, Lone; Espejo, Romilio; Middelboe, Mathias

    2016-01-01

    Flavobacterium psychrophilum is a fish pathogen in salmonid aquaculture worldwide that causes cold water disease (CWD) and rainbow trout fry syndrome (RTFS). Comparative genome analyses of 11 F. psychrophilum isolates representing temporally and geographically distant populations were used to describe the F. psychrophilum pan-genome and to examine virulence factors, prophages, CRISPR arrays, and genomic islands present in the genomes. Analysis of the genomic DNA sequences was complemented with selected phenotypic characteristics of the strains. The pan genome analysis showed that F. psychrophilum could hold at least 3373 genes, while the core genome contained 1743 genes. On average, 67 new genes were detected for every new genome added to the analysis, indicating that F. psychrophilum possesses an open pan genome. The putative virulence factors were equally distributed among isolates, independent of geographic location, year of isolation and source of isolates. Only one prophage-related sequence was found which corresponded to the previously described prophage 6H, and appeared in 5 out of 11 isolates. CRISPR array analysis revealed two different loci with dissimilar spacer content, which only matched one sequence in the database, the temperate bacteriophage 6H. Genomic Islands (GIs) were identified in F. psychrophilum isolates 950106-1/1 and CSF 259-93, associated with toxins and antibiotic resistance. Finally, phenotypic characterization revealed a high degree of similarity among the strains with respect to biofilm formation and secretion of extracellular enzymes. Global scale dispersion of virulence factors in the genomes and the abilities for biofilm formation, hemolytic activity and secretion of extracellular enzymes among the strains suggested that F. psychrophilum isolates have a similar mode of action on adhesion, colonization and destruction of fish tissues across large spatial and temporal scales of occurrence. Overall, the genomic characterization and
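
    The pan-genome, core-genome, and "new genes per added genome" quantities mentioned in this record can be illustrated with plain set operations. The sketch below assumes that genes have already been clustered into families and that each isolate is represented by a set of family identifiers; the function names and toy data are hypothetical and do not come from the study.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets.

    `genomes` maps an isolate name to the set of gene-family ids it carries
    (at least one isolate is required). Illustrative helper, not the study's method.
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)          # families present in any isolate
    core = set.intersection(*gene_sets)    # families present in every isolate
    return pan, core

def new_genes_per_genome(genomes, order):
    """Count families not seen before as isolates are added in `order`."""
    seen = set()
    counts = []
    for name in order:
        counts.append(len(genomes[name] - seen))
        seen |= genomes[name]
    return counts

# Toy data: three made-up isolates with made-up gene-family ids.
toy = {
    "iso1": {"g1", "g2", "g3"},
    "iso2": {"g1", "g2", "g4"},
    "iso3": {"g1", "g2", "g5", "g6"},
}
pan, core = pan_and_core(toy)
added = new_genes_per_genome(toy, ["iso1", "iso2", "iso3"])
```

    A pan-genome is usually called open when the count of new families keeps staying above zero as further genomes are added, which is the pattern the abstract reports.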

  12. Fastbreak: a tool for analysis and visualization of structural variations in genomic data

    PubMed Central

    2012-01-01

    Genomic studies are now being undertaken on thousands of samples requiring new computational tools that can rapidly analyze data to identify clinically important features. Inferring structural variations in cancer genomes from mate-paired reads is a combinatorially difficult problem. We introduce Fastbreak, a fast and scalable toolkit that enables the analysis and visualization of large amounts of data from projects such as The Cancer Genome Atlas. PMID:23046488

  13. Comparative Genome Analysis Provides Insights into the Pathogenicity of Flavobacterium psychrophilum

    PubMed Central

    Castillo, Daniel; Christiansen, Rói Hammershaimb; Dalsgaard, Inger; Madsen, Lone; Espejo, Romilio

    2016-01-01

    Flavobacterium psychrophilum is a fish pathogen in salmonid aquaculture worldwide that causes cold water disease (CWD) and rainbow trout fry syndrome (RTFS). Comparative genome analyses of 11 F. psychrophilum isolates representing temporally and geographically distant populations were used to describe the F. psychrophilum pan-genome and to examine virulence factors, prophages, CRISPR arrays, and genomic islands present in the genomes. Analysis of the genomic DNA sequences was complemented with selected phenotypic characteristics of the strains. The pan genome analysis showed that F. psychrophilum could hold at least 3373 genes, while the core genome contained 1743 genes. On average, 67 new genes were detected for every new genome added to the analysis, indicating that F. psychrophilum possesses an open pan genome. The putative virulence factors were equally distributed among isolates, independent of geographic location, year of isolation and source of isolates. Only one prophage-related sequence was found which corresponded to the previously described prophage 6H, and appeared in 5 out of 11 isolates. CRISPR array analysis revealed two different loci with dissimilar spacer content, which only matched one sequence in the database, the temperate bacteriophage 6H. Genomic Islands (GIs) were identified in F. psychrophilum isolates 950106-1/1 and CSF 259–93, associated with toxins and antibiotic resistance. Finally, phenotypic characterization revealed a high degree of similarity among the strains with respect to biofilm formation and secretion of extracellular enzymes. Global scale dispersion of virulence factors in the genomes and the abilities for biofilm formation, hemolytic activity and secretion of extracellular enzymes among the strains suggested that F. psychrophilum isolates have a similar mode of action on adhesion, colonization and destruction of fish tissues across large spatial and temporal scales of occurrence. Overall, the genomic characterization and

  14. High-resolution array CGH identifies novel regions of genomic alteration in intermediate-risk prostate cancer.

    PubMed

    Ishkanian, Adrian S; Mallof, Chad A; Ho, James; Meng, Alice; Albert, Monique; Syed, Amena; van der Kwast, Theodorus; Milosevic, Michael; Yoshimoto, Maisa; Squire, Jeremy A; Lam, Wan L; Bristow, Robert G

    2009-07-01

    Approximately one-third of prostate cancer patients present with intermediate risk disease. Interestingly, while this risk group is clinically well defined, it demonstrates the most significant heterogeneity in PSA-based biochemical outcome. Further, the majority of candidate genes associated with prostate cancer progression have been identified using cell lines, xenograft models, and high-risk androgen-independent or metastatic patient samples. We used a global high-resolution array comparative genomic hybridization (CGH) assay to characterize copy number alterations (CNAs) in intermediate risk prostate cancer. Herein, we show this risk group contains a number of alterations previously associated with high-risk disease: (1) deletions at 21q22.2 (TMPRSS2:ERG), 16q22-24 (containing CDH1), 13q14.2 (RB1), 10q23.31 (PTEN), 8p21 (NKX3.1); and, (2) amplification at 8q21.3-24.3 (containing c-MYC). In addition, we identified six novel microdeletions at high frequency: 1q42.12-q42.3 (33.3%), 5q12.3-13.3 (21%), 20q13.32-13.33 (29.2%), 22q11.21 (25%), 22q12.1 (29.2%), and 22q13.31 (33.3%). Further, we show there is little concordance between CNAs from these clinical samples and those found in commonly used prostate cancer cell models. These unexpected findings suggest that the intermediate-risk category is a crucial cohort warranting further study to determine if a unique molecular fingerprint can predict aggressive versus indolent phenotypes. PMID:19350549

  15. Complete Genome Sequence of Borrelia afzelii K78 and Comparative Genome Analysis

    PubMed Central

    Schüler, Wolfgang; Bunikis, Ignas; Weber-Lehman, Jacqueline; Comstedt, Pär; Kutschan-Bunikis, Sabrina; Stanek, Gerold; Huber, Jutta; Meinke, Andreas; Bergström, Sven; Lundberg, Urban

    2015-01-01

    The main Borrelia species causing Lyme borreliosis in Europe and Asia are Borrelia afzelii, B. garinii, B. burgdorferi and B. bavariensis. This is in contrast to the United States, where infections are exclusively caused by B. burgdorferi. To date, the genome sequences of four B. afzelii strains, of which only two include the numerous plasmids, are available. In order to further assess the genetic diversity of B. afzelii, the most common species in Europe, responsible for the large variety of clinical manifestations of Lyme borreliosis, we have determined the full genome sequence of the B. afzelii strain K78, a clinical isolate from Austria. The K78 genome contains a linear chromosome (905,949 bp) and 13 plasmids (8 linear and 5 circular) together presenting 1,309 open reading frames of which 496 are located on plasmids. With the exception of lp28-8, all linear replicons in their full length including their telomeres have been sequenced. The comparison with the genomes of the four other B. afzelii strains, ACA-1, PKo, HLJ01 and Tom3107, as well as the one of B. burgdorferi strain B31, confirmed a high degree of conservation within the linear chromosome of B. afzelii, whereas plasmid-encoded genes showed a much larger diversity. Since some plasmids present in B. burgdorferi are missing in the B. afzelii genomes, the corresponding virulence factors of B. burgdorferi are found in B. afzelii on other unrelated plasmids. In addition, we have identified a species-specific region in the circular plasmid, cp26, which could be used for species determination. Different non-coding RNAs have been located on the B. afzelii K78 genome, which have not previously been annotated in any of the published Borrelia genomes. PMID:25798594

  16. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis

    PubMed Central

    Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

    2016-01-01

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros ‘Jinzaoshi’ were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. ‘Jinzaoshi’, support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales. PMID:27442423

  17. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    PubMed

    Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

    2016-01-01

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales. PMID:27442423

  18. High resolution melting (HRM) analysis of DNA--its role and potential in food analysis.

    PubMed

    Druml, Barbara; Cichna-Markl, Margit

    2014-09-01

    DNA based methods play an increasing role in food safety control and food adulteration detection. Recent papers show that high resolution melting (HRM) analysis is an interesting approach. It involves amplification of the target of interest in the presence of a saturation dye by the polymerase chain reaction (PCR) and subsequent melting of the amplicons by gradually increasing the temperature. Since the melting profile depends on the GC content, length, sequence and strand complementarity of the product, HRM analysis is highly suitable for the detection of single-base variants and small insertions or deletions. The review gives an introduction into HRM analysis, covers important aspects in the development of an HRM analysis method and describes how HRM data are analysed and interpreted. Then we discuss the potential of HRM analysis based methods in food analysis, i.e. for the identification of closely related species and cultivars and the identification of pathogenic microorganisms. PMID:24731338
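
    The dependence of melting behaviour on GC content and length described above can be illustrated with a rough calculation. The following is a minimal sketch, not part of the review: it assumes the commonly quoted basic estimate Tm = 64.9 + 41 * (GC - 16.4) / N for a duplex of N bases, and the sequences and function names are hypothetical. Real HRM analysis compares the shapes of measured melting curves and does not rely on such a formula.

```python
def gc_count(seq: str) -> int:
    """Number of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def basic_tm(seq: str) -> float:
    """Crude melting temperature estimate in degrees C (assumed textbook formula)."""
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / n


# Hypothetical 50-base amplicon and a single-base variant (A -> G at position 0).
wild_type = "AT" * 15 + "GC" * 10
variant = "G" + wild_type[1:]

shift = basic_tm(variant) - basic_tm(wild_type)
print(f"wild type Tm ~ {basic_tm(wild_type):.2f} C")
print(f"variant   Tm ~ {basic_tm(variant):.2f} C (shift {shift:+.2f} C)")
```

    Under this estimate a single A-to-G change in a 50-base amplicon shifts Tm by 41/50 of a degree, which gives a sense of why the temperature resolution of the instrument matters for detecting single-base variants.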

  19. From pixels to picograms: a beginners' guide to genome quantification by Feulgen image analysis densitometry.

    PubMed

    Hardie, David C; Gregory, T Ryan; Hebert, Paul D N

    2002-06-01

    The study of genome size variation is important from a number of practical and theoretical perspectives. For example, the long-standing "C-value enigma" relating to the more than 200,000-fold range in eukaryotic genome sizes is best studied from a broad comparative standpoint. Genome size data are also required in detailed analyses of genome structure and evolution. The choice of future genome sequencing projects will be dependent on knowledge regarding the sizes of genomes to be sequenced, and so on. To date, genome size data have been acquired primarily by Feulgen microdensitometry or flow cytometry. Each has several advantages but also important limitations. In this review, we provide a practical guide to the new technique of Feulgen image analysis densitometry. The review is designed for those interested in genome size measurements but not extensively experienced in histochemistry, densitometry, or microscopy. Therefore, relevant historical and technical background information is included. For easy reference, we provide recipes for required reagents, guidelines for cell staining, and a checklist of steps for successful image analysis. We hope that the accuracy, rapidity, and cost-effectiveness of Feulgen image analysis demonstrated here will stimulate further surveys of genome sizes in a variety of taxa. PMID:12019291

  20. Analysis of copy number variation in the bovine genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We initiated a systematic study of the copy number variation (CNV) within the Bovine HapMap cattle population using array comparative genomic hybridization (array CGH). Oligonucleotide CGH arrays were designed and fabricated to provide a genome-wide coverage with an average interval of 6 kb using t...

  1. Generation of a genomic tiling array of the human Major Histocompatibility Complex (MHC) and its application for DNA methylation analysis

    PubMed Central

    Tomazou, Eleni M; Rakyan, Vardhman K; Lefebvre, Gregory; Andrews, Robert; Ellis, Peter; Jackson, David K; Langford, Cordelia; Francis, Matthew D; Bäckdahl, Liselotte; Miretti, Marcos; Coggill, Penny; Ottaviani, Diego; Sheer, Denise; Murrell, Adele; Beck, Stephan

    2008-01-01

    Background The major histocompatibility complex (MHC) is essential for human immunity and is highly associated with common diseases, including cancer. While the genetics of the MHC has been studied intensively for many decades, very little is known about the epigenetics of this most polymorphic and disease-associated region of the genome. Methods To facilitate comprehensive epigenetic analyses of this region, we have generated a genomic tiling array of 2 Kb resolution covering the entire 4 Mb MHC region. The array has been designed to be compatible with chromatin immunoprecipitation (ChIP), methylated DNA immunoprecipitation (MeDIP), array comparative genomic hybridization (aCGH) and expression profiling, including of non-coding RNAs. The array comprises 7832 features, consisting of two replicates of both forward and reverse strands of MHC amplicons and appropriate controls. Results Using MeDIP, we demonstrate the application of the MHC array for DNA methylation profiling and the identification of tissue-specific differentially methylated regions (tDMRs). Based on the analysis of two tissues and two cell types, we identified 90 tDMRs within the MHC and describe their characterisation. Conclusion A tiling array covering the MHC region was developed and validated. Its successful application for DNA methylation profiling indicates that this array represents a useful tool for molecular analyses of the MHC in the context of medical genomics. PMID:18513384

  2. Comparative genome analysis of Bacillus cereus group genomes with Bacillus subtilis

    SciTech Connect

    Anderson, Iain; Sorokin, Alexei; Kapatral, Vinayak; Reznik, Gary; Bhattacharya, Anamitra; Mikhailova, Natalia; Burd, Henry; Joukov, Victor; Kaznadzey, Denis; Walunas, Theresa; D'Souza, Mark; Larsen, Niels; Pusch, Gordon; Liolios, Konstantinos; Grechkin, Yuri; Lapidus, Alla; Goltsman, Eugene; Chu, Lien; Fonstein, Michael; Ehrlich, S. Dusko; Overbeek, Ross; Kyrpides, Nikos; Ivanova, Natalia

    2005-09-14

    Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis sub spp israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1,381 protein families among the four Bacillus genomes, with an additional set of 933 families common to the B. cereus group, was identified. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-layer proteins suggesting differences in their phenotype were identified. The B. cereus group has signal transduction systems including a tyrosine kinase related to two-component system histidine kinases from B. subtilis. A model for regulation of the stress responsive sigma factor sigmaB in the B. cereus group different from the well studied regulation in B. subtilis has been proposed. Despite a high degree of chromosomal synteny among these genomes, significant differences in cell wall and spore coat proteins that contribute to the survival and adaptation in specific hosts have been identified.

  3. Physical mapping of complex genomes by sampled sequencing: A theoretical analysis

    SciTech Connect

    Kupfer, K.; Smith, M.; Quackenbush, J.

    1995-05-01

    A method for high-throughput, high-resolution physical mapping of complex genomes and human chromosomes called Genomic Sequence Sampling (GSS) has recently been proposed. This mapping strategy employs high-density cosmid contig assembly over 200-kb to 1-Mb regions of the target genome coupled with DNA sequencing of the cosmid ends. The relative order and spacing of the sequence fragments is determined from the template contig, resulting in a physical map of 1- to 5-kb resolution that contains a substantial portion of the entire sequence at one-pass accuracy. The purpose of this paper is to determine the theoretical parameters for GSS mapping, to evaluate the effectiveness of the contig-building strategy, and to calculate the expected fraction of the target genome that can be recovered as mapped sequence. A novel aspect of the cosmid fingerprinting and contig-building strategy involves determining the orientation of the genomic inserts relative to the cloning vectors, so that the sampled sequence fragments can be mapped with high resolution. The algorithm is based upon complete restriction enzyme digestion, contig assembly by matching fragments, and end-orientation of individual cosmids by determining the best consistent fit of the labeled cosmid end fragments in the consensus restriction map. 32 refs., 7 figs.

  4. Single cell genome analysis of an uncultured heterotrophic stramenopile

    NASA Astrophysics Data System (ADS)

    Roy, Rajat S.; Price, Dana C.; Schliep, Alexander; Cai, Guohong; Korobeynikov, Anton; Yoon, Hwan Su; Yang, Eun Chan; Bhattacharya, Debashish

    2014-04-01

    A broad swath of eukaryotic microbial biodiversity cannot be cultivated in the lab and is therefore inaccessible to conventional genome-wide comparative methods. One promising approach to study these lineages is single cell genomics (SCG), whereby an individual cell is captured from nature and genome data are produced from the amplified total DNA. Here we tested the efficacy of SCG to generate a draft genome assembly from a single sample, in this case a cell belonging to the broadly distributed MAST-4 uncultured marine stramenopiles. Using de novo gene prediction, we identified 6,996 protein-encoding genes in the MAST-4 genome. This genetic inventory was sufficient to place the cell within the ToL using multigene phylogenetics and provided preliminary insights into the complex evolutionary history of horizontal gene transfer (HGT) in the MAST-4 lineage.

  5. A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease

    PubMed Central

    Kyriakou, Theodosios; Nelson, Christopher P; Hopewell, Jemma C; Webb, Thomas R; Zeng, Lingyao; Dehghan, Abbas; Alver, Maris; Armasu, Sebastian M; Auro, Kirsi; Bjonnes, Andrew; Chasman, Daniel I; Chen, Shufeng; Ford, Ian; Franceschini, Nora; Gieger, Christian; Grace, Christopher; Gustafsson, Stefan; Huang, Jie; Hwang, Shih-Jen; Kim, Yun Kyoung; Kleber, Marcus E; Lau, King Wai; Lu, Xiangfeng; Lu, Yingchang; Lyytikäinen, Leo-Pekka; Mihailov, Evelin; Morrison, Alanna C; Pervjakova, Natalia; Qu, Liming; Rose, Lynda M; Salfati, Elias; Saxena, Richa; Scholz, Markus; Smith, Albert V; Tikkanen, Emmi; Uitterlinden, Andre; Yang, Xueli; Zhang, Weihua; Zhao, Wei; de Andrade, Mariza; de Vries, Paul S; van Zuydam, Natalie R; Anand, Sonia S; Bertram, Lars; Beutner, Frank; Dedoussis, George; Frossard, Philippe; Gauguier, Dominique; Goodall, Alison H; Gottesman, Omri; Haber, Marc; Han, Bok-Ghee; Huang, Jianfeng; Jalilzadeh, Shapour; Kessler, Thorsten; König, Inke R; Lannfelt, Lars; Lieb, Wolfgang; Lind, Lars; Lindgren, Cecilia M; Lokki, Marja-Liisa; Magnusson, Patrik K; Mallick, Nadeem H; Mehra, Narinder; Meitinger, Thomas; Memon, Fazal-ur-Rehman; Morris, Andrew P; Nieminen, Markku S; Pedersen, Nancy L; Peters, Annette; Rallidis, Loukianos S; Rasheed, Asif; Samuel, Maria; Shah, Svati H; Sinisalo, Juha; Stirrups, Kathleen E; Trompet, Stella; Wang, Laiyuan; Zaman, Khan S; Ardissino, Diego; Boerwinkle, Eric; Borecki, Ingrid B; Bottinger, Erwin P; Buring, Julie E; Chambers, John C; Collins, Rory; Cupples, L Adrienne; Danesh, John; Demuth, Ilja; Elosua, Roberto; Epstein, Stephen E; Esko, Tõnu; Feitosa, Mary F; Franco, Oscar H; Franzosi, Maria Grazia; Granger, Christopher B; Gu, Dongfeng; Gudnason, Vilmundur; Hall, Alistair S; Hamsten, Anders; Harris, Tamara B; Hazen, Stanley L; Hengstenberg, Christian; Hofman, Albert; Ingelsson, Erik; Iribarren, Carlos; Jukema, J Wouter; Karhunen, Pekka J; Kim, Bong-Jo; Kooner, Jaspal S; Kullo, Iftikhar J; Lehtimäki, Terho; Loos, Ruth J F; Melander, Olle; Metspalu, Andres; März, Winfried; Palmer, Colin N; Perola, Markus; Quertermous, Thomas; Rader, Daniel J; Ridker, Paul M; Ripatti, Samuli; Roberts, Robert; Salomaa, Veikko; Sanghera, Dharambir K; Schwartz, Stephen M; Seedorf, Udo; Stewart, Alexandre F; Stott, David J; Thiery, Joachim; Zalloua, Pierre A; O’Donnell, Christopher J; Reilly, Muredach P; Assimes, Themistocles L; Thompson, John R; Erdmann, Jeanette; Clarke, Robert; Watkins, Hugh; Kathiresan, Sekar; McPherson, Ruth; Deloukas, Panos; Schunkert, Heribert; Samani, Nilesh J; Farrall, Martin

    2015-01-01

    Existing knowledge of genetic variants affecting risk of coronary artery disease (CAD) is largely based on genome-wide association studies (GWAS) analysis of common SNPs. Leveraging phased haplotypes from the 1000 Genomes Project, we report a GWAS meta-analysis of 185 thousand CAD cases and controls, interrogating 6.7 million common (MAF>0.05) as well as 2.7 million low frequency (0.005<MAF<0.05) variants. [...] analysis provides a comprehensive survey of the fine genetic architecture of CAD showing that genetic susceptibility to this common disease is largely determined by common SNPs of small effect size. PMID:26343387

  6. HIGH RESOLUTION FOURIER ANALYSIS WITH AUTO-REGRESSIVE LINEAR PREDICTION

    SciTech Connect

    Barton, J.; Shirley, D.A.

    1984-04-01

    Auto-regressive linear prediction is adapted to double the resolution of Angle-Resolved Photoemission Extended Fine Structure (ARPEFS) Fourier transforms. Even with the optimal taper (weighting function), the commonly used taper-and-transform Fourier method has limited resolution: it assumes the signal is zero beyond the limits of the measurement. By seeking the Fourier spectrum of an infinite extent oscillation consistent with the measurements but otherwise having maximum entropy, the errors caused by finite data range can be reduced. Our procedure developed to implement this concept adapts auto-regressive linear prediction to extrapolate the signal in an effective and controllable manner. Difficulties encountered when processing actual ARPEFS data are discussed. A key feature of this approach is the ability to convert improved measurements (signal-to-noise or point density) into improved Fourier resolution.
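
    The idea of extrapolating a signal beyond the measured range with auto-regressive linear prediction can be sketched in a few lines. This is an illustrative simplification, not the authors' procedure: it assumes a noise-free single oscillation, a two-coefficient model x[n] = a1*x[n-1] + a2*x[n-2], and an ordinary least-squares fit in place of the maximum-entropy criterion named in the abstract. All function names are hypothetical.

```python
import math


def fit_ar2(x):
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2] via the 2x2 normal equations."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        s11 += x[n - 1] * x[n - 1]
        s12 += x[n - 1] * x[n - 2]
        s22 += x[n - 2] * x[n - 2]
        b1 += x[n] * x[n - 1]
        b2 += x[n] * x[n - 2]
    det = s11 * s22 - s12 * s12
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    return a1, a2


def extrapolate(x, a1, a2, extra):
    """Extend the series by `extra` points using the fitted recursion."""
    out = list(x)
    for _ in range(extra):
        out.append(a1 * out[-1] + a2 * out[-2])
    return out


# Hypothetical measurement: 40 samples of a single cosine oscillation.
omega = 0.3
measured = [math.cos(omega * n) for n in range(40)]

a1, a2 = fit_ar2(measured)
extended = extrapolate(measured, a1, a2, 40)
print(f"a1 = {a1:.6f}, a2 = {a2:.6f}, extended length = {len(extended)}")
```

    Because the extended series no longer drops to zero at the edge of the measured window, a Fourier transform of it would show a narrower peak than a transform of the truncated data; real, noisy data need a higher model order and more care.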

  7. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis

    PubMed Central

    Hong, Yanbin; Pandey, Manish K.; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K.; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. Presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than the EST-SNPs with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. In order to estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. Total 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of the total 33 SNPs, nine SNPs showed polymorphism in the mapping population Y13Zh, and seven SNPs were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut. PMID:26697032
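
    The polymorphism information content (PIC) values quoted above are computed from allele frequencies. The abstract does not say which formula was used, so the sketch below shows two common definitions as an assumption: the simplified form 1 - sum(p_i^2), and the Botstein form, which subtracts an additional term. The allele counts are hypothetical. For a biallelic SNP the Botstein form cannot exceed 0.375, so the reported maxima above that value would be consistent with the simplified form.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert allele counts, e.g. {'A': 60, 'G': 36}, to frequencies."""
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def pic_simple(freqs):
    """Simplified PIC: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """Botstein PIC: 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    pair_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return pic_simple(freqs) - pair_term


# Hypothetical SNP scored in 96 varieties: 60 carry allele A, 36 carry allele G.
freqs = allele_frequencies({"A": 60, "G": 36})
print(f"simple PIC   = {pic_simple(freqs):.3f}")
print(f"Botstein PIC = {pic_botstein(freqs):.3f}")
```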

  8. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis.

    PubMed

    Hong, Yanbin; Pandey, Manish K; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. Presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than the EST-SNPs with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. In order to estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. Total 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of the total 33 SNPs, nine SNPs showed polymorphism in the mapping population Y13Zh, and seven SNPs were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut. PMID:26697032

  9. Data for constructing insect genome content matrices for phylogenetic analysis and functional annotation.

    PubMed

    Rosenfeld, Jeffrey; Foox, Jonathan; DeSalle, Rob

    2016-03-01

    Twenty one fully sequenced and well annotated insect genomes were used to construct genome content matrices for phylogenetic analysis and functional annotation of insect genomes. To examine the role of e-value cutoff in ortholog determination we used scaled e-value cutoffs and a single linkage clustering approach. The present communication includes (1) a list of the genomes used to construct the genome content phylogenetic matrices, (2) a nexus file with the data matrices used in phylogenetic analysis, (3) a nexus file with the Newick trees generated by phylogenetic analysis, (4) an excel file listing the Core (CORE) genes and Unique (UNI) genes found in five insect groups, and (5) a figure showing a plot of consistency index (CI) versus percent of unannotated genes that are apomorphies in the data set for gene losses and gains and bar plots of gains and losses for four consistency index (CI) cutoffs. PMID:26862572
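
    How an e-value cutoff and single linkage clustering lead to a genome content matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the hit list, the "genome|gene" identifier scheme, the plain (unscaled) cutoff and the function names are all hypothetical.

```python
from collections import defaultdict


def single_linkage(hits, cutoff):
    """Cluster genes: two genes share a cluster if a chain of hits with
    e-value <= cutoff connects them (union-find implementation)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        ra, rb = find(a), find(b)  # registers both genes, even if not linked
        if evalue <= cutoff and ra != rb:
            parent[ra] = rb

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    return sorted(sorted(members) for members in clusters.values())


def content_matrix(clusters, genomes):
    """Rows are genomes, columns are clusters; 1 marks presence of the gene family."""
    return [
        [1 if any(g.split("|")[0] == genome for g in cluster) else 0
         for cluster in clusters]
        for genome in genomes
    ]


# Hypothetical pairwise similarity hits: (gene, gene, e-value).
hits = [
    ("fly|g1", "bee|g7", 1e-50),
    ("bee|g7", "ant|g3", 1e-12),
    ("fly|g2", "ant|g9", 1e-3),
]

loose = single_linkage(hits, cutoff=1e-10)
strict = single_linkage(hits, cutoff=1e-20)
matrix = content_matrix(loose, ["ant", "bee", "fly"])
print(len(loose), "clusters at 1e-10;", len(strict), "clusters at 1e-20")
```

    Tightening the cutoff splits clusters and so changes which columns of the matrix mark a family as present or absent, which is why the choice of cutoff matters for the downstream phylogenetic analysis.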

  10. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis

    PubMed Central

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5’ portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids. PMID:26046631

  11. Rapid real-time PCR and high resolution melt analysis in a self-filling thermoplastic chip.

    PubMed

    Sposito, A; Hoang, V; DeVoe, D L

    2016-09-21

    A microfluidic platform designed for point-of-care PCR-based nucleic acid diagnostics is described. Compared to established microfluidic PCR technologies, the system is unique in its ability to achieve exceptionally rapid PCR amplification in a low cost thermoplastic format, together with high temperature accuracy enabling effective validation of reaction product by high resolution melt analysis performed in the same chamber as PCR. In addition, the system employs capillary pumping for automated loading of sample into the reaction chamber, combined with an integrated hydrophilic valve for precise self-metering of sample volumes into the device. Using the microfluidic system to target a mutation in the G6PC gene, efficient PCR from human genomic DNA template is achieved with cycle times as low as 14 s, full amplification in 8.5 min, and final melt analysis accurately identifying the desired amplicon. PMID:27460504

  12. Genetic analysis by DNA fingerprinting in tsetse fly genomes.

    PubMed

    Blanchetot, A; Gooding, R H

    1993-12-01

    Genomic DNA from tsetse flies (Diptera: Glossinidae: Glossina Wiedemann) was analyzed by hybridization using the whole M13 phage as a probe to reveal DNA fingerprinting (DNAfp) profiles. Intrapopulation variability, measured by comparison of DNAfp profiles of tsetse flies from a large colony of G. brevipalpis, showed a high degree of polymorphism similar to that found in other animal species. Different lines of G. m. morsitans, G. m. centralis, G. m. submorsitans, G. p. palpalis and G. p. gambiensis established from small colonies displayed less genetic variability than the G. brevipalpis population. The analysis of pedigree relationships within an inbred line of G. m. centralis conformed to a Mendelian inheritance pattern. In the pedigree presented no mutations were observed, one fragment was linked to the X chromosome, and three fragment sets were linked, but most fragments showed independent segregation. M13 revealed no characteristic DNAfp profile differences between the subgenus Glossina and the subgenus Nemorhina, but a conserved distribution pattern was found in the laboratory colonies within each subspecies. M13 also revealed line specific DNA fragments that may be useful as genetic markers to expand the present linkage map of G. m. morsitans. PMID:8220390

  13. Genome-wide analysis of TCP family in tobacco.

    PubMed

    Chen, L; Chen, Y Q; Ding, A M; Chen, H; Xia, F; Wang, W F; Sun, Y H

    2016-01-01

    The TCP family is a transcription factor family, members of which are extensively involved in plant growth and development as well as in signal transduction in the response against many physiological and biochemical stimuli. In the present study, 61 TCP genes were identified in the tobacco (Nicotiana tabacum) genome. Bioinformatic methods were employed for predicting and analyzing the gene structure, gene expression, phylogenetic analysis, and conserved domains of TCP proteins in tobacco. The 61 NtTCP genes were divided into three diverse groups, based on the division of TCP genes in tomato and Arabidopsis, and the results of the conserved domain and sequence analyses further confirmed the classification of the NtTCP genes. The expression pattern of NtTCP also demonstrated that the majority of these genes play important roles in all the tissues, while some special genes exercise their functions only in specific tissues. In brief, the comprehensive and thorough study of the TCP family in other plants provides sufficient resources for studying the structure and functions of TCPs in tobacco. PMID:27323069

  14. Monochromosomal hybrids for the analysis of the human genome

    SciTech Connect

    Athwal, R.S.

    1990-01-01

    In this research project the authors proposed to develop rodent/human hybrid cell lines each containing a single different human chromosome. The human chromosomes will be marked with Ecogpt and stably maintained by selection in the hybrid cells. The experimental approach to produce the proposed cell lines involves the following: they will first transfer a cloned selectable marker, Ecogpt (an E. coli gene for xanthine-guanine phosphoribosyltransferase: XGPRT) to normal diploid human cells using a retroviral vector. The transferred gene will integrate at random into multiple sites in the recipient cell genome. Clonal cell lines from independent transgenotes will each carry the selectable marker integrated into a different site and perhaps a different chromosome. The chromosome carrying the selectable marker will then be transferred further to mouse cells by microcell fusion. In addition they also use directed integration of Ecogpt into the chromosome present in rodent cells, otherwise not marked with a selectable marker. This allows them to complete the bank of proposed cell lines. The human chromosome, since it will be marked with a selectable marker, can be transferred to any other cell line of interest for complementation analysis. Clones of each cell line, containing varying size segments of the same chromosome produced by selection for the retention or loss of the selectable marker following x-irradiation or by the metaphase chromosome transfer method, will facilitate physical mapping and determination of gene order on a chromosome. 1 fig.

  15. Genome-wide transcriptome analysis of human epidermal melanocytes

    PubMed Central

    Haltaufderhyde, Kirk D.; Oancea, Elena

    2015-01-01

    Because human epidermal melanocytes (HEMs) provide critical protection against skin cancer, sunburn, and photoaging, a genome-wide perspective of gene expression in these cells is vital to understanding human skin physiology. In this study we performed high-throughput sequencing of HEMs to obtain a complete data set of transcript sizes, abundances, and splicing. As expected, we found that melanocyte-specific genes that function in pigmentation were among the most highly expressed genes. We analyzed receptor, ion channel and transcription factor gene families to get a better understanding of the cell signalling pathways used by melanocytes. We also performed a comparative transcriptomic analysis of lightly versus darkly pigmented HEMs and found 16 genes differentially expressed in the two pigmentation phenotypes; of those, only one putative melanosomal transporter (SLC45A2) has a known function in pigmentation. In addition, we found 166 genes with splice isoforms expressed exclusively in one pigmentation phenotype, 17 of which are genes involved in signal transduction. Our melanocyte transcriptome study provides a comprehensive view and may help identify novel pigmentation genes and potential pharmacological targets. PMID:25451175

  16. Connecting Genomic Alterations to Cancer Biology with Proteomics: The NCI Clinical Proteomic Tumor Analysis Consortium

    SciTech Connect

    Ellis, Matthew; Gillette, Michael; Carr, Steven A.; Paulovich, Amanda G.; Smith, Richard D.; Rodland, Karin D.; Townsend, Reid; Kinsinger, Christopher; Mesri, Mehdi; Rodriguez, Henry; Liebler, Daniel

    2013-10-03

    The National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium is applying the latest generation of proteomic technologies to genomically annotated tumors from The Cancer Genome Atlas (TCGA) program, a joint initiative of the NCI and the National Human Genome Research Institute. By providing a fully integrated accounting of DNA, RNA, and protein abnormalities in individual tumors, these datasets will illuminate the complex relationship between genomic abnormalities and cancer phenotypes, thus producing biologic insights as well as a wave of novel candidate biomarkers and therapeutic targets amenable to verification using targeted mass spectrometry methods.

  17. Comparative analysis of prophage-like elements in Helicobacter sp. genomes

    PubMed Central

    Fan, Xiangyu; Li, Yumei; He, Rong

    2016-01-01

    Prophages are regarded as one of the factors underlying bacterial virulence, genomic diversification, and fitness, and are ubiquitous in bacterial genomes. Information on Helicobacter sp. prophages remains scarce. In this study, sixteen prophages were identified and analyzed in detail. Eight of them are described for the first time. Based on a comparative genomic analysis, these sixteen prophages can be classified into four different clusters. Phylogenetic relationships of Cluster A Helicobacter prophages were investigated. Furthermore, genomes of Helicobacter prophages from Clusters B, C, and D were analyzed. Interestingly, some putative antibiotic resistance proteins and virulence factors were associated with Helicobacter prophages. PMID:27169002

  18. Signatures of positive selection in East African Shorthorn Zebu: a genome-wide SNP analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The small East African Shorthorn Zebu is the main indigenous cattle across East Africa. A recent genome-wide SNP analysis has revealed their ancient stable African taurine x Asian zebu admixture. Here, we assess the presence of candidate signatures of positive selection in their genome, with the aim...

  19. Genomic analysis of a nontoxigenic, invasive Corynebacterium diphtheriae strain from Brazil

    PubMed Central

    Encinas, Fernando; Marin, Michel A; Ramos, Juliana N; Vieira, Verônica V; Mattos-Guaraldi, Ana Luiza; Vicente, Ana Carolina P

    2015-01-01

    We report the complete genome sequence and analysis of an invasive Corynebacterium diphtheriae strain that caused endocarditis in Rio de Janeiro, Brazil. It was selected for sequencing on the basis of the current relevance of nontoxigenic strains for public health. The genomic information was explored in the context of diversity, plasticity and genetic relatedness with other contemporary strains. PMID:26517665

  20. Major soybean maturity gene haplotypes revealed by SNPViz analysis of 72 sequenced soybean genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this Genomics Era, vast amounts of next-generation sequencing data have become publicly available for multiple genomes across hundreds of species. Analysis of these large-scale datasets can become cumbersome, especially when comparing nucleotide polymorphisms across many samples within a dataset...

  1. Dissection of genomic correlation matrices of US Holsteins using multivariate factor analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The aim of the study was to compare correlation matrices between direct genomic predictions for 31 production, fitness and conformation traits at both the genomic and chromosomal levels in US Holstein bulls. Multivariate factor analysis was used to quantify basic features of correlation matrices. Factor extr...

  2. Analysis Of Papaya BAC End Sequences: Insights Into The Organization Of A Tree Fruit Genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Papaya (Carica papaya L.) is a major tree fruit crop of tropical and subtropical regions with an estimated genome size of 372 Mbp. We present the analysis of 4.7% of the papaya genome based on BAC end sequences (BESs) representing 17 million high-quality bases. Microsatellites discovered in 5,452 BE...

  3. Determination and analysis of the genome sequence of Spodoptera littoralis multiple nucleopolyhedrovirus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Spodoptera littoralis multiple nucleopolyhedrovirus (SpliMNPV), a pathogen of the Egyptian cotton leaf worm Spodoptera littoralis, was subjected to sequencing of its entire DNA genome and bioassay analysis comparing its virulence to that of other baculoviruses. The annotated SpliMNPV genome of...

  4. Analysis of pig genomes provide insight into porcine demography and evolution

    Technology Transfer Automated Retrieval System (TEKTRAN)

    For nearly 8,000 years pigs and humans have shared a close and complex relationship, and through domestication and breeding, humans have shaped the genomes of current diverse pig breeds. Here we present the assembly and analysis of the genome sequence of a female domestic pig from the European Duroc...

  5. Meta-Analysis of Genome-Wide Association Studies of Attention-Deficit/Hyperactivity Disorder

    ERIC Educational Resources Information Center

    Neale, Benjamin M.; Medland, Sarah E.; Ripke, Stephan; Asherson, Philip; Franke, Barbara; Lesch, Klaus-Peter; Faraone, Stephen V.; Nguyen, Thuy Trang; Schafer, Helmut; Holmans, Peter; Daly, Mark; Steinhausen, Hans-Christoph; Freitag, Christine; Reif, Andreas; Renner, Tobias J.; Romanos, Marcel; Romanos, Jasmin; Walitza, Susanne; Warnke, Andreas; Meyer, Jobst; Palmason, Haukur; Buitelaar, Jan; Vasquez, Alejandro Arias; Lambregts-Rommelse, Nanda; Gill, Michael; Anney, Richard J. L.; Langely, Kate; O'Donovan, Michael; Williams, Nigel; Owen, Michael; Thapar, Anita; Kent, Lindsey; Sergeant, Joseph; Roeyers, Herbert; Mick, Eric; Biederman, Joseph; Doyle, Alysa; Smalley, Susan; Loo, Sandra; Hakonarson, Hakon; Elia, Josephine; Todorov, Alexandre; Miranda, Ana; Mulas, Fernando; Ebstein, Richard P.; Rothenberger, Aribert; Banaschewski, Tobias; Oades, Robert D.; Sonuga-Barke, Edmund; McGough, James; Nisenbaum, Laura; Middleton, Frank; Hu, Xiaolan; Nelson, Stan

    2010-01-01

    Objective: Although twin and family studies have shown attention-deficit/hyperactivity disorder (ADHD) to be highly heritable, genetic variants influencing the trait at a genome-wide significant level have yet to be identified. As prior genome-wide association studies (GWAS) have not yielded significant results, we conducted a meta-analysis of…

  6. Comparative genomics in chicken and Pekin duck using FISH mapping and microarray analysis

    PubMed Central

    2009-01-01

    Background The availability of the complete chicken (Gallus gallus) genome sequence as well as a large number of chicken probes for fluorescent in-situ hybridization (FISH) and microarray resources facilitate comparative genomic studies between chicken and other bird species. In a previous study, we provided a comprehensive cytogenetic map for the turkey (Meleagris gallopavo) and the first an