Science.gov

Sample records for resolution genomic analysis

  1. Topological Data Analysis Generates High-Resolution, Genome-wide Maps of Human Recombination.

    PubMed

    Camara, Pablo G; Rosenbloom, Daniel I S; Emmett, Kevin J; Levine, Arnold J; Rabadan, Raul

    2016-07-01

    Meiotic recombination is a fundamental evolutionary process driving diversity in eukaryotes. In mammals, recombination is known to occur preferentially at specific genomic regions. Using topological data analysis (TDA), a branch of applied topology that extracts global features from large data sets, we developed an efficient method for mapping recombination at fine scales. When compared to standard linkage-based methods, TDA can deal with a larger number of SNPs and genomes without incurring prohibitive computational costs. We applied TDA to 1,000 Genomes Project data and constructed high-resolution whole-genome recombination maps of seven human populations. Our analysis shows that recombination is generally under-represented within transcription start sites. However, the binding sites of specific transcription factors are enriched for sites of recombination. These include transcription factors that regulate the expression of meiosis- and gametogenesis-specific genes, cell cycle progression, and differentiation blockage. Additionally, our analysis identifies an enrichment for sites of recombination at repeat-derived loci matched by piwi-interacting RNAs. PMID:27345159

  2. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution

    PubMed Central

    Liu, Chang; Wang, Congmao; Wang, George; Becker, Claude; Zaidem, Maricris; Weigel, Detlef

    2016-01-01

    The three-dimensional packing of the genome plays an important role in regulating gene expression. We have used Hi-C, a genome-wide chromatin conformation capture (3C) method, to analyze Arabidopsis thaliana chromosomes dissected into subkilobase segments, which is required for gene-level resolution in this species with a gene-dense genome. We found that the repressive H3K27me3 histone mark is overrepresented in the promoter regions of genes that are in conformational linkage over long distances. In line with the globally dispersed distribution of RNA polymerase II in A. thaliana nuclear space, actively transcribed genes do not show a strong tendency to associate with each other. In general, there are often contacts between 5′ and 3′ ends of genes, forming local chromatin loops. Such self-loop structures of genes are more likely to occur in more highly expressed genes, although they can also be found in silent genes. Silent genes with local chromatin loops are highly enriched for the histone variant H3.3 at their 5′ and 3′ ends but depleted of repressive marks such as heterochromatic histone modifications and DNA methylation in flanking regions. Our results suggest that, different from animals, a major theme of genome folding in A. thaliana is the formation of structural units that correspond to gene bodies. PMID:27225844

  3. Genome-wide and fine-resolution association analysis of malaria in West Africa.

    PubMed

    Jallow, Muminatou; Teo, Yik Ying; Small, Kerrin S; Rockett, Kirk A; Deloukas, Panos; Clark, Taane G; Kivinen, Katja; Bojang, Kalifa A; Conway, David J; Pinder, Margaret; Sirugo, Giorgio; Sisay-Joof, Fatou; Usen, Stanley; Auburn, Sarah; Bumpstead, Suzannah J; Campino, Susana; Coffey, Alison; Dunham, Andrew; Fry, Andrew E; Green, Angela; Gwilliam, Rhian; Hunt, Sarah E; Inouye, Michael; Jeffreys, Anna E; Mendy, Alieu; Palotie, Aarno; Potter, Simon; Ragoussis, Jiannis; Rogers, Jane; Rowlands, Kate; Somaskantharajah, Elilan; Whittaker, Pamela; Widden, Claire; Donnelly, Peter; Howie, Bryan; Marchini, Jonathan; Morris, Andrew; SanJoaquin, Miguel; Achidi, Eric Akum; Agbenyega, Tsiri; Allen, Angela; Amodu, Olukemi; Corran, Patrick; Djimde, Abdoulaye; Dolo, Amagana; Doumbo, Ogobara K; Drakeley, Chris; Dunstan, Sarah; Evans, Jennifer; Farrar, Jeremy; Fernando, Deepika; Hien, Tran Tinh; Horstmann, Rolf D; Ibrahim, Muntaser; Karunaweera, Nadira; Kokwaro, Gilbert; Koram, Kwadwo A; Lemnge, Martha; Makani, Julie; Marsh, Kevin; Michon, Pascal; Modiano, David; Molyneux, Malcolm E; Mueller, Ivo; Parker, Michael; Peshu, Norbert; Plowe, Christopher V; Puijalon, Odile; Reeder, John; Reyburn, Hugh; Riley, Eleanor M; Sakuntabhai, Anavaj; Singhasivanon, Pratap; Sirima, Sodiomon; Tall, Adama; Taylor, Terrie E; Thera, Mahamadou; Troye-Blomberg, Marita; Williams, Thomas N; Wilson, Michael; Kwiatkowski, Dominic P

    2009-06-01

    We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10⁻⁷ to P = 4 × 10⁻¹⁴, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations. PMID:19465909
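
    The abstract above attributes the attenuated signals to weak linkage disequilibrium between genotyped SNPs and causal variants. As a rough illustration only, the sketch below computes the common r² measure of LD between two biallelic loci from phased haplotypes coded 0/1. The function name and the toy haplotypes are assumptions made for this example and do not come from the study.

```python
import math


def ld_r_squared(hap_a, hap_b):
    """Return r^2 between two biallelic loci.

    hap_a and hap_b are equal-length sequences of 0/1 alleles, one entry
    per phased haplotype (hypothetical input format for this sketch).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and equal in length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan                       # a monomorphic locus has no defined r^2
    return d * d / denom


# Toy data: a tag SNP that tracks the causal allele perfectly, and one that does not.
causal = [0, 0, 1, 1]
perfect_tag = [0, 0, 1, 1]
unlinked_tag = [0, 1, 0, 1]
r2_perfect = ld_r_squared(causal, perfect_tag)
r2_unlinked = ld_r_squared(causal, unlinked_tag)
```

    When r² between a genotyped marker and the causal variant is low, the marker carries little of the association signal, which is the situation that imputation from population-specific sequence data is meant to address.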

  5. Representational difference analysis, high-resolution physical mapping, and transcript identification of the zebrafish genomic region for a motor behavior.

    PubMed

    Sato, Tomomi; Mishina, Masayoshi

    2003-08-01

    Zebrafish is one of the best model organisms for investigating gene functions in vertebrates. By 4,5',8-trimethylpsoralen mutagenesis, we isolated a zebrafish mutant, vibrato, with defects in the spontaneous contraction and touch response. Whole genome subtraction between the wild-type and the mutant genomes by representational difference analysis yielded polymorphic markers tightly linked to the vibrato locus. Using these markers, we constructed a high-resolution physical map and localized the vibrato locus within a genomic region of 720 kb. Direct cDNA selection with the contig led to the identification of a novel gene, solo, encoding a protein with SEC14 and spectrin repeat domains. These domains of Solo shared significant amino acid sequence identities with those of mammalian Trio and Kalirin. In addition, we found the zebrafish orthologs for mammalian TTN, COL5A2, and CED-6 in the vibrato region. Mapping of these genes localized human chromosomal regions possibly involved in motor disorders. Our results suggest that representational difference analysis provides an efficient way to isolate mutated genomic regions in zebrafish. PMID:12837271

  6. The interaction of high-resolution electrophoresis and computational analysis in genome mapping

    SciTech Connect

    Carrano, A.V.; Branscomb, E.W.; de Jong, P.J.; Mohrenweiser, H.; Olsen, A.; Slezak, T.

    1990-07-26

    The construction of physical maps and the determination of the DNA sequence of chromosome-size segments of the human genome are a complex, multidisciplinary undertaking. The approach we have taken to construct a physical map and sequence of human chromosome 19 typifies these interactions. We exploit the power of both acrylamide and agarose gel electrophoresis to provide a simple and versatile method for DNA fingerprinting and the creation of contigs or sets of overlapping genomic clones. Cosmid libraries are constructed from yeast artificial chromosome (YAC) clones or from flow-sorted chromosomes. Cosmid DNA isolated from the screened library array is cut with a combination of five restriction enzymes and the fragment ends labeled with one of four different fluorochromes. Our approach to contig construction uses a robotic system to label restriction fragments from cosmids with fluorochromes; uses an automated DNA sequencer to capture fragment mobility data in a high-resolution multiplex mode; processes the mobility data to determine fragment length and provide a statistical measure of overlap among cosmids; and displays the contigs and underlying cosmids for operator interaction and access to a database. Data analyses and interactions are conducted over a network of SUN workstations using a set of software tools that we developed and coupled to a commercially available database. Applying these methods, we have analyzed 5154 cosmid clones and assembled 515 contigs for chromosome 19. Some of these contigs have been identified with known genes and many have been mapped to the chromosome by fluorescence in situ hybridization. Existing contigs are being extended by a combination of walking and fingerprinting. 21 refs., 2 figs.
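
    The abstract above mentions a statistical measure of overlap among cosmids derived from restriction-fragment lengths but does not describe it. The sketch below is a deliberately simplified stand-in, not the authors' statistic: it counts fragments whose lengths agree within a tolerance and reports the shared fraction. The function names, the tolerance value, and the fragment lengths are all hypothetical.

```python
def shared_fragment_count(frags_a, frags_b, tolerance=2):
    """Count fragments matched one-to-one between two fingerprints.

    Two fragments match when their lengths differ by at most `tolerance`
    (same units as the lengths). Uses a two-pointer sweep over sorted lists.
    """
    a, b = sorted(frags_a), sorted(frags_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tolerance:
            shared += 1          # treat as the same fragment; consume both
            i += 1
            j += 1
        elif diff < 0:
            i += 1               # a[i] is too short to match anything left in b
        else:
            j += 1               # b[j] is too short to match anything left in a
    return shared


def overlap_score(frags_a, frags_b, tolerance=2):
    """Fraction of the smaller fingerprint that is shared (0.0 to 1.0)."""
    smaller = min(len(frags_a), len(frags_b))
    if smaller == 0:
        return 0.0
    return shared_fragment_count(frags_a, frags_b, tolerance) / smaller


# Hypothetical fingerprints for two cosmids (fragment lengths in base pairs).
cosmid_1 = [100, 250, 400, 900]
cosmid_2 = [101, 398, 650, 905, 1200]
score = overlap_score(cosmid_1, cosmid_2)
```

    A real contig builder would also need to account for the chance that unrelated clones share fragments of similar length, which is presumably what the statistical measure in the abstract addresses.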

  7. A high-resolution genomic analysis of multidrug-resistant hospital outbreaks of Klebsiella pneumoniae

    PubMed Central

    Chung The, Hao; Karkey, Abhilasha; Pham Thanh, Duy; Boinett, Christine J; Cain, Amy K; Ellington, Matthew; Baker, Kate S; Dongol, Sabina; Thompson, Corinne; Harris, Simon R; Jombart, Thibaut; Le Thi Phuong, Tu; Tran Do Hoang, Nhu; Ha Thanh, Tuyen; Shretha, Shrijana; Joshi, Suchita; Basnyat, Buddha; Thwaites, Guy; Thomson, Nicholas R; Rabaa, Maia A; Baker, Stephen

    2015-01-01

    Multidrug-resistant (MDR) Klebsiella pneumoniae has become a leading cause of nosocomial infections worldwide. Despite its prominence, little is known about the genetic diversity of K. pneumoniae in resource-poor hospital settings. Through whole-genome sequencing (WGS), we reconstructed an outbreak of MDR K. pneumoniae occurring on high-dependency wards in a hospital in Kathmandu during 2012 with a case-fatality rate of 75%. The WGS analysis permitted the identification of two MDR K. pneumoniae lineages causing distinct outbreaks within the complex endemic K. pneumoniae. Using phylogenetic reconstruction and lineage-specific PCR, our data predicted a scenario in which K. pneumoniae, circulating for 6 months before the outbreak, underwent a series of ward-specific clonal expansions after the acquisition of genes facilitating virulence and MDR. We suggest that the early detection of a specific NDM-1 containing lineage in 2011 would have alerted the high-dependency ward staff to intervene. We argue that some form of real-time genetic characterisation, alongside clade-specific PCR during an outbreak, should be factored into future healthcare infection control practices in both high- and low-income settings. PMID:25712531

  8. A high-resolution genomic analysis of multidrug-resistant hospital outbreaks of Klebsiella pneumoniae.

    PubMed

    Chung The, Hao; Karkey, Abhilasha; Pham Thanh, Duy; Boinett, Christine J; Cain, Amy K; Ellington, Matthew; Baker, Kate S; Dongol, Sabina; Thompson, Corinne; Harris, Simon R; Jombart, Thibaut; Le Thi Phuong, Tu; Tran Do Hoang, Nhu; Ha Thanh, Tuyen; Shretha, Shrijana; Joshi, Suchita; Basnyat, Buddha; Thwaites, Guy; Thomson, Nicholas R; Rabaa, Maia A; Baker, Stephen

    2015-03-01

    Multidrug-resistant (MDR) Klebsiella pneumoniae has become a leading cause of nosocomial infections worldwide. Despite its prominence, little is known about the genetic diversity of K. pneumoniae in resource-poor hospital settings. Through whole-genome sequencing (WGS), we reconstructed an outbreak of MDR K. pneumoniae occurring on high-dependency wards in a hospital in Kathmandu during 2012 with a case-fatality rate of 75%. The WGS analysis permitted the identification of two MDR K. pneumoniae lineages causing distinct outbreaks within the complex endemic K. pneumoniae. Using phylogenetic reconstruction and lineage-specific PCR, our data predicted a scenario in which K. pneumoniae, circulating for 6 months before the outbreak, underwent a series of ward-specific clonal expansions after the acquisition of genes facilitating virulence and MDR. We suggest that the early detection of a specific NDM-1 containing lineage in 2011 would have alerted the high-dependency ward staff to intervene. We argue that some form of real-time genetic characterisation, alongside clade-specific PCR during an outbreak, should be factored into future healthcare infection control practices in both high- and low-income settings. PMID:25712531

  9. Analysis of complete mitochondrial genome sequences increases phylogenetic resolution of bears (Ursidae), a mammalian family that experienced rapid speciation

    PubMed Central

    Yu, Li; Li, Yi-Wei; Ryder, Oliver A; Zhang, Ya-Ping

    2007-01-01

    Background: Despite the small number of ursid species, bear phylogeny has long been a focus of study due to their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family represents a typical example of rapid evolutionary radiation. Previous analyses with a single mitochondrial (mt) gene or a small number of mt genes provide either weak support or a large unresolved polytomy for ursids. We revisit the contentious relationships within Ursidae by analyzing complete mt genome sequences and evaluating the performance of both entire mt genomes and constituent mtDNA genes in recovering a phylogeny of extremely recent speciation events. Results: This mitochondrial genome-based phylogeny provides strong evidence that the spectacled bear diverged first, while within the genus Ursus, the sloth bear is the sister taxon of all the other five ursines. The latter group is divided into the brown bear/polar bear and the two black bears/sun bear assemblages. These findings resolve the previous conflicts between trees using partial mt genes. The ability of different categories of mt protein coding genes to recover the correct phylogeny is concordant with previous analyses for taxa with deep divergence times. This study provides a robust Ursidae phylogenetic framework for future validation by additional independent evidence, and also has significant implications for assisting in the resolution of other similarly difficult phylogenetic investigations. Conclusion: Identification of base composition bias and utilization of the combined data of whole mitochondrial genome sequences has allowed recovery of a strongly supported phylogeny that is upheld when using multiple alternative outgroups for the Ursidae, a mammalian family that underwent a rapid radiation since the mid- to late Pliocene. It remains to be seen if the reliability of mt genome analysis will hold up in studies of other difficult phylogenetic

  10. Genomic paradigms for food-borne enteric pathogen analysis at the USFDA: case studies highlighting method utility, integration and resolution.

    PubMed

    Elkins, C A; Kotewicz, M L; Jackson, S A; Lacher, D W; Abu-Ali, G S; Patel, I R

    2013-01-01

    Modern risk control and food safety practices involving food-borne bacterial pathogens are benefiting from new genomic technologies for rapid, yet highly specific, strain characterisations. Within the United States Food and Drug Administration (USFDA) Center for Food Safety and Applied Nutrition (CFSAN), optical genome mapping and DNA microarray genotyping have been used for several years to quickly assess genomic architecture and gene content, respectively, for outbreak strain subtyping and to enhance retrospective trace-back analyses. The application and relative utility of each method varies with outbreak scenario and the suspect pathogen, with comparative analytical power enhanced by database scale and depth. Integration of these two technologies allows high-resolution scrutiny of the genomic landscapes of enteric food-borne pathogens with notable examples including Shiga toxin-producing Escherichia coli (STEC) and Salmonella enterica serovars from a variety of food commodities. Moreover, the recent application of whole genome sequencing technologies to food-borne pathogen outbreaks and surveillance has enhanced resolution to the single nucleotide scale. This new wealth of sequence data will support more refined next-generation custom microarray designs, targeted re-sequencing and "genomic signature recognition" approaches involving a combination of genes and single nucleotide polymorphism detection to distil strain-specific fingerprinting to a minimised scale. This paper examines the utility of microarrays and optical mapping in analysing outbreaks, reviews best practices and the limits of these technologies for pathogen differentiation, and it considers future integration with whole genome sequencing efforts.

  11. Xenopus tropicalis Genome Re-Scaffolding and Re-Annotation Reach the Resolution Required for In Vivo ChIA-PET Analysis

    PubMed Central

    Buisine, Nicolas; Ruan, Xiaoan; Bilesimo, Patrice; Grimaldi, Alexis; Alfama, Gladys; Ariyaratne, Pramila; Mulawadi, Fabianus; Chen, Jieqi; Sung, Wing-Kin; Liu, Edison T.; Demeneix, Barbara A.; Ruan, Yijun; Sachs, Laurent M.

    2015-01-01

    Genome-wide functional analyses require high-resolution genome assembly and annotation. We applied ChIA-PET to analyze gene regulatory networks, including 3D chromosome interactions, underlying thyroid hormone (TH) signaling in the frog Xenopus tropicalis. As the available versions of Xenopus tropicalis assembly and annotation lacked the resolution required for ChIA-PET, we improved the genome assembly version 4.1 and annotations using data derived from the paired end tag (PET) sequencing technologies and approaches (e.g., DNA-PET [gPET], RNA-PET etc.). The large insert (~10 Kb, ~17 Kb) paired end DNA-PET with high throughput NGS sequencing not only significantly improved genome assembly quality, but also strongly reduced genome “fragmentation”, reducing total scaffold numbers by ~60%. Next, RNA-PET technology, designed and developed for the detection of full-length transcripts and fusion mRNA in whole transcriptome studies (ENCODE consortia), was applied to capture the 5' and 3' ends of transcripts. These amendments in assembly and annotation were essential prerequisites for the ChIA-PET analysis of TH transcription regulation. Their application revealed complex regulatory configurations of target genes and the structures of the regulatory networks underlying physiological responses. Our work allowed us to improve the quality of Xenopus tropicalis genomic resources, reaching the standard required for ChIA-PET analysis of transcriptional networks. We consider that the workflow proposed offers useful conceptual and methodological guidance and can readily be applied to other non-conventional models that have low-resolution genome data. PMID:26348928

  13. Xenopus tropicalis Genome Re-Scaffolding and Re-Annotation Reach the Resolution Required for In Vivo ChIA-PET Analysis.

    PubMed

    Buisine, Nicolas; Ruan, Xiaoan; Bilesimo, Patrice; Grimaldi, Alexis; Alfama, Gladys; Ariyaratne, Pramila; Mulawadi, Fabianus; Chen, Jieqi; Sung, Wing-Kin; Liu, Edison T; Demeneix, Barbara A; Ruan, Yijun; Sachs, Laurent M

    2015-01-01

    Genome-wide functional analyses require high-resolution genome assembly and annotation. We applied ChIA-PET to analyze gene regulatory networks, including 3D chromosome interactions, underlying thyroid hormone (TH) signaling in the frog Xenopus tropicalis. As the available versions of Xenopus tropicalis assembly and annotation lacked the resolution required for ChIA-PET, we improved the genome assembly version 4.1 and annotations using data derived from the paired end tag (PET) sequencing technologies and approaches (e.g., DNA-PET [gPET], RNA-PET etc.). The large insert (~10 Kb, ~17 Kb) paired end DNA-PET with high throughput NGS sequencing not only significantly improved genome assembly quality, but also strongly reduced genome "fragmentation", reducing total scaffold numbers by ~60%. Next, RNA-PET technology, designed and developed for the detection of full-length transcripts and fusion mRNA in whole transcriptome studies (ENCODE consortia), was applied to capture the 5' and 3' ends of transcripts. These amendments in assembly and annotation were essential prerequisites for the ChIA-PET analysis of TH transcription regulation. Their application revealed complex regulatory configurations of target genes and the structures of the regulatory networks underlying physiological responses. Our work allowed us to improve the quality of Xenopus tropicalis genomic resources, reaching the standard required for ChIA-PET analysis of transcriptional networks. We consider that the workflow proposed offers useful conceptual and methodological guidance and can readily be applied to other non-conventional models that have low-resolution genome data. PMID:26348928

  14. Further delineation of chromosomal consensus regions in primary mediastinal B-cell lymphomas: an analysis of 37 tumor samples using high-resolution genomic profiling (array-CGH).

    PubMed

    Wessendorf, S; Barth, T F E; Viardot, A; Mueller, A; Kestler, H A; Kohlhammer, H; Lichter, P; Bentz, M; Döhner, H; Möller, P; Schwaenen, C

    2007-12-01

    Primary mediastinal B-cell lymphoma (PMBL) is an aggressive extranodal B-cell non-Hodgkin's lymphoma with specific clinical, histopathological and genomic features. To characterize further the genotype of PMBL, we analyzed 37 tumor samples and PMBL cell lines Med-B1 and Karpas1106P using array-based comparative genomic hybridization (matrix- or array-CGH) to a 2.8k genomic microarray. Due to a higher genomic resolution, we identified altered chromosomal regions in much higher frequencies compared with standard CGH: for example, +9p24 (68%), +2p15 (51%), +7q22 (32%), +9q34 (32%), +11q23 (18%), +12q (30%) and +18q21 (24%). Moreover, previously unknown small interstitial chromosomal low copy number alterations (for example, -6p21, -11q13.3) and a total of 19 DNA amplifications were identified by array-CGH. For 17 chromosomal localizations (10 gains and 7 losses), which were altered in more than 10% of the analyzed cases, we delineated minimal consensus regions based on genomic base pair positions. These regions and selected immunohistochemistries point to candidate genes that are discussed in the context of NF-kappaB transcription activation, human leukocyte antigen class I/II defects, impaired apoptosis and Janus kinase/signal transducer and activator of transcription (JAK/STAT) activation. Our data confirm the genomic uniqueness of this tumor and provide physically mapped genomic regions of interest for focused candidate gene analysis. PMID:17728785
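
    The abstract above reports minimal consensus regions delineated from genomic base pair positions without stating the procedure. One common way to think about such a region is as the interval shared by every altered segment that covers a locus. The sketch below illustrates that idea only; it is an assumption, not the authors' method, and the function name and coordinates are hypothetical.

```python
def minimal_common_region(intervals):
    """Return the (start, end) interval shared by all input intervals.

    Each interval is a (start, end) pair of base pair positions on the same
    chromosome, with start <= end. Returns None if the intervals have no
    position in common or if the input is empty.
    """
    if not intervals:
        return None
    start = max(s for s, _ in intervals)   # latest start among the segments
    end = min(e for _, e in intervals)     # earliest end among the segments
    return (start, end) if start <= end else None


# Hypothetical gained segments from three tumor samples on one chromosome arm.
gains = [(100, 500), (200, 600), (150, 450)]
consensus = minimal_common_region(gains)
```

    In practice, segments are usually grouped by locus before intersecting them, since intersecting unrelated alterations on the same chromosome would yield no common region.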

  15. Combined Analysis of Variation in Core, Accessory and Regulatory Genome Regions Provides a Super-Resolution View into the Evolution of Bacterial Populations.

    PubMed

    McNally, Alan; Oren, Yaara; Kelly, Darren; Pascoe, Ben; Dunn, Steven; Sreecharan, Tristan; Vehkala, Minna; Välimäki, Niko; Prentice, Michael B; Ashour, Amgad; Avram, Oren; Pupko, Tal; Dobrindt, Ulrich; Literak, Ivan; Guenther, Sebastian; Schaufler, Katharina; Wieler, Lothar H; Zhiyong, Zong; Sheppard, Samuel K; McInerney, James O; Corander, Jukka

    2016-09-01

    The use of whole-genome phylogenetic analysis has revolutionized our understanding of the evolution and spread of many important bacterial pathogens due to the high-resolution view it provides. However, the majority of such analyses do not consider the potential role of accessory genes when inferring evolutionary trajectories. Moreover, the recently discovered importance of the switching of gene regulatory elements suggests that an exhaustive analysis, combining information from core and accessory genes with regulatory elements, could provide unparalleled detail of the evolution of a bacterial population. Here we demonstrate this principle by applying it to a worldwide multi-host sample of the important pathogenic E. coli lineage ST131. Our approach reveals the existence of multiple circulating subtypes of the major drug-resistant clade of ST131 and provides the first ever population level evidence of core genome substitutions in gene regulatory regions associated with the acquisition and maintenance of different accessory genome elements. PMID:27618184

  16. Combined Analysis of Variation in Core, Accessory and Regulatory Genome Regions Provides a Super-Resolution View into the Evolution of Bacterial Populations

    PubMed Central

    McNally, Alan; Oren, Yaara; Kelly, Darren; Sreecharan, Tristan; Vehkala, Minna; Välimäki, Niko; Prentice, Michael B.; Ashour, Amgad; Avram, Oren; Pupko, Tal; Literak, Ivan; Guenther, Sebastian; Schaufler, Katharina; Wieler, Lothar H.; Zhiyong, Zong; Sheppard, Samuel K.; Corander, Jukka

    2016-01-01

    The use of whole-genome phylogenetic analysis has revolutionized our understanding of the evolution and spread of many important bacterial pathogens due to the high-resolution view it provides. However, the majority of such analyses do not consider the potential role of accessory genes when inferring evolutionary trajectories. Moreover, the recently discovered importance of the switching of gene regulatory elements suggests that an exhaustive analysis, combining information from core and accessory genes with regulatory elements, could provide unparalleled detail of the evolution of a bacterial population. Here we demonstrate this principle by applying it to a worldwide multi-host sample of the important pathogenic E. coli lineage ST131. Our approach reveals the existence of multiple circulating subtypes of the major drug-resistant clade of ST131 and provides the first ever population level evidence of core genome substitutions in gene regulatory regions associated with the acquisition and maintenance of different accessory genome elements. PMID:27618184

  18. Genome-wide SNP identification for the construction of a high-resolution genetic map of Japanese flounder (Paralichthys olivaceus): applications to QTL mapping of Vibrio anguillarum disease resistance and comparative genomic analysis.

    PubMed

    Shao, Changwei; Niu, Yongchao; Rastas, Pasi; Liu, Yang; Xie, Zhiyuan; Li, Hengde; Wang, Lei; Jiang, Yong; Tai, Shuaishuai; Tian, Yongsheng; Sakamoto, Takashi; Chen, Songlin

    2015-04-01

    High-resolution genetic maps are essential for fine mapping of complex traits, genome assembly, and comparative genomic analysis. Single-nucleotide polymorphisms (SNPs) are the primary molecular markers used for genetic map construction. In this study, we identified 13,362 SNPs evenly distributed across the Japanese flounder (Paralichthys olivaceus) genome. Of these SNPs, 12,712 high-confidence SNPs were subjected to high-throughput genotyping and assigned to 24 consensus linkage groups (LGs). The total length of the genetic linkage map was 3,497.29 cM with an average distance of 0.47 cM between loci, thereby representing the densest genetic map currently reported for Japanese flounder. Nine positive quantitative trait loci (QTLs) forming two main clusters for Vibrio anguillarum disease resistance were detected. All QTLs could explain 5.1-8.38% of the total phenotypic variation. Synteny analysis of the QTL regions on the genome assembly revealed 12 immune-related genes, among them 4 genes strongly associated with V. anguillarum disease resistance. In addition, 246 genome assembly scaffolds with an average size of 21.79 Mb were anchored onto the LGs; these scaffolds, comprising 522.99 Mb, represented 95.78% of assembled genomic sequences. The mapped assembly scaffolds in Japanese flounder were used for genome synteny analyses against zebrafish (Danio rerio) and medaka (Oryzias latipes). Flounder and medaka were found to possess almost one-to-one synteny, whereas flounder and zebrafish exhibited a multi-syntenic correspondence. The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species. PMID:25762582

  19. High-resolution analysis of paraffin-embedded and formalin-fixed prostate tumors using comparative genomic hybridization to genomic microarrays.

    PubMed

    Paris, Pamela L; Albertson, Donna G; Alers, Janneke C; Andaya, Armann; Carroll, Peter; Fridlyand, Jane; Jain, Ajay N; Kamkar, Sherwin; Kowbel, David; Krijtenburg, Pieter-Jaap; Pinkel, Daniel; Schröder, Fritz H; Vissers, Kees J; Watson, Vivienne J E; Wildhagen, Mark F; Collins, Colin; Van Dekken, Herman

    2003-03-01

    We have used prostate cancer, the most commonly diagnosed noncutaneous neoplasm among men, to investigate the feasibility of performing genomic array analyses of archival tissue. Prostate-specific antigen and a biopsy Gleason grade have not proven to be accurate in predicting clinical outcome, yet they remain the only accepted biomarkers for prostate cancer. It is likely that distinct spectra of genomic alterations underlie these phenotypic differences, and that once identified, may be used to differentiate between indolent and aggressive tumors. Array comparative genomic hybridization allows quantitative detection and mapping of copy number aberrations in tumors and subsequent associations to be made with clinical outcome. Archived tissues are needed to have patients with sufficient clinical follow-up. In this report, 20 formalin-fixed and paraffin-embedded prostate cancer samples originating from 1986 to 1996 were studied. We present a straightforward protocol and demonstrate the utility of archived tissue for array comparative genomic hybridization with a 2400 element BAC array that provides high-resolution detection of both deletions and amplifications.

  20. Whole-genome analysis of 5-hydroxymethylcytosine and 5-methylcytosine at base resolution in the human brain

    PubMed Central

    2014-01-01

    Background: 5-methylcytosine (mC) can be oxidized by the tet methylcytosine dioxygenase (Tet) family of enzymes to 5-hydroxymethylcytosine (hmC), which is an intermediate of mC demethylation and may also be a stable epigenetic modification that influences chromatin structure. hmC is particularly abundant in mammalian brains but its function is currently unknown. A high-resolution hydroxymethylome map is required to fully understand the function of hmC in the human brain. Results: We present genome-wide and single-base resolution maps of hmC and mC in the human brain by combined application of Tet-assisted bisulfite sequencing and bisulfite sequencing. We demonstrate that hmCs increase markedly from the fetal to the adult stage, and in the adult brain, 13% of all CpGs are highly hydroxymethylated with strong enrichment at genic regions and distal regulatory elements. Notably, hmC peaks are identified at the 5′ splicing sites at the exon-intron boundary, suggesting a mechanistic link between hmC and splicing. We report a surprising transcription-correlated hmC bias toward the sense strand and an mC bias toward the antisense strand of gene bodies. Furthermore, hmC is negatively correlated with H3K27me3-marked and H3K9me3-marked repressive genomic regions, and is more enriched at poised enhancers than active enhancers. Conclusions: We provide single-base resolution hmC and mC maps in the human brain and our data imply novel roles of hmC in regulating splicing and gene expression. Hydroxymethylation is the main modification status for a large portion of CpGs situated at poised enhancers and actively transcribed regions, suggesting its roles in epigenetic tuning at these regions. PMID:24594098
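
    Illustrative sketch (not from the record): conventional bisulfite sequencing reads both mC and hmC as unconverted cytosine, whereas Tet-assisted bisulfite sequencing reads only hmC as cytosine, so a per-site mC level can be approximated by subtraction. The function name, its parameters, and the read counts below are hypothetical, and the clamping at zero is an assumption made here to absorb sampling noise; the study's own pipeline is not described in the abstract.

```python
def estimate_mc_hmc(bs_c, bs_total, tab_c, tab_total):
    """Return (hmC, mC) fractions at a single cytosine.

    bs_c / bs_total   : reads called C / all reads in bisulfite sequencing
                        (this signal is mC + hmC combined).
    tab_c / tab_total : reads called C / all reads in Tet-assisted
                        bisulfite sequencing (this signal is hmC only).
    """
    if bs_total <= 0 or tab_total <= 0:
        raise ValueError("coverage must be positive in both assays")
    combined = bs_c / bs_total          # mC + hmC
    hmc = tab_c / tab_total             # hmC alone
    mc = max(0.0, combined - hmc)       # assumption: clamp noise-driven negatives
    return hmc, mc


# Hypothetical counts for one CpG: hmC = 0.30, mC is approximately 0.60.
hmc, mc = estimate_mc_hmc(bs_c=18, bs_total=20, tab_c=6, tab_total=20)
print(f"hmC={hmc:.2f} mC={mc:.2f}")
```

    In practice a statistical test per site would normally replace the plain subtraction, since both fractions are estimated from limited read depth.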

  1. High-resolution analysis of chromosomal breakpoints and genomic instability identifies PTPRD as a candidate tumor suppressor gene in neuroblastoma.

    PubMed

    Stallings, Raymond L; Nair, Prakash; Maris, John M; Catchpoole, Daniel; McDermott, Michael; O'Meara, Anne; Breatnach, Fin

    2006-04-01

    Although neuroblastoma is characterized by numerous recurrent, large-scale chromosomal imbalances, the genes targeted by such imbalances have remained elusive. We have applied whole-genome oligonucleotide array comparative genomic hybridization (median probe spacing 6 kb) to 56 neuroblastoma tumors and cell lines to identify genes involved with disease pathogenesis. This set of tumors was selected for having either 11q loss or MYCN amplification, abnormalities that define the two most common genetic subtypes of metastatic neuroblastoma. Our analyses have permitted us to map large-scale chromosomal imbalances and high-level amplifications at exon-level resolution and to identify novel microdeletions and duplications. Chromosomal breakpoints (n = 467) generating imbalances >2 Mb were mapped to intervals ranging between 6 and 50 kb in size, providing substantial information on each abnormality. For example, breakpoints leading to large-scale hemizygous loss of chromosome 11q were highly clustered and preferentially associated with segmental duplications. High-level amplifications of MYCN were extremely complex, often resulting in a series of discontinuous regions of amplification. Imbalances (n = 540) <2 Mb long were also detected. Although the majority (78%) of these imbalances mapped to segmentally duplicated regions and primarily reflect constitutional copy number polymorphisms, many subtle imbalances were detected that are likely somatically acquired alterations and include genes involved with tumorigenesis, apoptosis, or neural cell differentiation. The most frequent microdeletion involved the PTPRD locus, indicating a possible tumor suppressor function for this gene.

  2. An object model for genome information at all levels of resolution

    SciTech Connect

    Honda, S.; Parrott, N.W.; Smith, R.; Lawrence, C.

    1993-12-31

    An object model for genome data at all levels of resolution is described. The model was derived by considering the requirements for representing genome related objects in three application domains: genome maps, large-scale DNA sequencing, and exploring functional information in gene and protein sequences. The methodology used for the object-oriented analysis is also described.

  3. High-resolution breakpoint analysis provides evidence for the sequence-directed nature of genome rearrangements in hereditary disorders.

    PubMed

    Kovac, Michal B; Kovacova, Monika; Bachraty, Hynek; Bachrata, Katarina; Piscuoglio, Salvatore; Hutter, Pierre; Ilencikova, Denisa; Bartosova, Zdena; Tomlinson, Ian; Roethlisberger, Benno; Heinimann, Karl

    2015-02-01

    Although most of the pertinent data on the sequence-directed processes leading to genome rearrangements (GRs) have come from studies on somatic tissues, little is known about GRs in the germ line of patients with hereditary disorders. This study aims at identifying DNA motifs and higher order structures of genome architecture, which can result in losses and gains of genetic material in the germ line. We first identified candidate motifs by studying 112 pathogenic germ-line GRs in hereditary colorectal cancer patients, and subsequently created an algorithm, termed recombination type ratio, which correctly predicts the propensity of rearrangements with respect to homologous versus nonhomologous recombination events. PMID:25418510

  4. HAPPY mapping in a plant genome: reconstruction and analysis of a high-resolution physical map of a 1.9 Mbp region of Arabidopsis thaliana chromosome 4.

    PubMed

    Thangavelu, Madan; James, Allan B; Bankier, Alan; Bryan, Glenn J; Dear, Paul H; Waugh, Robbie

    2003-01-01

    HAPPY mapping is an in vitro approach for defining the order and spacing of DNA markers directly on native genomic DNA. This cloning-free technique is based on analysing the segregation of markers amplified from high molecular weight genomic DNA which has been broken randomly and 'segregated' by limiting dilution into subhaploid samples. It is a uniquely versatile tool, allowing for the construction of genome maps with flexible ranges and resolutions. Moreover, it is applicable to plant genomes, for which many of the techniques pioneered in animal genomes are inapplicable or inappropriate. We report here its demonstration in a plant genome by reconstructing the physical map of a 1.9 Mbp region around the FCA locus of Arabidopsis thaliana. The resulting map, spanning around 10% of chromosome 4, is in excellent agreement with the DNA sequence and has a mean marker spacing of 16 kbp. We argue that HAPPY maps of any required resolution can be made immediately and with relatively little effort for most plant species and, furthermore, that such maps can greatly aid the construction of regional or genome-wide physical maps.
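
    Illustrative sketch (not from the record): the principle described above is that markers lying close together on the genome tend to end up in the same subhaploid aliquots, so co-segregation frequency can serve as a proxy for proximity. The toy panel, the marker names, and the simple shared/either score are assumptions for illustration only; published HAPPY analyses use proper linkage statistics rather than this ratio.

```python
from itertools import combinations


def cosegregation(panel):
    """Score every marker pair by how often the two markers co-occur.

    panel maps a marker name to a list of 0/1 flags, one per aliquot
    (1 = marker amplified from that aliquot). The score is the number of
    aliquots containing both markers divided by the number containing
    at least one of them; higher scores suggest closer markers.
    """
    scores = {}
    for a, b in combinations(sorted(panel), 2):
        pairs = list(zip(panel[a], panel[b]))
        both = sum(1 for x, y in pairs if x and y)
        either = sum(1 for x, y in pairs if x or y)
        scores[(a, b)] = both / either if either else 0.0
    return scores


# Hypothetical panel of eight aliquots and three markers.
panel = {
    "m1": [1, 1, 0, 0, 1, 0, 1, 0],
    "m2": [1, 1, 0, 0, 1, 0, 0, 0],
    "m3": [0, 1, 1, 0, 0, 1, 1, 0],
}
for pair, score in sorted(cosegregation(panel).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

    With this toy panel, m1 and m2 co-segregate most often and m2 and m3 least often, which would be consistent with the order m2, m1, m3.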

  5. Genome wide DNA copy number analysis in cholangiocarcinoma using high resolution molecular inversion probe single nucleotide polymorphism assay.

    PubMed

    Arnold, Alexander; Bahra, Marcus; Lenze, Dido; Bradtmöller, Maren; Guse, Katrin; Gehlhaar, Claire; Bläker, Hendrik; Heppner, Frank L; Koch, Arend

    2015-10-01

    In order to study molecular similarities and differences of intrahepatic (IH-CCA) and extrahepatic (EH-CCA) cholangiocarcinoma, 24 FFPE tumor samples (13 IH-CCA, 11 EH-CCA) were analyzed for whole genome copy number variations (CNVs) using a new high-density Molecular Inversion Probe Single Nucleotide Polymorphism (MIP SNP) assay. Common in both tumor subtypes the most frequent losses were detected on chromosome 1p, 3p, 6q and 9 while gains were mostly seen in 1q, 8q as well as complete chromosome 17 and 20. Applying the statistical GISTIC (Genomic Identification of Significant Targets in Cancer) tool we identified potential novel candidate tumor suppressor- (DBC1, FHIT, PPP2R2A) and oncogenes (LYN, FGF19, GRB7, PTPN1) within these regions of chromosomal instability. Next to common aberrations in IH-CCA and EH-CCA, we additionally found significant differences in copy number variations on chromosome 3 and 14. Moreover, due to the fact that mutations in the Isocitrate dehydrogenase (IDH-1 and IDH-2) genes are more frequent in our IH-CCA than in our EH-CCA samples, we suggest that the tumor subtypes have a different molecular profile. In conclusion, new possible target genes within regions of high significant copy number aberrations were detected using a high-density Molecular Inversion Probe Single Nucleotide Polymorphism (MIP SNP) assay, which opens a future perspective of fast routine copy number and marker gene identification for gene targeted therapy.

  6. High resolution analysis

    NASA Technical Reports Server (NTRS)

    Robinove, C. J.

    1982-01-01

    The possibilities for the use of high spectral resolution analysis in the field of hydrology and water resources are examined. Critical gaps in scientific knowledge that must be filled before technology can be evaluated involve the spectral response of water, substances dissolved and suspended in water, and substances floating on water. The most complete mapping of oil slicks can be done in the ultraviolet region. A means of measuring the ultraviolet reflection at the surface from satellite altitudes needs to be determined. The use of high spectral resolution sensors in a reasonable number of narrow bands may be able to sense the reflectance or emission characteristics of water and its contained materials that can be correlated with commonly used water quality variables. The technological alternatives available for experimenting with problems of sensing water quality are to use existing remote sensing instrumentation in an empirical mode and to develop instruments for either testing hypotheses or conducting empirical experiments.

  7. High-Resolution Analysis by Whole-Genome Sequencing of an International Lineage (Sequence Type 111) of Pseudomonas aeruginosa Associated with Metallo-Carbapenemases in the United Kingdom

    PubMed Central

    Wright, Laura; Underwood, Anthony; Witney, Adam A.; Chan, Yuen-Ting; Al-Shahib, Ali; Arnold, Catherine; Doumith, Michel; Patel, Bharat; Planche, Timothy D.; Green, Jonathan; Holliman, Richard; Woodford, Neil

    2015-01-01

    Whole-genome sequencing (WGS) was carried out on 87 isolates of sequence type 111 (ST-111) of Pseudomonas aeruginosa collected between 2005 and 2014 from 65 patients and 12 environmental isolates from 24 hospital laboratories across the United Kingdom on an Illumina HiSeq instrument. Most isolates (73) carried VIM-2, but others carried IMP-1 or IMP-13 (5) or NDM-1 (1); one isolate had VIM-2 and IMP-18, and 7 carried no metallo-beta-lactamase (MBL) gene. Single nucleotide polymorphism analysis divided the isolates into distinct clusters; the NDM-1 isolate was an outlier, and the IMP isolates and 6/7 MBL-negative isolates clustered separately from the main set of 73 VIM-2 isolates. Within the VIM-2 set, there were at least 3 distinct clusters, including a tightly clustered set of isolates from 3 hospital laboratories consistent with an outbreak from a single introduction that was quickly brought under control and a much broader set dominated by isolates from a long-running outbreak in a London hospital likely seeded from an environmental source, requiring different control measures; isolates from 7 other hospital laboratories in London and southeast England were also included. Bayesian evolutionary analysis indicated that all the isolates shared a common ancestor dating back ∼50 years (1960s), with the main VIM-2 set separating approximately 20 to 30 years ago. Accessory gene profiling revealed blocks of genes associated with particular clusters, with some having high similarity (≥95%) to bacteriophage genes. WGS of widely found international lineages such as ST-111 provides the necessary resolution to inform epidemiological investigations and intervention policies. PMID:26041902

  9. High-Resolution Analysis by Whole-Genome Sequencing of an International Lineage (Sequence Type 111) of Pseudomonas aeruginosa Associated with Metallo-Carbapenemases in the United Kingdom.

    PubMed

    Turton, Jane F; Wright, Laura; Underwood, Anthony; Witney, Adam A; Chan, Yuen-Ting; Al-Shahib, Ali; Arnold, Catherine; Doumith, Michel; Patel, Bharat; Planche, Timothy D; Green, Jonathan; Holliman, Richard; Woodford, Neil

    2015-08-01

    Whole-genome sequencing (WGS) was carried out on 87 isolates of sequence type 111 (ST-111) of Pseudomonas aeruginosa collected between 2005 and 2014 from 65 patients and 12 environmental isolates from 24 hospital laboratories across the United Kingdom on an Illumina HiSeq instrument. Most isolates (73) carried VIM-2, but others carried IMP-1 or IMP-13 (5) or NDM-1 (1); one isolate had VIM-2 and IMP-18, and 7 carried no metallo-beta-lactamase (MBL) gene. Single nucleotide polymorphism analysis divided the isolates into distinct clusters; the NDM-1 isolate was an outlier, and the IMP isolates and 6/7 MBL-negative isolates clustered separately from the main set of 73 VIM-2 isolates. Within the VIM-2 set, there were at least 3 distinct clusters, including a tightly clustered set of isolates from 3 hospital laboratories consistent with an outbreak from a single introduction that was quickly brought under control and a much broader set dominated by isolates from a long-running outbreak in a London hospital likely seeded from an environmental source, requiring different control measures; isolates from 7 other hospital laboratories in London and southeast England were also included. Bayesian evolutionary analysis indicated that all the isolates shared a common ancestor dating back ∼50 years (1960s), with the main VIM-2 set separating approximately 20 to 30 years ago. Accessory gene profiling revealed blocks of genes associated with particular clusters, with some having high similarity (≥95%) to bacteriophage genes. WGS of widely found international lineages such as ST-111 provides the necessary resolution to inform epidemiological investigations and intervention policies. PMID:26041902
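
    Illustrative sketch (not from the record): one simple way to divide isolates into clusters from SNP data is to count pairwise differences over aligned variable sites and then link any two isolates whose distance falls under a threshold. The isolate names, sequences, and the one-SNP threshold are made up, and the abstract does not state which clustering procedure the authors used.

```python
def snp_distance(a, b):
    """Count differing positions between two aligned sequences, ignoring N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def single_linkage_clusters(seqs, max_snps):
    """Group isolates connected by chains of pairs within max_snps SNPs."""
    names = sorted(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if snp_distance(seqs[a], seqs[b]) <= max_snps:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())


# Hypothetical isolates: A, B and C chain together; D is an outlier.
isolates = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGTACAA",
    "D": "TTGTCCGG",
}
print(single_linkage_clusters(isolates, max_snps=1))
```

    Single linkage is used here only because it is short to write; it can chain distant isolates together, so outbreak investigations usually combine SNP distances with phylogenetic and epidemiological evidence.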

  10. High resolution comparative genomic hybridisation in clinical cytogenetics

    PubMed Central

    Kirchhoff, M.; Rose, H.; Lundsteen, C.

    2001-01-01

    High resolution comparative genomic hybridisation (HR-CGH) is a diagnostic tool in our clinical cytogenetics laboratory. The present survey reports the results of 253 clinical cases in which 47 abnormalities were detected. Among 144 dysmorphic and mentally retarded subjects with a normal conventional karyotype, 15 (10%) had small deletions or duplications, of which 11 were interstitial. In addition, a case of mosaic trisomy 9 was detected. Among 25 dysmorphic and mentally retarded subjects carrying apparently balanced de novo translocations, four had deletions at translocation breakpoints and two had deletions elsewhere in the genome. Seventeen of 19 complex rearrangements were clarified by HR-CGH. A small supernumerary marker chromosome occurring with low frequency and the breakpoint of a mosaic r(18) case could not be clarified. Three of 19 other abnormalities could not be confirmed by HR-CGH. One was a Williams syndrome deletion and two were DiGeorge syndrome deletions, which were apparently below the resolution of HR-CGH. However, we were able to confirm Angelman and Prader-Willi syndrome deletions, which are about 3-5 Mb. We conclude that HR-CGH should be used for the evaluation of (1) dysmorphic and mentally retarded subjects where normal karyotyping has failed to show abnormalities, (2) dysmorphic and mentally retarded subjects carrying apparently balanced de novo translocations, (3) apparently balanced de novo translocations detected prenatally, and (4) for clarification of complex structural rearrangements.


    Keywords: comparative genomic hybridisation; chromosome analysis; chromosome aberrations; dysmorphism PMID:11694545

  11. Genome-wide analysis of human global and transcription-coupled excision repair of UV damage at single-nucleotide resolution.

    PubMed

    Hu, Jinchuan; Adar, Sheera; Selby, Christopher P; Lieb, Jason D; Sancar, Aziz

    2015-05-01

    We developed a method for genome-wide mapping of DNA excision repair named XR-seq (excision repair sequencing). Human nucleotide excision repair generates two incisions surrounding the site of damage, creating an ∼30-mer. In XR-seq, this fragment is isolated and subjected to high-throughput sequencing. We used XR-seq to produce stranded, nucleotide-resolution maps of repair of two UV-induced DNA damages in human cells: cyclobutane pyrimidine dimers (CPDs) and (6-4) pyrimidine-pyrimidone photoproducts [(6-4)PPs]. In wild-type cells, CPD repair was highly associated with transcription, specifically with the template strand. Experiments in cells defective in either transcription-coupled excision repair or general excision repair isolated the contribution of each pathway to the overall repair pattern and showed that transcription-coupled repair of both photoproducts occurs exclusively on the template strand. XR-seq maps capture transcription-coupled repair at sites of divergent gene promoters and bidirectional enhancer RNA (eRNA) production at enhancers. XR-seq data also uncovered the repair characteristics and novel sequence preferences of CPDs and (6-4)PPs. XR-seq and the resulting repair maps will facilitate studies of the effects of genomic location, chromatin context, transcription, and replication on DNA repair in human cells.

  12. CGAT: computational genomics analysis toolkit.

    PubMed

    Sims, David; Ilott, Nicholas E; Sansom, Stephen N; Sudbery, Ian M; Johnson, Jethro S; Fawcett, Katherine A; Berlanga-Taylor, Antonio J; Luna-Valero, Sebastian; Ponting, Chris P; Heger, Andreas

    2014-05-01

    Computational genomics seeks to draw biological inferences from genomic datasets, often by integrating and contextualizing next-generation sequencing data. CGAT provides an extensive suite of tools designed to assist in the analysis of genome scale data from a range of standard file formats. The toolkit enables filtering, comparison, conversion, summarization and annotation of genomic intervals, gene sets and sequences. The tools can both be run from the Unix command line and installed into visual workflow builders, such as Galaxy.

  13. Superfine resolution acoustooptic spectrum analysis

    NASA Technical Reports Server (NTRS)

    Ansari, Homayoon; Lesh, James R.

    1991-01-01

    High resolution spectrum analysis of RF signals is required in applications such as the search for extraterrestrial intelligence, RF interference monitoring, or general purpose decomposition of signals. Sub-Hertz resolution in three-dimensional acoustooptic spectrum analysis is theoretically and experimentally demonstrated. The operation of a two-dimensional acoustooptic spectrum analyzer is extended to include time integration over a sequence of CCD frames.

  14. Resolution analysis by random probing

    NASA Astrophysics Data System (ADS)

    Simutė, S.; Fichtner, A.; van Leeuwen, T.

    2015-12-01

    We develop and apply methods for resolution analysis in tomography, based on stochastic probing of the Hessian or resolution operators. Key properties of our methods are (i) low algorithmic complexity and easy implementation, (ii) applicability to any tomographic technique, including full-waveform inversion and linearized ray tomography, (iii) applicability in any spatial dimension and to inversions with a large number of model parameters, (iv) low computational costs that are mostly a fraction of those required for synthetic recovery tests, and (v) the ability to quantify both spatial resolution and inter-parameter trade-offs. Using synthetic full-waveform inversions as benchmarks, we demonstrate that auto-correlations of random-model applications to the Hessian yield various resolution measures, including direction- and position-dependent resolution lengths, and the strength of inter-parameter mappings. We observe that the required number of random test models is around 5 in one, two and three dimensions. This means that the proposed resolution analyses are not only more meaningful than recovery tests but also computationally less expensive. We demonstrate the applicability of our method in 3D real-data full-waveform inversions for the western Mediterranean and Japan. In addition to tomographic problems, resolution analysis by random probing may be used in other inverse methods that constrain continuously distributed properties, including electromagnetic and potential-field inversions, as well as recently emerging geodynamic data assimilation.
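
    Illustrative sketch (not from the record): the general idea of stochastic probing is that an operator which is only available through its action on a vector can still be characterised by applying it to a few random vectors. The sketch below uses a standard randomised diagonal estimator with random plus/minus-one probes; it is a generic stand-in, not the auto-correlation measures of the abstract, and the small matrix H and all function names are assumptions.

```python
import random


def matvec(H, v):
    """Dense matrix-vector product, standing in for a Hessian application."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def probe_diagonal(apply_H, n, n_probes, rng):
    """Estimate diag(H) from n_probes applications of H to random vectors.

    For probes v with independent +1/-1 entries, the expectation of
    v[i] * (H v)[i] equals H[i][i], so averaging over probes converges
    to the diagonal without ever forming H explicitly.
    """
    acc = [0.0] * n
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Hv = apply_H(v)
        for i in range(n):
            acc[i] += v[i] * Hv[i]
    return [a / n_probes for a in acc]


# Hypothetical 3x3 symmetric operator with diagonal (2, 3, 4).
H = [[2.0, 0.5, 0.5],
     [0.5, 3.0, 0.5],
     [0.5, 0.5, 4.0]]
estimate = probe_diagonal(lambda v: matvec(H, v), n=3, n_probes=4000,
                          rng=random.Random(0))
print([round(e, 2) for e in estimate])
```

    The error of this estimator is driven by the off-diagonal entries of the operator, which is why operators with strongly localised structure can be characterised with few probes.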

  15. Genome-wide single nucleotide polymorphism-based assay for high-resolution epidemiological analysis of the methicillin-resistant Staphylococcus aureus hospital clone EMRSA-15.

    PubMed

    Holmes, A; McAllister, G; McAdam, P R; Hsien Choi, S; Girvan, K; Robb, A; Edwards, G; Templeton, K; Fitzgerald, J R

    2014-02-01

    The EMRSA-15 clone is a major cause of nosocomial methicillin-resistant Staphylococcus aureus (MRSA) infections in the UK and elsewhere but existing typing methodologies have limited capacity to discriminate closely related strains, and are often poorly reproducible between laboratories. Here, we report the design, development and validation of a genome-wide single nucleotide polymorphism (SNP) typing method and compare it to established methods for typing of EMRSA-15. In order to identify discriminatory SNPs, the genomes of 17 EMRSA-15 strains, selected to represent the breadth of genotypic and phenotypic diversity of EMRSA-15 isolates in Scotland, were determined and phylogenetic reconstruction was carried out. In addition to 17 phylogenetically informative SNPs, five binary markers were included to form the basis of an EMRSA-15 genotyping assay. The SNP-based typing assay was as discriminatory as pulsed-field gel electrophoresis, and significantly more discriminatory than staphylococcal protein A (spa) typing for typing of a representative panel of diverse EMRSA-15 strains, isolates from two EMRSA-15 hospital outbreak investigations, and a panel of bacteraemia isolates obtained in healthcare facilities in the east of Scotland during a 12-month period. The assay is a rapid, and reproducible approach for epidemiological analysis of EMRSA-15 clinical isolates in Scotland. Unlike established methods the DNA sequence-based method is ideally suited for inter-laboratory comparison of identified genotypes, and its flexibility lends itself to supplementation with additional SNPs or markers for the identification of novel S. aureus strains in other regions of the world.
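
    Illustrative sketch (not from the record): the discriminatory ability of typing schemes is commonly compared with Simpson's index of diversity in the Hunter-Gaston form, which is the probability that two isolates drawn at random without replacement receive different types. The abstract does not say which statistic the authors used, and the type labels below are made up.

```python
from collections import Counter


def discriminatory_index(types):
    """Hunter-Gaston discriminatory index for a list of type assignments.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), where n_j is the number
    of isolates of type j and N is the total number of isolates.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(types)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical panel of ten isolates typed by two methods.
coarse = ["spa1"] * 8 + ["spa2"] * 2
fine = ["g1", "g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9"]
print(round(discriminatory_index(coarse), 3))
print(round(discriminatory_index(fine), 3))
```

    A method that splits the same panel into more, evenly sized types scores closer to 1.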

  16. Fast and Accurate Phylogenetic Reconstruction from High-Resolution Whole-Genome Data and a Novel Robustness Estimator

    NASA Astrophysics Data System (ADS)

    Lin, Yu; Rajan, Vaibhav; Moret, Bernard M. E.

    The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis.

  17. High-resolution linkage and quantitative trait locus mapping aided by genome survey sequencing: building up an integrative genomic framework for a bivalve mollusc.

    PubMed

    Jiao, Wenqian; Fu, Xiaoteng; Dou, Jinzhuang; Li, Hengde; Su, Hailin; Mao, Junxia; Yu, Qian; Zhang, Lingling; Hu, Xiaoli; Huang, Xiaoting; Wang, Yangfan; Wang, Shi; Bao, Zhenmin

    2014-02-01

    Genetic linkage maps are indispensable tools in genetic and genomic studies. Recent development of genotyping-by-sequencing (GBS) methods holds great promise for constructing high-resolution linkage maps in organisms lacking extensive genomic resources. In the present study, linkage mapping was conducted for a bivalve mollusc (Chlamys farreri) using a newly developed GBS method, 2b-restriction site-associated DNA (2b-RAD). Genome survey sequencing was performed to generate a preliminary reference genome that was utilized to facilitate linkage and quantitative trait locus (QTL) mapping in C. farreri. A high-resolution linkage map was constructed with a marker density (3806) that has, to our knowledge, never been achieved in any other molluscs. The linkage map covered nearly the whole genome (99.5%) with a resolution of 0.41 cM. QTL mapping and association analysis congruously revealed two growth-related QTLs and one potential sex-determination region. An important candidate QTL gene named PROP1, which functions in the regulation of growth hormone production in vertebrates, was identified from the growth-related QTL region detected on the linkage group LG3. We demonstrate that this linkage map can serve as an important platform for improving genome assembly and unifying multiple genomic resources. Our study, therefore, exemplifies how to build up an integrative genomic framework in a non-model organism.

  18. High-Resolution Linkage and Quantitative Trait Locus Mapping Aided by Genome Survey Sequencing: Building Up An Integrative Genomic Framework for a Bivalve Mollusc

    PubMed Central

    Jiao, Wenqian; Fu, Xiaoteng; Dou, Jinzhuang; Li, Hengde; Su, Hailin; Mao, Junxia; Yu, Qian; Zhang, Lingling; Hu, Xiaoli; Huang, Xiaoting; Wang, Yangfan; Wang, Shi; Bao, Zhenmin

    2014-01-01

    Genetic linkage maps are indispensable tools in genetic and genomic studies. Recent development of genotyping-by-sequencing (GBS) methods holds great promise for constructing high-resolution linkage maps in organisms lacking extensive genomic resources. In the present study, linkage mapping was conducted for a bivalve mollusc (Chlamys farreri) using a newly developed GBS method—2b-restriction site-associated DNA (2b-RAD). Genome survey sequencing was performed to generate a preliminary reference genome that was utilized to facilitate linkage and quantitative trait locus (QTL) mapping in C. farreri. A high-resolution linkage map was constructed with a marker density (3806) that has, to our knowledge, never been achieved in any other molluscs. The linkage map covered nearly the whole genome (99.5%) with a resolution of 0.41 cM. QTL mapping and association analysis congruously revealed two growth-related QTLs and one potential sex-determination region. An important candidate QTL gene named PROP1, which functions in the regulation of growth hormone production in vertebrates, was identified from the growth-related QTL region detected on the linkage group LG3. We demonstrate that this linkage map can serve as an important platform for improving genome assembly and unifying multiple genomic resources. Our study, therefore, exemplifies how to build up an integrative genomic framework in a non-model organism. PMID:24107803

  19. Chapter 14: Cancer Genome Analysis

    PubMed Central

    Vazquez, Miguel; de la Torre, Victor; Valencia, Alfonso

    2012-01-01

    Although there is great promise in the benefits to be obtained by analyzing cancer genomes, numerous challenges hinder different stages of the process, from the problem of sample preparation and the validation of the experimental techniques, to the interpretation of the results. This chapter specifically focuses on the technical issues associated with the bioinformatics analysis of cancer genome data. The main issues addressed are the use of database and software resources, the use of analysis workflows and the presentation of clinically relevant action items. We attempt to aid new developers in the field by describing the different stages of analysis and discussing current approaches, as well as by providing practical advice on how to access and use resources, and how to implement recommendations. Real cases from cancer genome projects are used as examples. PMID:23300415

  20. Genome-Wide High-Resolution Mapping by Recurrent Intermating Using Arabidopsis Thaliana as a Model

    PubMed Central

    Liu, S. C.; Kowalski, S. P.; Lan, T. H.; Feldmann, K. A.; Paterson, A. H.

    1996-01-01

    We demonstrate a method for developing populations suitable for genome-wide high-resolution genetic linkage mapping, by recurrent intermating among F(2) individuals derived from crosses between homozygous parents. Comparison of intermated progenies to F(2) and "recombinant inbred" (RI) populations from the same pedigree corroborates theoretical expectations that progenies intermated for four generations harbor about threefold more information for estimating recombination fraction between closely linked markers than either RI-selfed or F(2) individuals (which are, in fact, equivalent in this regard). Although intermated populations are heterozygous, homozygous "intermated recombinant inbred" (IRI) populations can readily be generated, combining additional information afforded by intermating with the permanence of RI populations. Intermated populations permit fine-mapping of genetic markers throughout a genome, helping to bridge the gap between genetic map resolution and the DNA-carrying capacity of modern cloning vectors, thus facilitating merger of genetic and physical maps. Intermating can also facilitate high-resolution mapping of genes and QTLs, accelerating map-based cloning. Finally, intermated populations will facilitate investigation of other fundamental genetic questions requiring a genome-wide high-resolution analysis, such as comparative mapping of distantly related species, and the genetic basis of heterosis. PMID:8770602

  1. Integration of complete chloroplast genome sequences with small amplicon datasets improves phylogenetic resolution in Acacia.

    PubMed

    Williams, Anna V; Miller, Joseph T; Small, Ian; Nevill, Paul G; Boykin, Laura M

    2016-03-01

    Combining whole genome data with previously obtained amplicon sequences has the potential to increase the resolution of phylogenetic analyses, particularly at low taxonomic levels or where recent divergence, rapid speciation or slow genome evolution has resulted in limited sequence variation. However, the integration of these types of data for large scale phylogenetic studies has rarely been investigated. Here we conduct a phylogenetic analysis of the whole chloroplast genome and two nuclear ribosomal loci for 65 Acacia species from across the most recent Acacia phylogeny. We then combine this data with previously generated amplicon sequences (four chloroplast loci and two nuclear ribosomal loci) for 508 Acacia species. We use several phylogenetic methods, including maximum likelihood bootstrapping (with and without constraint) and ExaBayes, in order to determine the success of combining a dataset of 4000bp with one of 189,000bp. The results of our study indicate that the inclusion of whole genome data gave a far better resolved and well supported representation of the phylogenetic relationships within Acacia than using only amplicon sequences, with the greatest support observed when using a whole genome phylogeny as a constraint on the amplicon sequences. Our study therefore provides methods for optimal integration of genomic and amplicon sequences.

  2. Integration of complete chloroplast genome sequences with small amplicon datasets improves phylogenetic resolution in Acacia.

    PubMed

    Williams, Anna V; Miller, Joseph T; Small, Ian; Nevill, Paul G; Boykin, Laura M

    2016-03-01

    Combining whole genome data with previously obtained amplicon sequences has the potential to increase the resolution of phylogenetic analyses, particularly at low taxonomic levels or where recent divergence, rapid speciation or slow genome evolution has resulted in limited sequence variation. However, the integration of these types of data for large scale phylogenetic studies has rarely been investigated. Here we conduct a phylogenetic analysis of the whole chloroplast genome and two nuclear ribosomal loci for 65 Acacia species from across the most recent Acacia phylogeny. We then combine this data with previously generated amplicon sequences (four chloroplast loci and two nuclear ribosomal loci) for 508 Acacia species. We use several phylogenetic methods, including maximum likelihood bootstrapping (with and without constraint) and ExaBayes, in order to determine the success of combining a dataset of 4000bp with one of 189,000bp. The results of our study indicate that the inclusion of whole genome data gave a far better resolved and well supported representation of the phylogenetic relationships within Acacia than using only amplicon sequences, with the greatest support observed when using a whole genome phylogeny as a constraint on the amplicon sequences. Our study therefore provides methods for optimal integration of genomic and amplicon sequences. PMID:26702955

  3. Core Genome Multilocus Sequence Typing Scheme for High-Resolution Typing of Enterococcus faecium.

    PubMed

    de Been, Mark; Pinholt, Mette; Top, Janetta; Bletz, Stefan; Mellmann, Alexander; van Schaik, Willem; Brouwer, Ellen; Rogers, Malbert; Kraat, Yvette; Bonten, Marc; Corander, Jukka; Westh, Henrik; Harmsen, Dag; Willems, Rob J L

    2015-12-01

    Enterococcus faecium, a common inhabitant of the human gut, has emerged in the last 2 decades as an important multidrug-resistant nosocomial pathogen. Since the start of the 21st century, multilocus sequence typing (MLST) has been used to study the molecular epidemiology of E. faecium. However, due to the use of a small number of genes, the resolution of MLST is limited. Whole-genome sequencing (WGS) now allows for high-resolution tracing of outbreaks, but current WGS-based approaches lack standardization, rendering them less suitable for interlaboratory prospective surveillance. To overcome this limitation, we developed a core genome MLST (cgMLST) scheme for E. faecium. cgMLST transfers genome-wide single nucleotide polymorphism (SNP) diversity into a standardized and portable allele numbering system that is far less computationally intensive than SNP-based analysis of WGS data. The E. faecium cgMLST scheme was built using 40 genome sequences that represented the diversity of the species. The scheme consists of 1,423 cgMLST target genes. To test the performance of the scheme, we performed WGS analysis of 103 outbreak isolates from five different hospitals in the Netherlands, Denmark, and Germany. The cgMLST scheme performed well in distinguishing between epidemiologically related and unrelated isolates, even between those that had the same sequence type (ST), which denotes the higher discriminatory power of this cgMLST scheme over that of conventional MLST. We also show that in terms of resolution, the performance of the E. faecium cgMLST scheme is equivalent to that of an SNP-based approach. In conclusion, the cgMLST scheme developed in this study facilitates rapid, standardized, and high-resolution tracing of E. faecium outbreaks. PMID:26400782
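
    The allele numbering idea in the abstract above can be illustrated with a minimal sketch. It assumes each isolate is reduced to a profile that maps a locus to an allele number, and it compares two isolates by counting the loci at which both have a called allele and the numbers differ. The locus names, the allele numbers and the function allele_distance are hypothetical and chosen for illustration only; they are not taken from the published scheme.

```python
from typing import Dict, Optional

# Hypothetical allele profile: locus name -> allele number (None = locus not called).
Profile = Dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count loci where both isolates have a called allele and the numbers differ."""
    shared = a.keys() & b.keys()
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


# Toy profiles over four made-up loci (a real scheme has far more targets).
isolate_1 = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2, "locus_0004": None}
isolate_2 = {"locus_0001": 1, "locus_0002": 7, "locus_0003": 2, "locus_0004": 3}
isolate_3 = {"locus_0001": 5, "locus_0002": 7, "locus_0003": 9, "locus_0004": 3}

print(allele_distance(isolate_1, isolate_2))  # 1
print(allele_distance(isolate_1, isolate_3))  # 3
print(allele_distance(isolate_2, isolate_3))  # 2
```

    Ignoring uncalled loci is one possible convention, assumed here; how missing targets are handled and what distance counts as "related" would be set by the scheme in use.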

  4. Analysis of genomic DNA with the UCSC genome browser.

    PubMed

    Pevsner, Jonathan

    2009-01-01

    Genomic DNA is being sequenced and annotated at a rapid rate, with terabases of DNA currently deposited in GenBank and other repositories. Genome browsers provide an essential collection of resources to visualize and analyze chromosomal DNA. The University of California, Santa Cruz (UCSC) Genome Browser provides annotations from the level of single nucleotides to whole chromosomes for four dozen metazoan and other species. The Genome Browser may be used to address a wide range of problems in bioinformatics (e.g., sequence analysis), comparative genomics, and evolution.

  5. High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly.

    PubMed

    Bartholomé, Jérôme; Mandrou, Eric; Mabiala, André; Jenkins, Jerry; Nabihoudine, Ibouniyamine; Klopp, Christophe; Schmutz, Jeremy; Plomion, Christophe; Gion, Jean-Marc

    2015-06-01

    Genetic maps are key tools in genetic research as they constitute the framework for many applications, such as quantitative trait locus analysis, and support the assembly of genome sequences. The resequencing of the two parents of a cross between Eucalyptus urophylla and Eucalyptus grandis was used to design a single nucleotide polymorphism (SNP) array of 6000 markers evenly distributed along the E. grandis genome. The genotyping of 1025 offspring enabled the construction of two high-resolution genetic maps containing 1832 and 1773 markers with an average marker interval of 0.45 and 0.5 cM for E. grandis and E. urophylla, respectively. The comparison between genetic maps and the reference genome highlighted 85% of collinear regions. A total of 43 noncollinear regions and 13 nonsynthetic regions were detected and corrected in the new genome assembly. This improved version contains 4943 scaffolds totalling 691.3 Mb of which 88.6% were captured by the 11 chromosomes. The mapping data were also used to investigate the effect of population size and number of markers on linkage mapping accuracy. This study provides the most reliable linkage maps for Eucalyptus and version 2.0 of the E. grandis genome. PMID:25385325

  6. BPGA- an ultra-fast pan-genome analysis pipeline

    PubMed Central

    Chaudhari, Narendrakumar M.; Gupta, Vinod Kumar; Dutta, Chitra

    2016-01-01

    Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics from few genome comparisons to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of the dataset, determining core (conserved), accessory (dispensable) and unique (strain-specific) gene pool of a species, tracing horizontal gene-flux across strains and providing insight into species evolution. The existing pan genome software tools suffer from various limitations like limited datasets, difficult installation/requirements, inadequate functional features etc. Here we present an ultra-fast computational pipeline BPGA (Bacterial Pan Genome Analysis tool) with seven functional modules. In addition to the routine pan genome analyses, BPGA introduces a number of novel features for downstream analyses like core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG & COG mapping of core, accessory and unique genes. Other notable features include minimum running prerequisites, freedom to select the gene clustering method, ultra-fast execution, user friendly command line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains. PMID:27071527
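
    The core, accessory and unique gene pools described in the abstract above can be sketched with simple set counting. This sketch assumes genes have already been clustered into families, so that each strain is a set of family identifiers. The strain names, gene identifiers and the function partition_pan_genome are hypothetical, and the sketch does not reproduce how BPGA itself clusters genes or builds these pools.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def partition_pan_genome(
    strains: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene families into core (all strains), accessory (some) and unique (one)."""
    # Number of strains in which each gene family occurs.
    counts = Counter(gene for genes in strains.values() for gene in set(genes))
    n_strains = len(strains)
    core = {g for g, c in counts.items() if c == n_strains}
    unique = {g for g, c in counts.items() if c == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Toy input: three made-up strains and five made-up gene families.
toy_strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g5"},
}

core, accessory, unique = partition_pan_genome(toy_strains)
print(sorted(core))       # ['g1']
print(sorted(accessory))  # ['g2']
print(sorted(unique))     # ['g3', 'g4', 'g5']
```

    With a single strain every family is both core and unique under these definitions, so the partition is only meaningful for two or more genomes.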

  7. A High-Resolution Map of Segmental DNA Copy Number Variation in the Mouse Genome

    PubMed Central

    Graubert, Timothy A; Selzer, Rebecca R; Richmond, Todd A; Eis, Peggy S; Shannon, William D; Li, Xia; McLeod, Howard L; Cheverud, James M; Ley, Timothy J

    2007-01-01

    Submicroscopic (less than 2 Mb) segmental DNA copy number changes are a recently recognized source of genetic variability between individuals. The biological consequences of copy number variants (CNVs) are largely undefined. In some cases, CNVs that cause gene dosage effects have been implicated in phenotypic variation. CNVs have been detected in diverse species, including mice and humans. Published studies in mice have been limited by resolution and strain selection. We chose to study 21 well-characterized inbred mouse strains that are the focus of an international effort to measure, catalog, and disseminate phenotype data. We performed comparative genomic hybridization using long oligomer arrays to characterize CNVs in these strains. This technique increased the resolution of CNV detection by more than an order of magnitude over previous methodologies. The CNVs range in size from 21 to 2,002 kb. Clustering strains by CNV profile recapitulates aspects of the known ancestry of these strains. Most of the CNVs (77.5%) contain annotated genes, and many (47.5%) colocalize with previously mapped segmental duplications in the mouse genome. We demonstrate that this technique can identify copy number differences associated with known polymorphic traits. The phenotype of previously uncharacterized strains can be predicted based on their copy number at these loci. Annotation of CNVs in the mouse genome combined with sequence-based analysis provides an important resource that will help define the genetic basis of complex traits. PMID:17206864

  8. Toward high-resolution population genomics using archaeological samples.

    PubMed

    Morozova, Irina; Flegontov, Pavel; Mikheyev, Alexander S; Bruskin, Sergey; Asgharian, Hosseinali; Ponomarenko, Petr; Klyuchnikov, Vladimir; ArunKumar, GaneshPrasad; Prokhortchouk, Egor; Gankin, Yuriy; Rogaev, Evgeny; Nikolsky, Yuri; Baranova, Ancha; Elhaik, Eran; Tatarinova, Tatiana V

    2016-08-01

    The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research. PMID:27436340

  9. Toward high-resolution population genomics using archaeological samples

    PubMed Central

    Morozova, Irina; Flegontov, Pavel; Mikheyev, Alexander S.; Bruskin, Sergey; Asgharian, Hosseinali; Ponomarenko, Petr; Klyuchnikov, Vladimir; ArunKumar, GaneshPrasad; Prokhortchouk, Egor; Gankin, Yuriy; Rogaev, Evgeny; Nikolsky, Yuri; Baranova, Ancha; Elhaik, Eran; Tatarinova, Tatiana V.

    2016-01-01

    The term ‘ancient DNA’ (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of ‘molecular paleontology’. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleo-epidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research. PMID:27436340

  10. Proteogenomic Analysis of Mycobacterium smegmatis Using High Resolution Mass Spectrometry

    PubMed Central

    Potgieter, Matthys G.; Nakedi, Kehilwe C.; Ambler, Jon M.; Nel, Andrew J. M.; Garnett, Shaun; Soares, Nelson C.; Mulder, Nicola; Blackburn, Jonathan M.

    2016-01-01

    Biochemical evidence is vital for accurate genome annotation. The integration of experimental data collected at the proteome level using high resolution mass spectrometry allows for improvements in genome annotation by providing evidence for novel gene models, while validating or modifying others. Here, we report the results of a proteogenomic analysis of a reference strain of Mycobacterium smegmatis (mc2155), a fast growing model organism for the pathogenic Mycobacterium tuberculosis—the causative agent for Tuberculosis. By integrating high throughput LC/MS/MS proteomic data with genomic six frame translation and ab initio gene prediction databases, a total of 2887 ORFs were identified, including 2810 ORFs annotated to a Reference protein, and 63 ORFs not previously annotated to a Reference protein. Further, the translational start site (TSS) was validated for 558 Reference proteome gene models, while upstream translational evidence was identified for 81. In addition, N-terminus derived peptide identifications allowed for downstream TSS modification of a further 24 gene models. We validated the existence of six previously described interrupted coding sequences at the peptide level, and provide evidence for four novel frameshift positions. Analysis of peptide posterior error probability (PEP) scores indicates high-confidence novel peptide identifications and shows that the genome of M. smegmatis mc2155 is not yet fully annotated. Data are available via ProteomeXchange with identifier PXD003500. PMID:27092112

  11. The Cancer Genome Atlas ovarian cancer analysis

    Cancer.gov

    An analysis of genomic changes in ovarian cancer has provided the most comprehensive and integrated view of cancer genes for any cancer type to date. Ovarian serous adenocarcinoma tumors from 500 patients were examined by The Cancer Genome Atlas (TCGA) Re

  12. High-resolution functional profiling of the norovirus genome.

    PubMed

    Thorne, Lucy; Bailey, Dalan; Goodfellow, Ian

    2012-11-01

    Human noroviruses (HuNoV) are a major cause of nonbacterial gastroenteritis worldwide, yet details of the life cycle and replication of HuNoV are relatively unknown due to the lack of an efficient cell culture system. Studies with murine norovirus (MNV), which can be propagated in permissive cells, have begun to probe different aspects of the norovirus life cycle; however, our understanding of the specific functions of the viral proteins lags far behind that of other RNA viruses. Genome-wide functional profiling by insertional mutagenesis can reveal protein domains essential for replication and can lead to generation of tagged viruses, which has not yet been achieved for noroviruses. Here, transposon-mediated insertional mutagenesis was used to create 5 libraries of mutagenized MNV infectious clones, each containing a 15-nucleotide sequence randomly inserted within a defined region of the genome. Infectious virus was recovered from each library and was subsequently passaged in cell culture to determine the effect of each insertion by insertion-specific fluorescent PCR profiling. Genome-wide profiling of over 2,000 insertions revealed essential protein domains and confirmed known functional motifs. As validation, several insertion sites were introduced into a wild-type clone, successfully allowing the recovery of infectious virus. Screening of a number of reporter proteins and epitope tags led to the generation of the first infectious epitope-tagged noroviruses carrying the FLAG epitope tag in either NS4 or VP2. Subsequent work confirmed that epitope-tagged fully infectious noroviruses may be of use in the dissection of the molecular interactions that occur within the viral replication complex. PMID:22915807

  13. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  14. High-resolution FISH of the entire integrated Epstein-Barr virus genome on extended human DNA.

    PubMed

    Lestou, V S; Strehl, S; Lion, T; Gadner, H; Ambros, P F

    1996-01-01

    Here we report a high-resolution fluorescence in situ hybridization (FISH) analysis of the integrated Epstein-Barr virus (EBV) genome in chromosomes, decondensed interphase nuclear chromatin, and linearly extended chromatin fibers. We analyzed the EBV DNA integrated into the human genome in the well-characterized Burkitt's lymphoma cell line Namalwa, which contains two complete EBV genomes. The integration occurs via the terminal repeats of the virus and was always detectable at chromosome band 1p35. Using the biotinylated BamHIW fragment of the viral DNA, we observed distinct pairs of signals or small nuclear RNA "tracks" within interphase nuclei. FISH to stretched DNA fibers has a higher resolving power and, therefore, enables analysis of the structural organization of DNA. Application of this methodology to linearly extended chromatin of Namalwa cells using different EBV fragments allowed us to visualize the ordered arrangement of the integrated virus. Based on the predicted span of 0.34 nm per base pair for relaxed DNA, length measurements of 30 images showed a good correlation between the mean physical length of hybridized EBV DNA of 52.8 microns (158 kb) without the terminal repeats, and the EBV genomic length of 172 kb, including the terminal repeats. This DNA mapping procedure represents a useful tool for studying the structural organization of integrated viral genomes, and its application will have implications for the understanding of integration processes. PMID:8941376

  15. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY: In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  16. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  17. Comparative Genome Analysis of Enterobacter cloacae

    PubMed Central

    Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching

    2013-01-01

    The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314

  18. The Caenorhabditis elegans genome: a multifractal analysis.

    PubMed

    Vélez, P E; Garreta, L E; Martínez, E; Díaz, N; Amador, S; Tischer, I; Gutiérrez, J M; Moreno, P A

    2010-05-25

    The Caenorhabditis elegans genome has several regular and irregular characteristics in its nucleotide composition; these are observed within and between chromosomes. To study these particularities, we carried out a multifractal analysis, which requires a large number of exponents to characterize scaling properties. We looked for a relationship between the genetic information content of the chromosomes and multifractal parameters and found less multifractality compared to the human genome. Differences in multifractality among chromosomes and in regions of chromosomes, and two group averages of chromosome regions were observed. All these differences were mainly dependent on differences in the contents of repetitive DNA. Based on these properties, we propose a nonlinear model for the structure of the C. elegans genome, with some biological implications. These results suggest that examining differences in multifractality is a viable approach for measuring local variations of genomic information contents along chromosomes. This approach could be extended to other genomes in order to characterize structural and functional regions of chromosomes.

  19. Resolution or Analysis Scale: What Matters Most?

    NASA Astrophysics Data System (ADS)

    Miller, Bradley

    2016-04-01

    Identifying the scale at which different covariates best explain the variation of soil properties reflects the geographic strategy of using map generalization (relative size of map delineations) to identify the scale at which phenomena occur. The size of map delineations corresponds to resolution in raster data models. Although not always considered in digital soil mapping studies, resolution is widely recognized as an important factor in identifying covariates in digital spatial analysis. However, many variables that are useful as predictors in digital soil mapping are dependent upon spatial context. For example, the slope gradient at a specific location can only be calculated by considering the surrounding area. In these cases, an analysis neighborhood is used when calculating such variables using a raster data model. The context or area considered is then dependent upon both the resolution and the number of cells (window size) used to define the neighborhood. This presentation explores the difference between resolution and analysis scale, then tests which concept is most important for identifying optimal scales of correlation for digital soil informatics.
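
    The distinction between resolution and analysis scale in the abstract above can be sketched for slope gradient. The sketch assumes a small elevation grid stored as a list of rows and a central-difference slope taken over a neighbourhood of 2 * half_window + 1 cells, so the ground distance considered depends on both the cell size and the window size. The function name, its parameters and the toy elevations are hypothetical and are not taken from the presentation.

```python
import math
from typing import List


def slope_degrees(
    dem: List[List[float]], row: int, col: int, cell_size: float, half_window: int = 1
) -> float:
    """Central-difference slope at (row, col) over a (2*half_window + 1)-cell window."""
    k = half_window
    run = 2 * k * cell_size  # ground distance spanned by the difference
    dz_dx = (dem[row][col + k] - dem[row][col - k]) / run
    dz_dy = (dem[row + k][col] - dem[row - k][col]) / run
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# Toy 5x5 grid: every row is identical, elevation rises unevenly from west to east.
toy_dem = [[0.0, 0.0, 0.0, 5.0, 20.0] for _ in range(5)]

# Same location and same 10 m resolution, two analysis scales.
print(slope_degrees(toy_dem, 2, 2, cell_size=10.0, half_window=1))  # 3x3 window
print(slope_degrees(toy_dem, 2, 2, cell_size=10.0, half_window=2))  # 5x5 window
```

    On this toy surface the 3x3 window sees a rise of 5 over 20 m and the 5x5 window a rise of 20 over 40 m, so the two calls return different slopes even though the resolution is unchanged.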

  20. A high-resolution radiation hybrid map of the bovine genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We are building high-resolution radiation hybrid maps of all 29 bovine autosomes and chromosome X, using a 58,000-marker genotyping assay, and a 12,000-rad whole-genome radiation hybrid (RH) panel. To accommodate the large number of markers, and to automate the map building procedure, a software pip...

  1. Comprehensive prediction of chromosome dimer resolution sites in bacterial genomes

    PubMed Central

    2011-01-01

    Background: During the replication process of bacteria with circular chromosomes, an odd number of homologous recombination events results in concatenated dimer chromosomes that cannot be partitioned into daughter cells. However, many bacteria harbor a conserved dimer resolution machinery consisting of one or two tyrosine recombinases, XerC and XerD, and their 28-bp target site, dif. Results: To study the evolution of the dif/XerCD system and its relationship with replication termination, we report the comprehensive prediction of dif sequences in silico using a phylogenetic prediction approach based on iterated hidden Markov modeling. Using this method, dif sites were identified in 641 organisms among 16 phyla, with a 97.64% identification rate for single-chromosome strains. The dif sequence positions were shown to be strongly correlated with the GC skew shift-point that is induced by replicational mutation/selection pressures, but the difference in the positions of the predicted dif sites and the GC skew shift-points did not correlate with the degree of replicational mutation/selection pressures. Conclusions: The sequence of dif sites is widely conserved among many bacterial phyla, and they can be computationally identified using our method. The lack of correlation between dif position and the degree of GC skew suggests that replication termination does not occur strictly at dif sites. PMID:21223577
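
    The GC skew shift-point mentioned above can be located with a cumulative skew curve. The following is a minimal sketch, not the authors' pipeline: the window size, function names, and toy sequence are all assumptions made for illustration. It sums (G - C) / (G + C) over consecutive windows and reports where the running total peaks and bottoms out; the maximum is conventionally read as lying near the replication terminus and the minimum near the origin.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of (G - C) / (G + C) over consecutive non-overlapping windows."""
    total = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        total += (g - c) / (g + c) if (g + c) else 0.0
        curve.append(total)
    return curve


def skew_shift_points(seq: str, window: int = 1000) -> dict[str, int]:
    """End coordinates of the windows where the cumulative skew is lowest and highest."""
    curve = cumulative_gc_skew(seq, window)
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    return {"min": (i_min + 1) * window, "max": (i_max + 1) * window}


# Toy sequence: a G-rich first half followed by a C-rich second half.
toy = "GGGCA" * 60 + "GCCCA" * 60
points = skew_shift_points(toy, window=100)
print(points)
```

    A predicted dif coordinate could then be compared with the `max` position to measure how far the site lies from the skew shift-point.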

  2. Comparative analysis of methods for genome-wide nucleosome cartography.

    PubMed

    Quintales, Luis; Vázquez, Enrique; Antequera, Francisco

    2015-07-01

    Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding genome regulation. In recent years, several experimental protocols have been developed for this purpose that include enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use. PMID:25296770

  4. Microarray analysis at single molecule resolution

    PubMed Central

    Mureşan, Leila; Jacak, Jarosław; Klement, Erich Peter; Hesse, Jan; Schütz, Gerhard J.

    2010-01-01

    Bioanalytical chip-based assays have been enormously improved in sensitivity in recent years; detection of trace amounts of substances down to the level of individual fluorescent molecules has become state-of-the-art technology. The impact of such detection methods, however, has not yet been fully exploited, mainly due to a lack of appropriate mathematical tools for robust data analysis. One particular example relates to the analysis of microarray data. While classical microarray analysis works at resolutions of two to 20 micrometers and quantifies the abundance of target molecules by determining average pixel intensities, a novel high-resolution approach [1] directly visualizes individual bound molecules as diffraction-limited peaks. The quantification via counting that this makes possible is less susceptible to labeling artifacts and background noise. We have developed an approach for the analysis of high-resolution microarray images. It consists first of a single-molecule detection step, based on undecimated wavelet transforms, and second, of a spot identification step via a spatial statistics approach (corresponding to the segmentation step in classical microarray analysis). The detection method was tested on simulated images with a concentration range of 0.001 to 0.5 molecules per square micron and signal-to-noise ratio (SNR) between 0.9 and 31.6. For SNR above 15, the relative error in false negatives was below 15%. Separation of foreground and background proved reliable, provided that the foreground density exceeds the background by a factor of 2. The method has also been applied to real data from high-resolution microarray measurements. PMID:20123580

  5. High-Resolution Genetic Map for Understanding the Effect of Genome-Wide Recombination Rate on Nucleotide Diversity in Watermelon

    PubMed Central

    Reddy, Umesh K.; Nimmakayala, Padma; Levi, Amnon; Abburi, Venkata Lakshmi; Saminathan, Thangasamy; Tomason, Yan. R.; Vajja, Gopinath; Reddy, Rishi; Abburi, Lavanya; Wehner, Todd C.; Ronin, Yefim; Karol, Abraham

    2014-01-01

    We used genotyping by sequencing to identify a set of 10,480 single nucleotide polymorphism (SNP) markers for constructing a high-resolution genetic map of 1096 cM for watermelon. We assessed the genome-wide variation in recombination rate (GWRR) across the map and found an association between GWRR and genome-wide nucleotide diversity. Collinearity between the map and the genome-wide reference sequence for watermelon was studied to identify inconsistency and chromosome rearrangements. We assessed genome-wide nucleotide diversity, linkage disequilibrium (LD), and selective sweep for wild, semi-wild, and domesticated accessions of Citrullus lanatus var. lanatus to track signals of domestication. Principal component analysis combined with chromosome-wide phylogenetic study based on 1563 SNPs obtained after LD pruning with minor allele frequency of 0.05 resolved the differences between semi-wild and wild accessions as well as relationships among worldwide sweet watermelon. Population structure analysis revealed predominant ancestries for wild, semi-wild, and domesticated watermelons as well as admixture of various ancestries that were important for domestication. Sliding window analysis of Tajima’s D across various chromosomes was used to resolve selective sweep. LD decay was estimated for various chromosomes. We identified a strong selective sweep on chromosome 3 consisting of important genes that might have had a role in sweet watermelon domestication. PMID:25227227
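
    The sliding-window Tajima's D scan mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' software: haplotypes are given as equal-length strings of '0' and '1', windows are defined on physical positions, and the function names and toy data are hypothetical. The statistic follows the standard definition, comparing mean pairwise diversity with the estimate derived from the number of segregating sites.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length '0'/'1' haplotype strings; None if undefined."""
    n = len(haplotypes)
    if n < 4:
        return None  # the variance term vanishes for fewer than four sequences
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(len(haplotypes[0]))]
    seg = [k for k in counts if 0 < k < n]  # keep segregating sites only
    s = len(seg)
    if s == 0:
        return None
    pi = sum(k * (n - k) for k in seg) / (n * (n - 1) / 2)  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def sliding_tajimas_d(haplotypes, positions, window, step):
    """List of (window_start, D) for windows [start, start + window)."""
    results = []
    start = 0
    while start <= max(positions):
        cols = [j for j, p in enumerate(positions) if start <= p < start + window]
        sub = ["".join(h[j] for j in cols) for h in haplotypes]
        results.append((start, tajimas_d(sub)))
        start += step
    return results


# Toy data: four singleton sites followed by four intermediate-frequency sites.
haps = ["10001100", "01001010", "00100011", "00010101"]
pos = [10, 20, 30, 40, 110, 120, 130, 140]
scan = sliding_tajimas_d(haps, pos, window=100, step=100)
print(scan)
```

    In the toy scan the first window, made up of rare variants, gives a negative D, the direction expected after a selective sweep, while the second window, made up of intermediate-frequency variants, gives a positive D.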

  6. High-resolution genetic map for understanding the effect of genome-wide recombination rate on nucleotide diversity in watermelon.

    PubMed

    Reddy, Umesh K; Nimmakayala, Padma; Levi, Amnon; Abburi, Venkata Lakshmi; Saminathan, Thangasamy; Tomason, Yan R; Vajja, Gopinath; Reddy, Rishi; Abburi, Lavanya; Wehner, Todd C; Ronin, Yefim; Karol, Abraham

    2014-09-15

    We used genotyping by sequencing to identify a set of 10,480 single nucleotide polymorphism (SNP) markers for constructing a high-resolution genetic map of 1096 cM for watermelon. We assessed the genome-wide variation in recombination rate (GWRR) across the map and found an association between GWRR and genome-wide nucleotide diversity. Collinearity between the map and the genome-wide reference sequence for watermelon was studied to identify inconsistency and chromosome rearrangements. We assessed genome-wide nucleotide diversity, linkage disequilibrium (LD), and selective sweep for wild, semi-wild, and domesticated accessions of Citrullus lanatus var. lanatus to track signals of domestication. Principal component analysis combined with chromosome-wide phylogenetic study based on 1563 SNPs obtained after LD pruning with minor allele frequency of 0.05 resolved the differences between semi-wild and wild accessions as well as relationships among worldwide sweet watermelon. Population structure analysis revealed predominant ancestries for wild, semi-wild, and domesticated watermelons as well as admixture of various ancestries that were important for domestication. Sliding window analysis of Tajima's D across various chromosomes was used to resolve selective sweep. LD decay was estimated for various chromosomes. We identified a strong selective sweep on chromosome 3 consisting of important genes that might have had a role in sweet watermelon domestication.

  7. Shape-based alignment of genomic landscapes in multi-scale resolution.

    PubMed

    Ashida, Hiroki; Asai, Kiyoshi; Hamada, Michiaki

    2012-08-01

    Due to dramatic advances in DNA technology, quantitative measures of annotation data can now be obtained in continuous coordinates across the entire genome, allowing various heterogeneous 'genomic landscapes' to emerge. Although much effort has been devoted to comparing DNA sequences, not much attention has been given to comparing these large quantities of data comprehensively. In this article, we introduce a method for rapidly detecting local regions that show high correlations between genomic landscapes. We overcame the size problem for genome-wide data by converting the data into series of symbols and then carrying out sequence alignment. We also decomposed the oscillation of the landscape data into different frequency bands before analysis, since the real genomic landscape is a mixture of embedded and confounded biological processes working at different scales in the cell nucleus. To verify the usefulness and generality of our method, we applied our approach to well investigated landscapes from the human genome, including several histone modifications. Furthermore, by applying our method to over 20 genomic landscapes in human and 12 in mouse, we found that DNA replication timing and the density of Alu insertions are highly correlated genome-wide in both species, even though the Alu elements have amplified independently in the two genomes. To our knowledge, this is the first method to align genomic landscapes at multiple scales according to their shape.

  8. A Distance Measure for Genome Phylogenetic Analysis

    NASA Astrophysics Data System (ADS)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not feasible because of their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as the evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information-theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
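
    The general idea of a compression-based distance can be illustrated with a short sketch. Note the substitution: the study uses the expert model, a compressor designed for biological sequences, whereas the code below swaps in the general-purpose `zlib` compressor and the normalized compression distance, purely as a stand-in. The function names and the randomly generated toy sequences are assumptions for illustration.

```python
import random
import zlib


def compressed_size(seq: str) -> int:
    """Size in bytes of the zlib-compressed sequence (stand-in for a DNA-specific model)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy sequences: a random 'ancestor', a lightly mutated copy, and an unrelated sequence.
rng = random.Random(0)
ancestor = "".join(rng.choice("ACGT") for _ in range(2000))
relative = "".join("A" if i % 200 == 0 else base for i, base in enumerate(ancestor))
unrelated = "".join(rng.choice("ACGT") for _ in range(2000))

d_close = ncd(ancestor, relative)
d_far = ncd(ancestor, unrelated)
print(d_close, d_far)
```

    A matrix of such pairwise distances is the input that a distance-based tree method such as neighbor joining requires. A general-purpose compressor handles DNA far less well than a dedicated model, so this sketch shows the principle only.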

  9. Comparative Genome Analysis in the Integrated Microbial Genomes(IMG) System

    SciTech Connect

    Kyrpides, Nikos C.; Markowitz, Victor M.

    2006-03-01

    Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries and occurrence profile tools for comparing genes, pathways and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.

  10. High-Resolution DNA Melting Analysis in Plant Research.

    PubMed

    Simko, Ivan

    2016-06-01

    Genetic and genomic studies provide valuable insight into the inheritance, structure, organization, and function of genes. The knowledge gained from the analysis of plant genes is beneficial to all aspects of plant research, including crop improvement. New methods and tools are continually being developed to facilitate rapid and accurate mapping, sequencing, and analyzing of genes. Here, I review the recent progress in the application of high-resolution melting (HRM) analysis of DNA, a method that allows detecting polymorphism in double-stranded DNA by comparing profiles of melting curves. Use of HRM has expanded considerably in the past few years as the method was successfully applied for high-throughput genotyping, mapping genes, testing food products and seeds, and other areas of plant research. PMID:26827247

  11. Bioinformatics for analysis of poxvirus genomes.

    PubMed

    Da Silva, Melissa; Upton, Chris

    2012-01-01

    In recent years, there have been numerous unprecedented technological advances in the field of molecular biology; these include DNA sequencing, mass spectrometry of proteins, and microarray analysis of mRNA transcripts. Perhaps, however, it is the area of genomics, which has now generated the complete genome sequences of more than 100 poxviruses, that has had the greatest impact on the average virology researcher because the DNA sequence data is in constant use in many different ways by almost all molecular virologists. As this data resource grows, so does the importance of the availability of databases and software tools to enable the bench virologist to work with and make use of this (valuable/expensive) DNA sequence information. Thus, providing researchers with intuitive software to first select and reformat genomics data from large databases, second, to compare/analyze genomics data, and third, to view and interpret large and complex sets of results has become pivotal in enabling progress to be made in modern virology. This chapter is directed at the bench virologist and describes the software required for a number of common bioinformatics techniques that are useful for comparing and analyzing poxvirus genomes. In a number of examples, we also highlight the Viral Orthologous Clusters database system and integrated tools that we developed for the management and analysis of complete viral genomes.

  12. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  13. Genomic signal analysis of pathogen variability

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan

    2006-02-01

    The paper presents results in the study of pathogen variability by using genomic signals. The conversion of symbolic nucleotide sequences into digital signals offers the possibility to apply signal processing methods to the analysis of genomic data. The method is particularly well suited to characterize small size genomic sequences, such as those found in viruses and bacteria, being a promising tool in tracking the variability of pathogens, especially in the context of developing drug resistance. The paper is based on data downloaded from GenBank [32], and comprises results on the variability of the eight segments of the influenza type A, subtype H5N1, virus genome, and of the Hemagglutinin (HA) gene, for the H1, H2, H3, H4, H5 and H16 types. Data from human and avian virus isolates are used.

  14. Comparative analysis of the Borrelia garinii genome.

    PubMed

    Glöckner, G; Lehmann, R; Romualdi, A; Pradella, S; Schulte-Spechtel, U; Schilhabel, M; Wilske, B; Sühnel, J; Platzer, M

    2004-01-01

    Three members of the genus Borrelia (B. burgdorferi, B. garinii, B. afzelii) cause tick-borne borreliosis. Depending on the Borrelia species involved, the borreliosis differs in its clinical symptoms. Comparative genomics opens up a way to elucidate the underlying differences in Borrelia species. We analysed a low redundancy whole-genome shotgun (WGS) assembly of a B. garinii strain isolated from a patient with neuroborreliosis in comparison to the B. burgdorferi genome. This analysis reveals that most of the chromosome is conserved (92.7% identity on DNA as well as on amino acid level) in the two species, and no chromosomal rearrangement or larger insertions/deletions could be observed. Furthermore, two collinear plasmids (lp54 and cp26) seem to belong to the basic genome inventory of Borrelia species. These three collinear parts of the Borrelia genome encode 861 genes, which are orthologous in the two species examined. The majority of the genetic information of the other plasmids of B. burgdorferi is also present in B. garinii, although orthology is not easy to define due to a high redundancy of the plasmid fraction. Yet, we did not find counterparts of the B. burgdorferi plasmids lp36 and lp38 or their respective gene repertoire in the B. garinii genome. Thus, phenotypic differences between the two species could be attributable to the presence or absence of these two plasmids as well as to the potentially positively selected genes. PMID:15547252

  16. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  17. Design and bioinformatics analysis of genome-wide CLIP experiments

    PubMed Central

    Wang, Tao; Xiao, Guanghua; Chu, Yongjun; Zhang, Michael Q.; Corey, David R.; Xie, Yang

    2015-01-01

    The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP–RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses. PMID:25958398

  18. MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing.

    PubMed

    Urich, Mark A; Nery, Joseph R; Lister, Ryan; Schmitz, Robert J; Ecker, Joseph R

    2015-03-01

    Current high-throughput DNA sequencing technologies enable acquisition of billions of data points through which myriad biological processes can be interrogated, including genetic variation, chromatin structure, gene expression patterns, small RNAs and protein-DNA interactions. Here we describe the MethylC-sequencing (MethylC-seq) library preparation method, a 2-day protocol that enables the genome-wide identification of cytosine DNA methylation states at single-base resolution. The technique involves fragmentation of genomic DNA followed by adapter ligation, bisulfite conversion and limited amplification using adapter-specific PCR primers in preparation for sequencing. To date, this protocol has been successfully applied to genomic DNA isolated from primary cell culture, sorted cells and fresh tissue from over a thousand plant and animal samples.

  19. Towards construction of a high resolution map of the mouse genome using PCR-analysed microsatellites.

    PubMed Central

    Love, J M; Knight, A M; McAleer, M A; Todd, J A

    1990-01-01

    Fifty sequences from the mouse genome database containing simple sequence repeats or microsatellites have been analysed for size variation using the polymerase chain reaction and gel electrophoresis. 88% of the sequences, most of which contain the dinucleotide repeat, CA/GT, showed size variations between different inbred strains of mice and the wild mouse, Mus spretus. 62% of sequences had 3 or more alleles. GA/CT and AT/TA-containing sequences were also variable. About half of these size variants were detectable by agarose gel electrophoresis. This simple approach is extremely useful in linkage and genome mapping studies and will facilitate construction of high resolution maps of both the mouse and human genomes. PMID:2377456
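
    Candidate microsatellites of the kind described above can be located in database sequences with a simple pattern scan. The sketch below is an in silico illustration only; the study itself measured size variation by PCR and gel electrophoresis. The function name, the minimum of eight repeat units, and the toy sequence are assumptions.

```python
import re


def find_dinucleotide_repeats(seq, motifs=("CA", "GT", "GA", "CT", "AT"), min_units=8):
    """Return sorted (start, motif, units) for runs of at least min_units tandem copies."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        pattern = re.compile(f"(?:{motif}){{{min_units},}}")
        for match in pattern.finditer(seq):
            hits.append((match.start(), motif, len(match.group()) // len(motif)))
    return sorted(hits)


# Toy sequence: one (CA)12 run long enough to report and one (GT)5 run that is too short.
toy = "TTGACC" + "CA" * 12 + "GGATCC" + "GT" * 5 + "AAGCTT"
repeats = find_dinucleotide_repeats(toy)
print(repeats)
```

    Because each repeat unit adds 2 bp, alleles that differ by a few units yield PCR products whose lengths differ by only a few base pairs.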

  20. AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae

    PubMed Central

    Song, Giltae; Dickins, Benjamin J. A.; Demeter, Janos; Engel, Stacia; Dunn, Barbara; Cherry, J. Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community. PMID:25781462
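
    The classification of non-reference genes by presence or absence across strains can be sketched as a simple grouping step. This is not the AGAPE implementation: the function name, the three category labels, and the gene and strain identifiers below are all hypothetical, chosen only to show the shape of the computation.

```python
from collections import defaultdict


def group_by_presence(gene_to_strains, all_strains):
    """Group genes by whether they occur in all strains, exactly one strain, or a subset."""
    strain_set = set(all_strains)
    groups = defaultdict(list)
    for gene, strains in sorted(gene_to_strains.items()):
        present = set(strains) & strain_set
        if not present:
            continue  # not observed in any of the strains under study
        if present == strain_set:
            label = "all strains"
        elif len(present) == 1:
            label = "strain-specific"
        else:
            label = "subset of strains"
        groups[label].append(gene)
    return dict(groups)


# Hypothetical non-reference genes and the strains in which each was found.
strains = ["strainA", "strainB", "strainC"]
presence = {
    "geneX": {"strainA", "strainB", "strainC"},
    "geneY": {"strainB"},
    "geneZ": {"strainA", "strainC"},
}
groups = group_by_presence(presence, strains)
print(groups)
```

    Each group could then be annotated with known functional and phenotypic features, as the abstract describes.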

  1. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

    PubMed

    Song, Giltae; Dickins, Benjamin J A; Demeter, Janos; Engel, Stacia; Gallagher, Jennifer; Choe, Kisurb; Dunn, Barbara; Snyder, Michael; Cherry, J Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

  2. Tools for sea urchin genomic analysis.

    PubMed

    Cameron, R Andrew

    2014-01-01

    The Sea Urchin Genome Project Web site, SpBase ( http://SpBase.org ), in association with a suite of publicly available sequence comparison tools provides a platform from which to analyze genes and genomic sequences of sea urchin. This information system is specifically designed to support laboratory bench studies in cell and molecular biology. In particular these tools and datasets have supported the description of the gene regulatory networks of the purple sea urchin S. purpuratus. This chapter details methods to undertake in the first steps to find genes and noncoding regulatory sequences for further analysis.

  3. Comparative genome analysis of Basidiomycete fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37% of the described fungi and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs, including the majority of wood-decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum, we compared the genomes of 35 basidiomycetes, including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48% of basidiomycete proteins are unique to the phylum, with nearly half of those (22%) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically informed PCA of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood-decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  4. Resolution analysis in compressive multidimensional microscopy

    NASA Astrophysics Data System (ADS)

    Rodriguez, A. D.; Clemente, P.; Irles, E.; Soldevila, F.; Salvador, E.; Tajahuerce, E.; Lancis, J.

    2015-03-01

    Although imaging systems that scan with a single-element detector benefit from mature technology, they suffer from acquisition times that grow linearly with the spatial resolution. A promising option is to use a single-pixel system that benefits from data collection strategies based on compressive sampling. Single-pixel systems also offer the possibility of using dedicated sensors such as a fiber spectrometer for multispectral imaging or a distribution of photodiodes for 3D imaging. The image is obtained by lighting the scene with microstructured masks implemented on a programmable spatial light modulator. The masks are used as generalized measurement modes in which the object information is expressed, and the image is recovered through algebraic optimization. The fundamental reason why the bucket detection strategy can outperform conventional optical array detection is the use of a single-channel detector that simultaneously integrates all the photons transmitted through the patterned scene. Spatial frequencies that are not transmitted through low-quality collection optics are demonstrated to be present in the retrieved image. Our work makes two specific contributions within the field of single-pixel imaging through patterned illumination. First, we demonstrate that single-pixel imaging improves the resolution of conventional imaging systems, overcoming the Rayleigh criterion. An analysis of resolution using a low-NA microscope objective for imaging onto a CCD camera shows that single-pixel cameras are not limited at all by the optical quality of the collection optics. Second, we experimentally demonstrate the capability of our technique to properly recover an image even when an optical diffuser is located between the sample and the bucket detector.

  5. Image analysis in comparative genomic hybridization

    SciTech Connect

    Lundsteen, C.; Maahr, J.; Christensen, B.

    1995-01-01

    Comparative genomic hybridization (CGH) is a new technique by which genomic imbalances can be detected by combining in situ suppression hybridization of whole genomic DNA and image analysis. We have developed software for rapid, quantitative CGH image analysis by a modification and extension of the standard software used for routine karyotyping of G-banded metaphase spreads in the Magiscan chromosome analysis system. The DAPI-counterstained metaphase spread is karyotyped interactively. Corrections for image shifts between the DAPI, FITC, and TRITC images are done manually by moving the three images relative to each other. The fluorescence background is subtracted. A mean filter is applied to smooth the FITC and TRITC images before the fluorescence ratio between the individual FITC- and TRITC-stained chromosomes is computed pixel by pixel inside the area of the chromosomes determined by the DAPI boundaries. Fluorescence intensity ratio profiles are generated, and peaks and valleys indicating possible gains and losses of test DNA are marked if the ratio falls below 0.75 or rises above 1.25. When the analyses of several metaphase spreads are combined, consistent findings of gains and losses in all or almost all spreads indicate chromosomal imbalance. Chromosomal imbalances are detected either by visual inspection of fluorescence ratio (FR) profiles or by a statistical approach that compares FR measurements of the individual case with measurements of normal chromosomes. The complete analysis of one metaphase can be carried out in approximately 10 minutes. 8 refs., 7 figs., 1 tab.
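
    The ratio-profile step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Magiscan software: the function names, the list-of-rows image layout (one row per position along a straightened chromosome), the border handling of the mean filter, and the choice of which channel carries the test DNA are all hypothetical. Only the 0.75 and 1.25 thresholds come from the abstract.

```python
def mean_filter(img, size=3):
    """Smooth a 2-D image (list of rows) with a size x size mean filter.

    Assumption: the window is simply truncated at the image border.
    """
    h, w = len(img), len(img[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def ratio_profile(test_img, ref_img, mask, background=0.0, size=3):
    """Mean test/reference ratio per row, using only pixels inside the mask.

    `mask` plays the role of the DAPI-derived chromosome area. Which
    fluorochrome labels the test DNA is an assumption left to the caller.
    """
    t = mean_filter([[v - background for v in row] for row in test_img], size)
    r = mean_filter([[v - background for v in row] for row in ref_img], size)
    profile = []
    for y in range(len(mask)):
        ratios = [t[y][x] / r[y][x]
                  for x in range(len(mask[y]))
                  if mask[y][x] and r[y][x] > 0]
        profile.append(sum(ratios) / len(ratios) if ratios else None)
    return profile


def call_imbalances(profile, low=0.75, high=1.25):
    """Label each profile position using the thresholds quoted in the abstract."""
    calls = []
    for ratio in profile:
        if ratio is None:
            calls.append("no data")
        elif ratio > high:
            calls.append("gain")
        elif ratio < low:
            calls.append("loss")
        else:
            calls.append("balanced")
    return calls
```

    In practice the calls from several metaphase spreads would be compared, since the abstract treats only findings that recur in all or almost all spreads as evidence of imbalance.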

  6. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  7. High resolution analysis of satellite gradiometry

    NASA Technical Reports Server (NTRS)

    Colombo, O. L.

    1989-01-01

    Satellite gravity gradiometry is a technique now under development which, by the middle of the next decade, may be used for the high resolution charting from space of the gravity field of the earth and, afterwards, of other planets. Some data analysis schemes are reviewed for getting detailed gravity maps from gradiometry on both a global and a local basis. Estimates are also presented of the likely accuracies of such maps, in terms of normalized spherical harmonics expansions, both using gradiometry alone and in combination with data from a Global Positioning System (GPS) receiver carried on the same spacecraft. These accuracies are compared with those of current and future maps obtained from other data (conventional tracking, satellite-satellite tracking, etc.), and also with the spectra of various signals of geophysical interest.

  8. Genomic signal analysis of Mycobacterium tuberculosis

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan; Banica, Dorina; Tuduce, Rodica

    2007-02-01

    As previously shown the conversion of nucleotide sequences into digital signals offers the possibility to apply signal processing methods for the analysis of genomic data. Genomic Signal Analysis (GSA) has been used to analyze large scale features of DNA sequences, at the scale of whole chromosomes, including both coding and non-coding regions. The striking regularities of genomic signals reveal restrictions in the way nucleotides and pairs of nucleotides are distributed along nucleotide sequences. Structurally, a chromosome appears to be less of a "plain text", corresponding to certain semantic and grammar rules, but more of a "poem", satisfying additional symmetry restrictions that evoke the "rhythm" and "rhyme". Recurrent patterns in nucleotide sequences are reflected in simple mathematical regularities observed in genomic signals. GSA has also been used to track pathogen variability, especially concerning their resistance to drugs. Previous work has been dedicated to the study of HIV-1, Clade F and Avian Flu. The present paper applies GSA methodology to study Mycobacterium tuberculosis (MT) rpoB gene variability, relevant to its resistance to antibiotics. Isolates from 50 Romanian patients have been studied both by rapid LightCycler PCR and by sequencing of a segment of 190-250 nucleotides covering the region of interest. The variability is caused by SNPs occurring at specific sites along the gene strand, as well as by inclusions. Because of the mentioned symmetry restrictions, the GS variations tend to compensate. An important result is that MT can act as a vector for HIV virus, which is able to retrotranscribe its specific genes both into human and MT genomes.

  9. Differentiation of Staphylococcus spp. by high-resolution melting analysis.

    PubMed

    Slany, Michal; Vanerkova, Martina; Nemcova, Eva; Zaloudikova, Barbora; Ruzicka, Filip; Freiberger, Tomas

    2010-12-01

    High-resolution melting analysis (HRMA) is a fast (post-PCR) high-throughput method to scan for sequence variations in a target gene. The aim of this study was to test the potential of HRMA to distinguish particular bacterial species of the Staphylococcus genus even when using a broad-range PCR within the 16S rRNA gene where sequence differences are minimal. Genomic DNA samples isolated from 12 reference staphylococcal strains (Staphylococcus aureus, Staphylococcus capitis, Staphylococcus caprae, Staphylococcus epidermidis, Staphylococcus haemolyticus, Staphylococcus hominis, Staphylococcus intermedius, Staphylococcus saprophyticus, Staphylococcus sciuri, Staphylococcus simulans, Staphylococcus warneri, and Staphylococcus xylosus) were subjected to a real-time PCR amplification of the 16S rRNA gene in the presence of the fluorescent dye EvaGreen™, followed by HRMA. Melting profiles were used as molecular fingerprints for bacterial species differentiation. HRMA of S. saprophyticus and S. xylosus resulted in indistinguishable profiles because of their identical sequences in the analyzed 16S rRNA region. The remaining reference strains were fully differentiated either directly or via high-resolution plots obtained by heteroduplex formation between coamplified PCR products of the tested staphylococcal strain and a phylogenetically unrelated strain.

  10. A High Resolution Genetic Map Anchoring Scaffolds of the Sequenced Watermelon Genome

    PubMed Central

    Kou, Qinghe; Jiang, Jiao; Guo, Shaogui; Zhang, Haiying; Hou, Wenju; Zou, Xiaohua; Sun, Honghe; Gong, Guoyi; Levi, Amnon; Xu, Yong

    2012-01-01

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of the assembled genomic sequences of the elite Chinese watermelon line 97103 (Citrullus lanatus var. lanatus). The genetic map was constructed using an F8 population of 103 recombinant inbred lines (RILs). The RILs are derived from a cross between the line 97103 and the United States Plant Introduction (PI) 296341-FR (C. lanatus var. citroides) that contains resistance to fusarium wilt (races 0, 1, and 2). The genetic map consists of eleven linkage groups that include 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel) and 36 structure variation (SV) markers and spans ∼800 cM with a mean marker interval of 0.8 cM. Using fluorescent in situ hybridization (FISH) with 11 BACs that produced chromosome-specific signals, we have depicted watermelon chromosomes that correspond to the eleven linkage groups constructed in this study. The high resolution genetic map developed here should be a useful platform for the assembly of the watermelon genome, for the development of sequence-based markers used in breeding programs, and for the identification of genes associated with important agricultural traits. PMID:22247776
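
    The summary figures quoted in this abstract can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in the abstract; the variable names are illustrative, and dividing the map length by the marker count is a rough approximation of the mean interval (it ignores the fact that each linkage group has one fewer interval than markers).

```python
# Marker counts and map statistics as quoted in the abstract.
ssr, indel, sv = 698, 219, 36
map_length_cm = 800          # approximate total map length in cM
scaffolds = 234
anchored_mb = 330            # sequence covered by anchored scaffolds
assembled_mb = 353           # total assembled sequence

total_markers = ssr + indel + sv
mean_interval_cm = map_length_cm / total_markers      # rough mean spacing
anchored_pct = 100 * anchored_mb / assembled_mb        # share of assembly anchored
mean_scaffold_mb = anchored_mb / scaffolds             # average scaffold size

print(total_markers)                 # 953
print(round(mean_interval_cm, 2))    # 0.84
print(round(anchored_pct, 1))        # 93.5
print(round(mean_scaffold_mb, 2))    # 1.41
```

    The results agree with the reported 0.8 cM mean interval, 93.5% anchored sequence, and 1.41 Mb average scaffold size.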

  11. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.

    PubMed

    Rao, Suhas S P; Huntley, Miriam H; Durand, Neva C; Stamenova, Elena K; Bochkov, Ivan D; Robinson, James T; Sanborn, Adrian L; Machol, Ido; Omer, Arina D; Lander, Eric S; Aiden, Erez Lieberman

    2014-12-18

    We use in situ Hi-C to probe the 3D architecture of genomes, constructing haploid and diploid maps of nine cell types. The densest, in human lymphoblastoid cells, contains 4.9 billion contacts, achieving 1 kb resolution. We find that genomes are partitioned into contact domains (median length, 185 kb), which are associated with distinct patterns of histone marks and segregate into six subcompartments. We identify ∼10,000 loops. These loops frequently link promoters and enhancers, correlate with gene activation, and show conservation across cell types and species. Loop anchors typically occur at domain boundaries and bind CTCF. CTCF sites at loop anchors occur predominantly (>90%) in a convergent orientation, with the asymmetric motifs "facing" one another. The inactive X chromosome splits into two massive domains and contains large loops anchored at CTCF-binding repeats. PMID:25497547

  12. TBV-361 RESOLUTION ANALYSIS: EMPLACEMENT DRIFT ORIENTATION

    SciTech Connect

    M. Lin; D.C. Kicker; M.D. Sellers

    1999-07-17

    The purpose of this To Be Verified/To Be Determined (TBX) resolution analysis is to release "To Be Verified" (TBV)-361 related to the emplacement drift orientation. The system design criterion in "Subsurface Facility System Description Document" (CRWMS M&O 1998a, p.9) specifies that the emplacement drift orientation relative to the dominant joint orientations should be at least 30 degrees. The specific objectives for this analysis include the following: (1) Collect and evaluate key block data developed for the repository host horizon rock mass. (2) Assess the dominant joint orientations based on available fracture data. (3) Document the maximum block size as a function of drift orientation. (4) Assess the applicability of the drift orientation/joint orientation offset criterion in the "Subsurface Facility System Description Document" (CRWMS M&O 1998a, p.9). (5) Consider the effects of seepage on drift orientation. (6) Verify that the viability assessment (VA) drift orientation complies with the drift orientation/joint orientation offset criterion, or provide justifications and make recommendations for modifying the VA emplacement drift layout. In addition to providing direct support to the System Description Document (SDD), the release of TBV-361 will provide support to the Repository Subsurface Design Department. The results from this activity may also provide data and information needs to support the MGR Requirements Department, the MGR Safety Assurance Department, and the Performance Assessment Organization.

  13. GRETINA commissioning and engineering run resolution analysis

    NASA Astrophysics Data System (ADS)

    Tarlow, Thomas; Beausang, Con; Ross, Tim; Hughes, Richard; Gell, Kristen; Good, Erin

    2012-10-01

    GRETINA, the first stage in the full Gamma Ray Energy Tracking Array (GRETA), consists of seven modules covering approximately 1π solid angle. Each module is made up of four large, highly-segmented germanium detectors capable of measuring the interaction points of individual gamma-rays. GRETINA has recently been assembled and commissioned at LBNL via a series of engineering and commissioning runs. Here we report on an analysis of data from the first engineering run (ER01), which was intended to probe the response of the data acquisition system to high multiplicity gamma-ray cascades. For this experiment the 122Sn(40Ar, 4n) reaction at a beam energy of 210 MeV was utilized to populate high spin states in 158Er. A variety of beam currents, targets and trigger conditions were utilized to test the acquisition. Here we report on the measured energy resolution, both with calibration and in-beam sources, as well as a gamma-gamma coincidence analysis to confirm the known level scheme and the capability of the data acquisition system for high fold coincidence measurements. This work was partly supported by the US Department of Energy via grant numbers DE-FG52-09NA29454 and DE-FG02-05-ER41379.

  14. Optical Nano-mapping and Analysis of Plant Genomes.

    PubMed

    Luo, Ming-Cheng; Deal, Karin R; Murray, Armond; Zhu, Tingting; Hastie, Alex R; Stedman, Will; Sadowski, Henry; Saghbini, Michael

    2016-01-01

    Application of optical mapping based on BioNano Genomics Irys® technology ( http://www.bionanogenomics.com/ ) has grown rapidly since its debut in November 2012. The technology can be used to facilitate genome sequence assembly and analysis of genome structural variations. We describe here the detailed protocol that we used to generate a whole genome BioNano map for Aegilops tauschii, the D genome progenitor of hexaploid wheat (Triticum aestivum). We are using the whole genome BioNano map to validate sequence assembly based on next-generation sequencing, order sequence scaffolds, and ultimately build pseudomolecules for the genome. PMID:27511170

  15. Ensemble analysis of adaptive compressed genome sequencing strategies

    PubMed Central

    2014-01-01

    Background: Acquiring genomes at single-cell resolution has many applications, such as in the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing, in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth-first search strategy. We simulated the whole process, which constrained our ability to study the behavior of the algorithm for the entire ensemble due to its computational intensity. Results: In this paper, we modify our previous breadth-first search strategy and introduce the depth-first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total sequenced nucleotides for a given population profile. Our results suggest that the expected total of sequenced nucleotides grows in proportion to the logarithm of the number of cells and linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size over the maximum genome size in the sample.
The modified resource

  16. Rif1 Is Required for Resolution of Ultrafine DNA Bridges in Anaphase to Ensure Genomic Stability.

    PubMed

    Hengeveld, Rutger C C; de Boer, H Rudolf; Schoonen, Pepijn M; de Vries, Elisabeth G E; Lens, Susanne M A; van Vugt, Marcel A T M

    2015-08-24

    Sister-chromatid disjunction in anaphase requires the resolution of DNA catenanes by topoisomerase II together with Plk1-interacting checkpoint helicase (PICH) and Bloom's helicase (BLM). We here identify Rif1 as a factor involved in the resolution of DNA catenanes that are visible as ultrafine DNA bridges (UFBs) in anaphase to which PICH and BLM localize. Rif1, which during interphase functions downstream of 53BP1 in DNA repair, is recruited to UFBs in a PICH-dependent fashion, but independently of 53BP1 or BLM. Similar to PICH and BLM, Rif1 promotes the resolution of UFBs: its depletion increases the frequency of nucleoplasmic bridges and RPA70-positive UFBs in late anaphase. Moreover, in the absence of Rif1, PICH, or BLM, more nuclear bodies with damaged DNA arise in ensuing G1 cells, when chromosome decatenation is impaired. Our data reveal a thus far unrecognized function for Rif1 in the resolution of UFBs during anaphase to protect genomic integrity. PMID:26256213

  17. Genome-centric resolution of microbial diversity, metabolism and interactions in anaerobic digestion.

    PubMed

    Vanwonterghem, Inka; Jensen, Paul D; Rabaey, Korneel; Tyson, Gene W

    2016-09-01

    Our understanding of the complex interconnected processes performed by microbial communities is hindered by our inability to culture the vast majority of microorganisms. Metagenomics provides a way to bypass this cultivation bottleneck and recent advances in this field now allow us to recover a growing number of genomes representing previously uncultured populations from increasingly complex environments. In this study, a temporal genome-centric metagenomic analysis was performed of lab-scale anaerobic digesters that host complex microbial communities fulfilling a series of interlinked metabolic processes to enable the conversion of cellulose to methane. In total, 101 population genomes that were moderate to near-complete were recovered based primarily on differential coverage binning. These populations span 19 phyla, represent mostly novel species and expand the genomic coverage of several rare phyla. Classification into functional guilds based on their metabolic potential revealed metabolic networks with a high level of functional redundancy as well as niche specialization, and allowed us to identify potential roles such as hydrolytic specialists for several rare, uncultured populations. Genome-centric analyses of complex microbial communities across diverse environments provide the key to understanding the phylogenetic and metabolic diversity of these interactive communities. PMID:27317862

  18. Does complete plastid genome sequencing improve species discrimination and phylogenetic resolution in Araucaria?

    PubMed

    Ruhsam, Markus; Rai, Hardeep S; Mathews, Sarah; Ross, T Gregory; Graham, Sean W; Raubeson, Linda A; Mei, Wenbin; Thomas, Philip I; Gardner, Martin F; Ennos, Richard A; Hollingsworth, Peter M

    2015-09-01

    Obtaining accurate phylogenies and effective species discrimination using a small standardized set of plastid genes is challenging in evolutionarily young lineages. Complete plastid genome sequencing offers an increasingly easy-to-access source of characters that helps address this. The usefulness of this approach, however, depends on the extent to which plastid haplotypes track morphological species boundaries. We have tested the power of complete plastid genomes to discriminate among multiple accessions of 11 of 13 New Caledonian Araucaria species, an evolutionarily young lineage where the standard DNA barcoding approach has so far failed and phylogenetic relationships have remained elusive. Additionally, 11 nuclear gene regions were Sanger sequenced for all accessions to ascertain the success of species discrimination using a moderate number of nuclear genes. Overall, fewer than half of the New Caledonian Araucaria species with multiple accessions were monophyletic in the plastid or nuclear trees. However, the plastid data retrieved a phylogeny with a higher resolution compared to any previously published tree of this clade and supported the monophyly of about twice as many species and nodes compared to the nuclear data set. Modest gains in discrimination thus are possible, but using complete plastid genomes or a small number of nuclear genes in DNA barcoding may not substantially raise species discriminatory power in many evolutionarily young lineages. The big challenge therefore remains to develop techniques that allow routine access to large numbers of nuclear markers scaleable to thousands of individuals from phylogenetically disparate sample sets. PMID:25611173

  20. Applying Genomic Analysis to Newborn Screening

    PubMed Central

    Solomon, B.D.; Pineda-Alvarez, D.E.; Bear, K.A.; Mullikin, J.C.; Evans, J.P.

    2012-01-01

    Large-scale genomic analysis such as whole-exome and whole-genome sequencing is becoming increasingly prevalent in the research arena. Clinically, many potential uses of this technology have been proposed. One such application is the extension or augmentation of newborn screening. In order to explore this application, we examined data from 3 children with normal newborn screens who underwent whole-exome sequencing as part of research participation. We analyzed sequence information for 151 selected genes associated with conditions ascertained by newborn screening. We compared findings with publicly available databases and results from over 500 individuals who underwent whole-exome sequencing at the same facility. Novel variants were confirmed through bidirectional dideoxynucleotide sequencing. High-density microarrays (Illumina Omni1-Quad) were also performed to detect potential copy number variations affecting these genes. We detected an average of 87 genetic variants per individual. After excluding artifacts, 96% of the variants were found to be reported in public databases and have no evidence of pathogenicity. No variants were identified that would predict disease in the tested individuals, which is in accordance with their normal newborn screens. However, we identified 6 previously reported variants and 2 novel variants that, according to published literature, could result in affected offspring if the reproductive partner were also a mutation carrier; other specific molecular findings highlight additional means by which genomic testing could augment newborn screening. PMID:23112750

  1. Single base resolution analysis of 5-methylcytosine and 5-hydroxymethylcytosine by RRBS and TAB-RRBS.

    PubMed

    Hahn, Maria A; Li, Arthur X; Wu, Xiwei; Pfeifer, Gerd P

    2015-01-01

    Sodium bisulfite-assisted deamination of cytosine forms the basis for conducting single base resolution analysis of 5-methylcytosine in DNA. The TET family of proteins represents a group of enzymes that can oxidize 5-methylcytosine to 5-hydroxymethylcytosine. A modification of the bisulfite-based DNA methylation mapping technique employs TET1-mediated oxidation of 5-methylcytosine (TET-assisted bisulfite sequencing) for single base analysis of 5-hydroxymethylcytosine. Whole genome analysis of cytosine modifications with bisulfite sequencing techniques is still challenging and expensive. Reduced representation bisulfite sequencing (RRBS) has been used to limit the complexity of the analysis to mostly CpG-rich genomic fragments flanked by restriction enzyme cleavage sites, for example MspI (5'CCGG). In this chapter, we describe detailed methods used in our laboratory for analysis of 5-methylcytosine and 5-hydroxymethylcytosine combined (RRBS) and for specific analysis of 5-hydroxymethylcytosine (TAB-RRBS). PMID:25421665
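
    The logic behind RRBS and TAB-RRBS can be illustrated with a toy in silico model in Python. This is a simplified sketch, not the laboratory protocol from the chapter: the function names are hypothetical, only one strand is modeled, and conversion is assumed to be complete. It relies on two well-established facts: MspI cuts CCGG after the first C, and bisulfite treatment makes unprotected cytosines read as T. In standard RRBS both 5-methylcytosine and 5-hydroxymethylcytosine are protected, while in TAB-RRBS only 5-hydroxymethylcytosine remains protected.

```python
def mspi_fragments(seq):
    """Cut a sequence at every C^CGG site (MspI cleaves after the first C)."""
    seq = seq.upper()
    fragments, start = [], 0
    pos = seq.find("CCGG")
    while pos != -1:
        fragments.append(seq[start:pos + 1])
        start = pos + 1
        pos = seq.find("CCGG", pos + 1)
    fragments.append(seq[start:])
    return fragments


def bisulfite_convert(seq, protected):
    """Return the read-out sequence: unprotected C becomes T.

    `protected` is a set of 0-based positions that resist conversion
    (5mC and 5hmC for RRBS; 5hmC only for TAB-RRBS).
    """
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def call_protected(reference, read):
    """Positions where the reference has C and the read still shows C."""
    return [i for i, (ref, obs) in enumerate(zip(reference, read))
            if ref == "C" and obs == "C"]


# Hypothetical example: position 0 carries 5hmC, position 5 carries 5mC.
fragment = "CGGTTC"
rrbs_read = bisulfite_convert(fragment, protected={0, 5})  # 5mC + 5hmC
tab_read = bisulfite_convert(fragment, protected={0})      # 5hmC only
print(call_protected(fragment, rrbs_read))  # [0, 5]
print(call_protected(fragment, tab_read))   # [0]
```

    Positions called in the RRBS read but not in the TAB-RRBS read would, under these assumptions, correspond to 5-methylcytosine rather than 5-hydroxymethylcytosine.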

  2. High range resolution micro-Doppler analysis

    NASA Astrophysics Data System (ADS)

    Cammenga, Zachary A.; Smith, Graeme E.; Baker, Christopher J.

    2015-05-01

    This paper addresses use of the micro-Doppler effect and the use of high range-resolution profiles to observe complex targets in complex target scenes. The combination of micro-Doppler and high range-resolution provides the ability to separate the motion of complex targets from one another. This ability leads to the differentiation of targets based on their micro-Doppler signatures. Without the high-range resolution, this would not be possible because the individual signatures would not be separable. This paper also addresses the use of the micro-Doppler information and high range-resolution profiles to generate an approximation of the scattering properties of a complex target. This approximation gives insight into the structure of the complex target and, critically, is created without using a pre-determined target model.

  3. Optimizing Phytoplasma DNA purification for genome analysis.

    PubMed

    Tran-Nguyen, L T T; Gibb, K S

    2007-04-01

    Genome analysis of uncultivable plant pathogenic phytoplasmas is hindered by the difficulty in obtaining sufficient quantities of phytoplasma enriched DNA. We investigated a combination of conventional enrichment techniques such as cesium chloride (CsCl) buoyant gradient centrifugation, and new methods such as rolling circle amplification (RCA), suppression subtractive hybridization (SSH), and mirror orientation selection (MOS) to obtain DNA with a high phytoplasma:host ratio as the major first step in genome analysis of Candidatus Phytoplasma australiense. The phytoplasma:host ratio was calculated for five different plasmid libraries. Based on sequence data, 90% of clones from CsCl DNA enrichment contained chromosomal phytoplasma DNA, compared to 60% from RCA CsCl DNA and 20% from SSH subtracted libraries. Based on an analysis of representative libraries, none contained plant DNA. A high percentage of clones (80-100%) from SSH libraries contained extrachromosomal DNA (eDNA), and we speculate that eDNA in the original DNA preparation was amplified in subsequent SSH manipulations. Despite the availability of new techniques for nucleic acid amplification, we found that conventional CsCl gradient centrifugation was the best enrichment method for obtaining chromosomal phytoplasma DNA with low host DNA content.

  4. Dynamics of genomic clones in breast cancer patient xenografts at single cell resolution

    PubMed Central

    Eirew, Peter; Steif, Adi; Khattra, Jaswinder; Ha, Gavin; Yap, Damian; Farahani, Hossein; Gelmon, Karen; Chia, Stephen; Mar, Colin; Wan, Adrian; Laks, Emma; Biele, Justina; Shumansky, Karey; Rosner, Jamie; McPherson, Andrew; Nielsen, Cydney; Roth, Andrew J. L.; Lefebvre, Calvin; Bashashati, Ali; de Souza, Camila; Siu, Celia; Aniba, Radhouane; Brimhall, Jazmine; Oloumi, Arusha; Osako, Tomo; Bruna, Alejandra; Sandoval, Jose; Algara, Teresa; Greenwood, Wendy; Leung, Kaston; Cheng, Hongwei; Xue, Hui; Wang, Yuzhuo; Lin, Dong; Mungall, Andrew J.; Moore, Richard; Zhao, Yongjun; Lorette, Julie; Nguyen, Long; Huntsman, David; Eaves, Connie J.; Hansen, Carl; Marra, Marco A.; Caldas, Carlos; Shah, Sohrab P.; Aparicio, Samuel

    2016-01-01

    Human cancers, including breast cancers, are comprised of clones differing in mutation content. Clones evolve dynamically in space and time following principles of Darwinian evolution, underpinning important emergent features such as drug resistance and metastasis. Human breast cancer xenoengraftment is used as a means of capturing and studying tumour biology, and breast tumour xenografts are generally assumed to be reasonable models of the originating tumours. However, the consequences and reproducibility of engraftment and propagation on the genomic clonal architecture of tumours have not been systematically examined at single cell resolution. Here we show, by both deep genome and single cell sequencing methods, the clonal dynamics of initial engraftment and subsequent serial propagation of primary and metastatic human breast cancers in immunodeficient mice. In all 15 cases examined, clonal selection on engraftment was observed in both primary and metastatic breast tumours, varying in degree from extreme selective engraftment of minor (<5% of starting population) clones to moderate, polyclonal engraftment. Furthermore, ongoing clonal dynamics during serial passaging is a feature of tumours experiencing modest initial selection. Through single cell sequencing, we show that major mutation clusters estimated from tumour population sequencing relate predictably to the most abundant clonal genotypes, even in clonally complex and rapidly evolving cases. Finally, we show that similar clonal expansion patterns can emerge in independent grafts of the same starting tumour population, indicating that genomic aberrations can be reproducible determinants of evolutionary trajectories. Our results show that measurement of genomically defined clonal population dynamics will be highly informative for functional studies utilizing patient-derived breast cancer xenoengraftment. PMID:25470049

  5. Comparative Genomic Analysis of Meningitis- and Bacteremia-Causing Pneumococci Identifies a Common Core Genome.

    PubMed

    Kulohoma, Benard W; Cornick, Jennifer E; Chaguza, Chrispin; Yalcin, Feyruz; Harris, Simon R; Gray, Katherine J; Kiran, Anmol M; Molyneux, Elizabeth; French, Neil; Parkhill, Julian; Faragher, Brian E; Everett, Dean B; Bentley, Stephen D; Heyderman, Robert S

    2015-10-01

    Streptococcus pneumoniae is a nasopharyngeal commensal that occasionally invades normally sterile sites to cause bloodstream infection and meningitis. Although the pneumococcal population structure and evolutionary genetics are well defined, it is not clear whether pneumococci that cause meningitis are genetically distinct from those that do not. Here, we used whole-genome sequencing of 140 isolates of S. pneumoniae recovered from bloodstream infection (n = 70) and meningitis (n = 70) to compare their genetic contents. By fitting a double-exponential decaying-function model, we show that these isolates share a core of 1,427 genes (95% confidence interval [CI], 1,425 to 1,435 genes) and that there is no difference in the core genome or accessory gene content from these disease manifestations. Gene presence/absence alone therefore does not explain the virulence behavior of pneumococci that reach the meninges. Our analysis, however, supports the requirement of a range of previously described virulence factors and vaccine candidates for both meningitis- and bacteremia-causing pneumococci. This high-resolution view suggests that, despite considerable competency for genetic exchange, all pneumococci are under considerable pressure to retain key components advantageous for colonization and transmission and that these components are essential for access to and survival in sterile sites. PMID:26259813
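
    Illustrative sketch (not the authors' pipeline): a core-genome estimate of this kind starts from a curve of shared gene counts as isolates are added, to which a decaying model is then fitted. The code below builds that curve from gene presence sets and defines one common double-exponential form. The exact parameterisation used in the paper is not given in the abstract, so `double_exponential` and its parameter names are assumptions, and the fitting step itself (nonlinear least squares) is left out.

```python
from math import exp

def core_size_curve(gene_sets):
    """Number of genes shared by the first 1, 2, ..., N isolates, in the given order."""
    core = None
    sizes = []
    for genes in gene_sets:
        core = set(genes) if core is None else core & set(genes)
        sizes.append(len(core))
    return sizes

def double_exponential(n, k1, tau1, k2, tau2, omega):
    """Assumed model form: two decaying terms plus an asymptote `omega`.

    `omega` plays the role of the estimated core-genome size as n grows.
    """
    return k1 * exp(-n / tau1) + k2 * exp(-n / tau2) + omega

# Toy usage with made-up gene names for three isolates.
isolates = [{"gyrA", "ply", "lytA"}, {"gyrA", "ply"}, {"gyrA", "lytA"}]
print(core_size_curve(isolates))
```

    In practice the curve depends on the order in which isolates are added, so it is usually averaged over many random orderings before a model is fitted.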

  7. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  8. Genome-wide analysis correlates Ayurveda Prakriti

    PubMed Central

    Govindaraj, Periyasamy; Nizamuddin, Sheikh; Sharath, Anugula; Jyothi, Vuskamalla; Rotti, Harish; Raval, Ritu; Nayak, Jayakrishna; Bhat, Balakrishna K.; Prasanna, B. V.; Shintre, Pooja; Sule, Mayura; Joshi, Kalpana S.; Dedge, Amrish P.; Bharadwaj, Ramachandra; Gangadharan, G. G.; Nair, Sreekumaran; Gopinath, Puthiya M.; Patwardhan, Bhushan; Kondaiah, Paturu; Satyamoorthy, Kapaettu; Valiathan, Marthanda Varma Sankaran; Thangaraj, Kumarasamy

    2015-01-01

    The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as “Prakriti”. To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found that 52 SNPs (p ≤ 1 × 10^-5) were significantly different between Prakritis, without any confounding effect of stratification, after 10^6 permutations. Principal component analysis (PCA) of these SNPs classified the 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which demonstrates the power of these SNPs in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with the phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India’s traditional medicine has a genetic basis, and that its Prakriti-based practice, in vogue for many centuries, resonates with personalized medicine. PMID:26511157
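
    Illustrative sketch (not the authors' analysis): a label-permutation test of the general kind mentioned in this abstract can be written in a few lines. The function name, the use of mean allele dosage (0, 1 or 2 copies) as the test statistic, and the toy data are assumptions made for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Empirical two-sided p-value for a difference in mean allele dosage."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction: the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy usage: dosages at one SNP in two made-up groups.
print(permutation_p_value([0, 0, 1, 0, 0, 1], [2, 1, 2, 2, 1, 2], n_perm=2000))
```

    With N permutations the smallest p-value that can be reported is 1/(N + 1), which is one reason a very large number of permutations is needed before a threshold as strict as the one quoted above can be resolved.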

  9. Genome-wide analysis correlates Ayurveda Prakriti.

    PubMed

    Govindaraj, Periyasamy; Nizamuddin, Sheikh; Sharath, Anugula; Jyothi, Vuskamalla; Rotti, Harish; Raval, Ritu; Nayak, Jayakrishna; Bhat, Balakrishna K; Prasanna, B V; Shintre, Pooja; Sule, Mayura; Joshi, Kalpana S; Dedge, Amrish P; Bharadwaj, Ramachandra; Gangadharan, G G; Nair, Sreekumaran; Gopinath, Puthiya M; Patwardhan, Bhushan; Kondaiah, Paturu; Satyamoorthy, Kapaettu; Valiathan, Marthanda Varma Sankaran; Thangaraj, Kumarasamy

    2015-10-29

    The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as "Prakriti". To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found that 52 SNPs (p ≤ 1 × 10^-5) were significantly different between Prakritis, without any confounding effect of stratification, after 10^6 permutations. Principal component analysis (PCA) of these SNPs classified the 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which demonstrates the power of these SNPs in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with the phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India's traditional medicine has a genetic basis, and that its Prakriti-based practice, in vogue for many centuries, resonates with personalized medicine.

  10. Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays

    NASA Technical Reports Server (NTRS)

    Stolc, Viktor; Samanta, Manoj Pratim; Tongprasit, Waraporn; Sethi, Himanshu; Liang, Shoudan; Nelson, David C.; Hegeman, Adrian; Nelson, Clark; Rancour, David; Bednarek, Sebastian; Ulrich, Eldon L.; Zhao, Qin; Wrobel, Russell L.; Newman, Craig S.; Fox, Brian G.; Phillips, George N Jr; Markley, John L.; Sussman, Michael R.

    2005-01-01

    Using a maskless photolithography method, we produced DNA oligonucleotide microarrays with probe sequences tiled throughout the genome of the plant Arabidopsis thaliana. RNA expression was determined for the complete nuclear, mitochondrial, and chloroplast genomes by tiling 5 million 36-mer probes. These probes were hybridized to labeled mRNA isolated from liquid grown T87 cells, an undifferentiated Arabidopsis cell culture line. Transcripts were detected from at least 60% of the nearly 26,330 annotated genes, which included 151 predicted genes that were not identified previously by a similar genome-wide hybridization study on four different cell lines. In comparison with previously published results with 25-mer tiling arrays produced by chromium masking-based photolithography technique, 36-mer oligonucleotide probes were found to be more useful in identifying intron-exon boundaries. Using two-dimensional HPLC tandem mass spectrometry, a small-scale proteomic analysis was performed with the same cells. A large amount of strongly hybridizing RNA was found in regions "antisense" to known genes. Similarity of antisense activities between the 25-mer and 36-mer data sets suggests that it is a reproducible and inherent property of the experiments. Transcription activities were also detected for many of the intergenic regions and the small RNAs, including tRNA, small nuclear RNA, small nucleolar RNA, and microRNA. Expression of tRNAs correlates with genome-wide amino acid usage.

  11. Application of Metabolomics for High Resolution Phenotype Analysis

    PubMed Central

    Fukusaki, Eiichiro

    2014-01-01

    The metabolome, the total profile of all metabolites, lies downstream of the proteome and is thought to be the result of the execution of genomic information. In other words, the metabolome can be called a high-resolution phenotype. The simplest application of metabolomics is its integration with upstream omics information, including the transcriptome and/or proteome; such attempts have been reported to some extent. In addition, metabolomics can be used in stand-alone mode without any other omics information. Among metabolomics approaches, the author’s group focuses particularly on metabolic fingerprinting, in which metabolome information is used as the explanatory variable to evaluate a response variable. Metabolic fingerprinting is expected to be useful not only for analyzing slight differences that depend on genotype but also for capturing the dynamic variation of living organisms. The author introduces several examples from his own work that illustrate the power of metabolomics. In addition, the author describes the latest technology for the analysis of metabolic dynamism. The author’s group developed a facile analytical method for semi-quantitative analysis of metabolic dynamism. The author introduces this novel method, which uses the time-dependent variation of isotope distribution based on stable isotope dilution. PMID:26819889

  12. Enhancing genomics information retrieval through dimensional analysis.

    PubMed

    Hu, Qinmin; Huang, Jimmy Xiangji

    2013-06-01

    We propose a novel dimensional analysis approach to employing meta information in order to find the relationships within the unstructured or semi-structured document/passages for improving genomics information retrieval performance. First, we make use of the auxiliary information as three basic dimensions, namely "temporal", "journal", and "author". The reference section is treated as a commensurable quantity of the three basic dimensions. Then, the sample space and subspaces are built up and a set of events are defined to meet the basic requirement of dimensional homogeneity to be commensurable quantities. After that, the classic graph analysis algorithm in the Web environments is applied on each dimension respectively to calculate the importance of each dimension. Finally, we integrate all the dimension networks and re-rank the outputs for evaluation. Our experimental results show the proposed approach is superior and promising.
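
    Illustrative sketch (not the authors' system): the abstract does not name the "classic graph analysis algorithm", so PageRank is used here purely as a stand-in. The idea shown is to score documents on a link graph built for each dimension and then merge the per-dimension scores for re-ranking. The graph contents, the weights, and the function names `pagerank` and `combine` are hypothetical.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank; `links` maps a node to the nodes it points to."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = links.get(n) or nodes  # dangling node: spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

def combine(dimension_scores, weights):
    """Weighted sum of per-dimension scores; returns documents, best first."""
    docs = set().union(*dimension_scores.values())
    total = {
        d: sum(weights[k] * s.get(d, 0.0) for k, s in dimension_scores.items())
        for d in docs
    }
    return sorted(total, key=lambda d: (-total[d], d))

# Toy usage: two made-up dimensions over three documents.
scores = {
    "author": pagerank({"d1": ["d3"], "d2": ["d3"], "d3": ["d1"]}),
    "journal": pagerank({"d1": ["d2"], "d2": ["d1"], "d3": ["d1"]}),
}
print(combine(scores, {"author": 0.5, "journal": 0.5}))
```

    A linear combination is only one possible way to integrate the dimension networks; the weights would normally be tuned on held-out relevance judgements.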

  13. High Resolution Genomic Scans Reveal Genetic Architecture Controlling Alcohol Preference in Bidirectionally Selected Rat Model.

    PubMed

    Lo, Chiao-Ling; Lossie, Amy C; Liang, Tiebing; Liu, Yunlong; Xuei, Xiaoling; Lumeng, Lawrence; Zhou, Feng C; Muir, William M

    2016-08-01

    Investigations on the influence of nature vs. nurture on Alcoholism (Alcohol Use Disorder) in humans have yet to provide a clear view on potential genomic etiologies. To address this issue, we sequenced a replicated animal model system bidirectionally selected for alcohol preference (AP). This model is uniquely suited to map genetic effects with high reproducibility and resolution. The origin of the rat lines (an 8-way cross) resulted in small haplotype blocks (HB) with a correspondingly high level of resolution. We sequenced DNAs from 40 samples (10 per line of each replicate) to determine allele frequencies and HB. We achieved ~46X coverage per line and replicate. Instances of excessive differentiation in the genomic architecture between lines, across replicates, termed signatures of selection (SS), were classified according to gene and region. We identified SS in 930 genes associated with AP. The majority (50%) of the SS were confined to single gene regions, the greatest numbers of which were in promoters (284) and intronic regions (169) with the least in exons (4), suggesting that differences in AP were primarily due to alterations in regulatory regions. We confirmed previously identified genes and found many new genes associated with AP. Of those newly identified genes, several demonstrated neuronal function involved in synaptic memory and reward behavior, e.g. ion channels (Kcnf1, Kcnn3, Scn5a), excitatory receptors (Grin2a, Gria3, Grip1), neurotransmitters (Pomc), and synapses (Snap29). This study not only reveals the polygenic architecture of AP, but also emphasizes the importance of regulatory elements, consistent with other complex traits. PMID:27490364

  14. High Resolution Genomic Scans Reveal Genetic Architecture Controlling Alcohol Preference in Bidirectionally Selected Rat Model

    PubMed Central

    Lo, Chiao-Ling; Liang, Tiebing; Liu, Yunlong; Lumeng, Lawrence; Zhou, Feng C.; Muir, William M.

    2016-01-01

    Investigations on the influence of nature vs. nurture on Alcoholism (Alcohol Use Disorder) in humans have yet to provide a clear view on potential genomic etiologies. To address this issue, we sequenced a replicated animal model system bidirectionally selected for alcohol preference (AP). This model is uniquely suited to map genetic effects with high reproducibility and resolution. The origin of the rat lines (an 8-way cross) resulted in small haplotype blocks (HB) with a correspondingly high level of resolution. We sequenced DNAs from 40 samples (10 per line of each replicate) to determine allele frequencies and HB. We achieved ~46X coverage per line and replicate. Instances of excessive differentiation in the genomic architecture between lines, across replicates, termed signatures of selection (SS), were classified according to gene and region. We identified SS in 930 genes associated with AP. The majority (50%) of the SS were confined to single gene regions, the greatest numbers of which were in promoters (284) and intronic regions (169) with the least in exons (4), suggesting that differences in AP were primarily due to alterations in regulatory regions. We confirmed previously identified genes and found many new genes associated with AP. Of those newly identified genes, several demonstrated neuronal function involved in synaptic memory and reward behavior, e.g. ion channels (Kcnf1, Kcnn3, Scn5a), excitatory receptors (Grin2a, Gria3, Grip1), neurotransmitters (Pomc), and synapses (Snap29). This study not only reveals the polygenic architecture of AP, but also emphasizes the importance of regulatory elements, consistent with other complex traits. PMID:27490364

  16. Cisplatin DNA damage and repair maps of the human genome at single-nucleotide resolution

    PubMed Central

    Hu, Jinchuan; Lieb, Jason D.; Sancar, Aziz; Adar, Sheera

    2016-01-01

    Cisplatin is a major anticancer drug that kills cancer cells by damaging their DNA. Cancer cells cope with the drug by removal of the damages with nucleotide excision repair. We have developed methods to measure cisplatin adduct formation and its repair at single-nucleotide resolution. “Damage-seq” relies on the replication-blocking properties of the bulky base lesions to precisely map their location. “XR-seq” independently maps the removal of these damages by capturing and sequencing the excised oligomer released during repair. The damage and repair maps we generated reveal that damage distribution is essentially uniform and is dictated mostly by the underlying sequence. In contrast, cisplatin repair is heterogeneous in the genome and is affected by multiple factors including transcription and chromatin states. Thus, the overall effect of damages in the genome is primarily driven not by damage formation but by the repair efficiency. The combination of the Damage-seq and XR-seq methods has the potential for developing novel cancer therapeutic strategies. PMID:27688757

  17. Genome Data Exploration Using Correspondence Analysis.

    PubMed

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results.
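
    Illustrative sketch (not from the cited review): the first computational step of correspondence analysis on a two-way count table is the matrix of standardized residuals from the independence model. Its sum of squares is the total inertia (the chi-square statistic divided by the grand total), which the factors then decompose. The function names and the toy tables are assumptions, and the sketch stops before the singular value decomposition that yields the factors themselves. It also assumes that no row or column total is zero.

```python
from math import sqrt

def standardized_residuals(table):
    """(p_ij - r_i * c_j) / sqrt(r_i * c_j) for a table of non-negative counts."""
    n = sum(map(sum, table))
    row = [sum(r) / n for r in table]          # row masses r_i
    col = [sum(c) / n for c in zip(*table)]    # column masses c_j
    return [
        [(x / n - ri * cj) / sqrt(ri * cj) for x, cj in zip(r, col)]
        for r, ri in zip(table, row)
    ]

def total_inertia(table):
    """Sum of squared standardized residuals; equals chi-square / grand total."""
    return sum(s * s for r in standardized_residuals(table) for s in r)

# Toy usage: rows could be species, columns amino acids (made-up counts).
print(total_inertia([[30, 10, 5], [12, 25, 8], [6, 9, 40]]))
```

    A table whose rows are exactly proportional to each other has zero inertia, so there is nothing for the factors to explain; the more the row profiles differ, the larger the inertia.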

  18. Genome Data Exploration Using Correspondence Analysis

    PubMed Central

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results.

  19. Genome Data Exploration Using Correspondence Analysis.

    PubMed

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine such large data sets and help discover the biologically meaningful knowledge they contain. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results.
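
    The first computational step of CA can be sketched as follows. This is a minimal illustration, not the author's implementation: CA factors are commonly obtained from a singular value decomposition of the matrix of standardized residuals of the two-way table, and the block below only builds that matrix and the total inertia (the chi-square statistic divided by the grand total). The decomposition itself is omitted, and the function name and toy table are hypothetical.

```python
from math import sqrt


def standardized_residuals(table):
    """Return (residuals, total_inertia) for a two-way table of non-negative counts.

    Assumes every row and every column has a positive total.
    """
    n = sum(sum(row) for row in table)
    n_cols = len(table[0])
    # Row and column masses: marginal proportions of the table.
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(n_cols)]
    # Observed proportion minus the proportion expected under independence,
    # scaled by the square root of the expected proportion.
    residuals = [
        [
            (table[i][j] / n - row_mass[i] * col_mass[j])
            / sqrt(row_mass[i] * col_mass[j])
            for j in range(n_cols)
        ]
        for i in range(len(table))
    ]
    total_inertia = sum(s * s for row in residuals for s in row)
    return residuals, total_inertia


# Hypothetical toy table: rows could be species, columns amino-acid counts.
toy = [[30, 10, 20], [10, 30, 20], [20, 20, 20]]
S, inertia = standardized_residuals(toy)
print(round(inertia, 4))
```

    A table whose rows are all proportional to each other has zero inertia, so there is nothing for the factors to separate; the larger the inertia, the more structure the leading factors can display.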

  20. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Summary Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048
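
    Why depth matters for impure or clonally heterogeneous tumors can be illustrated with a simple read-sampling model. The sketch below is an assumption for illustration, not the method used in the study: it treats each read covering a site as an independent draw that carries the variant with probability equal to the variant allele fraction, ignores sequencing error, and uses a hypothetical calling threshold of three supporting reads.

```python
from math import comb


def detection_probability(depth, vaf, min_reads=3):
    """P(at least min_reads variant-supporting reads) under a binomial model.

    depth     -- number of reads covering the site
    vaf       -- variant allele fraction, between 0 and 1
    min_reads -- hypothetical minimum number of supporting reads for a call
    """
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Depths of 30 and 312 echo the abstract; the 2% allele fraction is made up.
for depth in (30, 100, 312):
    print(depth, round(detection_probability(depth, vaf=0.02), 3))
```

    Under these assumptions a variant present in a small subclone is rarely supported by enough reads at conventional depth, and the chance of seeing it rises steadily as depth increases.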

  1. Asymmetric Genome Organization in an RNA Virus Revealed via Graph-Theoretical Analysis of Tomographic Data

    PubMed Central

    Geraets, James A.; Dykeman, Eric C.; Stockley, Peter G.; Ranson, Neil A.; Twarock, Reidun

    2015-01-01

    Cryo-electron microscopy permits 3-D structures of viral pathogens to be determined in remarkable detail. In particular, the protein containers encapsulating viral genomes have been determined to high resolution using symmetry averaging techniques that exploit the icosahedral architecture seen in many viruses. By contrast, structure determination of asymmetric components remains a challenge, and novel analysis methods are required to reveal such features and characterize their functional roles during infection. Motivated by the important, cooperative roles of viral genomes in the assembly of single-stranded RNA viruses, we have developed a new analysis method that reveals the asymmetric structural organization of viral genomes in proximity to the capsid in such viruses. The method uses geometric constraints on genome organization, formulated based on knowledge of icosahedrally-averaged reconstructions and the roles of the RNA-capsid protein contacts, to analyse cryo-electron tomographic data. We apply this method to the low-resolution tomographic data of a model virus and infer the unique asymmetric organization of its genome in contact with the protein shell of the capsid. This opens unprecedented opportunities to analyse viral genomes, revealing conserved structural features and mechanisms that can be targeted in antiviral drug design. PMID:25793998

  2. Tandem repeat regions within the Burkholderia pseudomallei genome and their application for high resolution genotyping

    PubMed Central

    U'Ren, Jana M; Schupp, James M; Pearson, Talima; Hornstra, Heidie; Friedman, Christine L Clark; Smith, Kimothy L; Daugherty, Rebecca R Leadem; Rhoton, Shane D; Leadem, Ben; Georgia, Shalamar; Cardon, Michelle; Huynh, Lynn Y; DeShazer, David; Harvey, Steven P; Robison, Richard; Gal, Daniel; Mayo, Mark J; Wagner, David; Currie, Bart J; Keim, Paul

    2007-01-01

    Background The facultative, intracellular bacterium Burkholderia pseudomallei is the causative agent of melioidosis, a serious infectious disease of humans and animals. We identified and categorized tandem repeat arrays and their distribution throughout the genome of B. pseudomallei strain K96243 in order to develop a genetic typing method for B. pseudomallei. We then screened 104 of the potentially polymorphic loci across a diverse panel of 31 isolates including B. pseudomallei, B. mallei and B. thailandensis in order to identify loci with varying degrees of polymorphism. A subset of these tandem repeat arrays were subsequently developed into a multiple-locus VNTR analysis to examine 66 B. pseudomallei and 21 B. mallei isolates from around the world, as well as 95 lineages from a serial transfer experiment encompassing ~18,000 generations. Results B. pseudomallei contains a preponderance of tandem repeat loci throughout its genome, many of which are duplicated elsewhere in the genome. The majority of these loci are composed of repeat motif lengths of 6 to 9 bp with 4 to 10 repeat units and are predominantly located in intergenic regions of the genome. Across geographically diverse B. pseudomallei and B. mallei isolates, the 32 VNTR loci displayed between 7 and 28 alleles, with Nei's diversity values ranging from 0.47 to 0.94. Mutation rates for these loci are comparable (>10⁻⁵ per locus per generation) to that of the most diverse tandemly repeated regions found in other less diverse bacteria. Conclusion The frequency, location and duplicate nature of tandemly repeated regions within the B. pseudomallei genome indicate that these tandem repeat regions may play a role in generating and maintaining adaptive genomic variation. Multiple-locus VNTR analysis revealed extensive diversity within the global isolate set containing B. pseudomallei and B. mallei, and it detected genotypic differences within clonal lineages of both species that were identical using previous

  3. Analysis of DOA estimation spatial resolution using MUSIC algorithm

    NASA Astrophysics Data System (ADS)

    Guo, Yue; Wang, Hongyuan; Luo, Bin

    2005-11-01

    This paper presents a performance analysis of the spatial resolution of the direction of arrival (DOA) estimates attained by the multiple signal classification (MUSIC) algorithm for uncorrelated sources. The confidence interval of the estimated angle, which is much more intuitive, is adopted as the new evaluation standard for spatial resolution. Then, based on a statistical method, a qualitative analysis reveals the factors influencing the performance of the MUSIC algorithm. Finally, quantitative simulations confirm the theoretical analysis.

  4. Genomic resolution of an aggressive, widespread, diverse and expanding meningococcal serogroup B, C and W lineage

    PubMed Central

    Lucidarme, Jay; Hill, Dorothea M.C.; Bratcher, Holly B.; Gray, Steve J.; du Plessis, Mignon; Tsang, Raymond S.W.; Vazquez, Julio A.; Taha, Muhamed-Kheir; Ceyhan, Mehmet; Efron, Adriana M.; Gorla, Maria C.; Findlow, Jamie; Jolley, Keith A.; Maiden, Martin C.J.; Borrow, Ray

    2015-01-01

    Summary Objectives Neisseria meningitidis is a leading cause of meningitis and septicaemia. The hyperinvasive ST-11 clonal complex (cc11) caused serogroup C (MenC) outbreaks in the US military in the 1960s and UK universities in the 1990s, a global Hajj-associated serogroup W (MenW) outbreak in 2000–2001, and subsequent MenW epidemics in sub-Saharan Africa. More recently, endemic MenW disease has expanded in South Africa, South America and the UK, and MenC cases have been reported among European and North American men who have sex with men (MSM). Routine typing schemes poorly resolve cc11 so we established the population structure at genomic resolution. Methods Representatives of these episodes and other geo-temporally diverse cc11 meningococci (n = 750) were compared across 1546 core genes and visualised on phylogenetic networks. Results MenW isolates were confined to a distal portion of one of two main lineages with MenB and MenC isolates interspersed elsewhere. An expanding South American/UK MenW strain was distinct from the ‘Hajj outbreak’ strain and a closely related endemic South African strain. Recent MenC isolates from MSM in France and the UK were closely related but distinct. Conclusions High resolution ‘genomic’ multilocus sequence typing is necessary to resolve and monitor the spread of diverse cc11 lineages globally. PMID:26226598

  5. Computational Methods for the Analysis of Array Comparative Genomic Hybridization

    PubMed Central

    Chari, Raj; Lockwood, William W.; Lam, Wan L.

    2006-01-01

    Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has led to a rapid accumulation of high throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development. PMID:17992253

  6. Enhancing cancer clonality analysis with integrative genomics

    PubMed Central

    2015-01-01

    Introduction It is understood that cancer is a clonal disease initiated by a single cell, and that metastasis, which is the spread of cancer from the primary site, is also initiated by a single cell. The seemingly natural capability of cancer to adapt dynamically in a Darwinian manner is a primary reason for therapeutic failures. Survival advantages may be induced by cancer therapies and also occur as a result of inherent cell and microenvironmental factors. The selected "more fit" clones outmatch their competition and then become dominant in the tumor via propagation of progeny. This clonal expansion leads to relapse, therapeutic resistance and eventually death. The goal of this study is to develop and demonstrate a more detailed clonality approach by utilizing integrative genomics. Methods Patient tumor samples were profiled by Whole Exome Sequencing (WES) and RNA-seq on an Illumina HiSeq 2500 and methylation profiling was performed on the Illumina Infinium 450K array. STAR and the Haplotype Caller were used for RNA-seq processing. Custom approaches were used for the integration of the multi-omic datasets. Results Reported are major enhancements to CloneViz, which now provides capabilities enabling a formal tumor multi-dimensional clonality analysis by integrating: i) DNA mutations, ii) RNA expressed mutations, and iii) DNA methylation data. Integration of RNA and DNA methylation data was not previously possible with CloneViz (previous version) or any other clonality method to date. This new approach, named iCloneViz (integrated CloneViz), employs visualization and quantitative methods, revealing an integrative genomic mutational dissection and traceability (DNA, RNA, epigenetics) through the different layers of molecular structures. Conclusion The iCloneViz approach can be used for analysis of clonal evolution and mutational dynamics of multi-omic data sets. Revealing tumor clonal complexity in an integrative and quantitative manner facilitates improved mutational

  7. Super-resolution analysis of microwave image using WFIPOCS

    NASA Astrophysics Data System (ADS)

    Wang, Xue; Wu, Jin

    2013-03-01

    Microwave images are always blurred and distorted. Super-resolution analysis is crucial in microwave image processing. In this paper, we propose the WFIPOCS algorithm, which combines wavelet-based fractal interpolation with the improved projection onto convex sets (IPOCS) technique. First, we apply down-sampling and Wiener filtering to a low resolution (LR) microwave image. Then, the wavelet-based fractal interpolation is applied to preprocess the LR image. Finally, the IPOCS technique is applied to solve the problems arising from interpolation and to approach a high resolution (HR) image. The experimental results indicate that the WFIPOCS algorithm improves the spatial resolution of microwave images.

  8. Barcode server: a visualization-based genome analysis system.

    PubMed

    Mao, Fenglou; Olman, Victor; Wang, Yan; Xu, Ying

    2013-01-01

    We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a) calculation of the k-mer based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome, (c) clustering of provided DNA sequences into groups having similar barcodes; and (d) homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode. PMID:23457606
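
    Capability (a), a k-mer based barcode for a DNA sequence, can be approximated with a small sketch. The block is an assumption-laden illustration rather than the server's algorithm: it cuts the sequence into non-overlapping windows and records the relative frequency of every k-mer in each window, giving a matrix that could be rendered as an image. The published barcode definition may differ, for instance in how the two strands are combined; the function names, window size and toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequency of every k-mer over A, C, G, T in seq.

    k-mers containing any other symbol (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def barcode_matrix(seq, window=1000, k=4):
    """One row of k-mer frequencies per non-overlapping window of seq."""
    return [
        kmer_frequencies(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, window)
    ]


# Hypothetical toy input: 2,000 bases give two windows of 1,000 bases each.
matrix = barcode_matrix("ACGT" * 500, window=1000, k=4)
print(len(matrix), len(matrix[0]))
```

    Rows that differ markedly from the typical row of a genome would correspond to the fragments with distinct barcodes that capability (b) sets out to detect.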

  9. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    PubMed Central

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To help non-programming users leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high utility analysis tasks. PMID:26780094

  10. SMASH, a fragmentation and sequencing method for genomic copy number analysis

    PubMed Central

    Wang, Zihua; Andrews, Peter; Kendall, Jude; Ma, Beicong; Hakker, Inessa; Rodgers, Linda; Ronemus, Michael; Wigler, Michael; Levy, Dan

    2016-01-01

    Copy number variants (CNVs) underlie a significant amount of genetic diversity and disease. CNVs can be detected by a number of means, including chromosomal microarray analysis (CMA) and whole-genome sequencing (WGS), but these approaches suffer from either limited resolution (CMA) or are highly expensive for routine screening (both CMA and WGS). As an alternative, we have developed a next-generation sequencing-based method for CNV analysis termed SMASH, for short multiply aggregated sequence homologies. SMASH utilizes random fragmentation of input genomic DNA to create chimeric sequence reads, from which multiple mappable tags can be parsed using maximal almost-unique matches (MAMs). The SMASH tags are then binned and segmented, generating a profile of genomic copy number at the desired resolution. Because fewer reads are necessary relative to WGS to give accurate CNV data, SMASH libraries can be highly multiplexed, allowing large numbers of individuals to be analyzed at low cost. Increased genomic resolution can be achieved by sequencing to higher depth. PMID:27197213
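
    The binning step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SMASH pipeline: tags are taken as already mapped to 0-based positions on one chromosome, bins are fixed-width (the actual method may define bins differently), the profile is normalized against a hypothetical reference sample, and segmentation is omitted.

```python
def bin_counts(positions, chrom_length, bin_size):
    """Count mapped tag positions (0-based) in fixed-width bins."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def copy_ratio(sample_counts, reference_counts):
    """Per-bin sample/reference ratio after scaling each profile by its total.

    Assumes both profiles contain at least one tag; bins with no reference
    tags yield None.
    """
    s_total = sum(sample_counts)
    r_total = sum(reference_counts)
    return [
        (s / s_total) / (r / r_total) if r else None
        for s, r in zip(sample_counts, reference_counts)
    ]


# Hypothetical toy data: the sample has extra tags in the middle bin.
reference = bin_counts([10, 20, 110, 120, 210, 220], chrom_length=300, bin_size=100)
sample = bin_counts([5, 15, 105, 115, 125, 135, 205, 215], chrom_length=300, bin_size=100)
print(copy_ratio(sample, reference))
```

    Smaller bins give finer genomic resolution but fewer tags per bin, which is consistent with the abstract's point that higher resolution requires sequencing to higher depth.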

  11. Detection of Indel Mutations in Drosophila by High-Resolution Melt Analysis (HRMA).

    PubMed

    Housden, Benjamin E; Perrimon, Norbert

    2016-01-01

    Although CRISPR technology allows specific genome alterations to be created with relative ease, detection of these events can be problematic. For example, CRISPR-induced double-strand breaks are often repaired imprecisely to generate unpredictable short indel mutations. Detection of these events requires the use of molecular screening techniques such as endonuclease assays, restriction profiling, or high-resolution melt analysis (HRMA). Here, we provide detailed protocols for HRMA-based mutation screening in Drosophila and analysis of the resulting data using the online tool HRMAnalyzer. PMID:27587781

  12. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  13. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute]

    2016-07-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  14. Genome structure analysis of molluscs revealed whole genome duplication and lineage specific repeat variation.

    PubMed

    Yoshida, Masa-aki; Ishikura, Yukiko; Moritaki, Takeya; Shoguchi, Eiichi; Shimizu, Kentaro K; Sese, Jun; Ogura, Atsushi

    2011-09-01

    Comparative genome structure analysis allows us to identify novel genes, repetitive sequences and gene duplications. To explore lineage-specific genomic changes in molluscs, which are a good model for the development of the nervous system in invertebrates, we conducted comparative genome structure analyses of three molluscs (pygmy squid, nautilus and scallop) using partial genome shotgun sequencing. The elements with the greatest effect on genome structural change are repetitive elements (REs), which expand genome size, and whole genome duplication, which produces large numbers of novel functional genes. Therefore, we investigated the variation and proportion of REs and whole genome duplication. We first identified variations of REs in the three molluscan genomes by homology-based and de novo RE detection. The proportions of REs were 9.2%, 4.0%, and 3.8% in the pygmy squid, nautilus and scallop, respectively. We then estimated the genome sizes of the species as 2.1, 4.2 and 1.8 Gb, respectively, with 2× coverage frequency and DNA sequencing theory. We also performed a gene duplication assay based on coding genes, and found that large-scale duplication events occurred after divergence from the limpet Lottia, an out-group of the three molluscan species. Comparison of all the results suggested that RE expansion did not relate to the increase in genome size of nautilus. Despite its close relationship to nautilus, the squid has the largest proportion of REs and a smaller genome size than nautilus. We also identified lineage-specific RE and gene-family expansions, possibly related to the acquisition of the most complicated eye and brain systems in the three species.

  15. Analysis of low resolution mass spectra

    NASA Technical Reports Server (NTRS)

    Babst, R. W.; Shapiro, H.

    1971-01-01

    Computer program determines gas constituents from measurements of mass/peak-height spectrum from residual gas analyzer. Applications of program include residual gas analysis for work in space environmental simulators, space environment contamination, and air pollution monitoring.

  16. GENOME ANALYSIS OF BURKHOLDERIA CEPACIA AC1100

    EPA Science Inventory

    Burkholderia cepacia is an important organism in bioremediation of environmental pollutants and it is also of increasing interest as a human pathogen. The genomic organization of B. cepacia is being studied in order to better understand its unusual adaptive capacity and genome pl...

  17. The human genome: a multifractal analysis

    PubMed Central

    2011-01-01

    Background Several studies have shown that genomes can be studied via a multifractal formalism. Recently, we used a multifractal approach to study the genetic information content of the Caenorhabditis elegans genome. Here we investigate the possibility that the human genome shows a similar behavior to that observed in the nematode. Results We report here multifractality in the human genome sequence. This behavior correlates strongly with the presence of Alu elements and to a lesser extent with CpG islands and (G+C) content. In contrast, little or no relationship was found for LINE, MIR, MER and LTR elements and DNA regions poor in genetic information. Gene function, cluster of orthologous genes, metabolic pathways, and exons tended to increase their frequencies with ranges of multifractality and large gene families were located in genomic regions with varied multifractality. Additionally, a multifractal map and classification for human chromosomes are proposed. Conclusions Based on these findings, we propose a descriptive non-linear model for the structure of the human genome, with some biological implications. This model reveals 1) a multifractal regionalization where many regions coexist that are far from equilibrium and 2) this non-linear organization has significant molecular and medical genetic implications for understanding the role of Alu elements in genome stability and structure of the human genome. Given the role of Alu sequences in gene regulation, genetic diseases, human genetic diversity, adaptation and phylogenetic analyses, these quantifications are especially useful. PMID:21999602

  18. Resolution analysis of high-resolution marine seismic data acquired off Yeosu, Korea

    NASA Astrophysics Data System (ADS)

    Lee, Ho-Young; Kim, Wonsik; Koo, Nam-Hyung; Park, Keun-Pil; Yoo, Dong-Geun; Kang, Dong-Hyo; Kim, Young-Gun; Seo, Gab-Seok; Hwang, Kyu-Duk

    2014-05-01

    High-resolution marine seismic surveys have been conducted for mineral exploration and engineering surveys. To improve the quality of high-resolution seismic data, small-scale multi-channel seismic techniques are used. In this study, we designed a high-resolution marine seismic survey using a small airgun and an 8-channel streamer cable and analyzed the resolution of the seismic data in relation to acquisition and processing parameters. The field survey was conducted off Yeosu, Korea, where stratified thin sedimentary layers are deposited. We used a 30 in³ airgun and an 8-channel streamer cable with a 5 m group interval. We shot the airgun with a 5 m shot interval and recorded digital data with a 0.1 ms sample interval and 1 s record length. The offset between the source and the first channel was 20 m. We processed the acquired data with simple procedures such as gain recovery, deconvolution, digital filtering, CMP sorting, NMO correction, static correction and stacking. To understand the effect of the acquisition parameters on the vertical and horizontal resolution, we resampled the acquired data using various sample intervals and CMP intervals and produced seismic sections. The analysis results show that the detailed subsurface structures can be imaged with good resolution and continuity using acquisition parameters with a sample interval shorter than 0.2 ms and a CMP interval shorter than 2.5 m. A high-resolution marine 8-channel airgun seismic survey using appropriate acquisition and processing parameters can be effective in imaging marine subsurface structure with a high resolution. This study is a part of a National Research Laboratory (NRL) project and a part of an Energy Technology Innovation (ETI) Project of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), funded by the Ministry of Trade, Industry and Energy (MOTIE). The authors thank the officers and crew of the R/V Tamhae II for their efforts in the field survey.

  19. Complete genome sequencing and comparative genomic analysis of functionally diverse Lysinibacillus sphaericus III(3)7.

    PubMed

    Rey, Andrés; Silva-Quintero, Laura; Dussán, Jenny

    2016-09-01

    Lysinibacillus sphaericus III(3)7 is a native Colombian strain, the first one isolated from soil samples. This strain has shown high levels of pathogenic activity against Culex quinquefasciatus larvae in laboratory assays compared to other members of the same species. Using Pacific Biosciences sequencing technology we sequenced, annotated (de novo) and described the genome of strain III(3)7, achieving a complete genome sequence status. We then performed a comparative analysis between the newly sequenced genome and the ones previously reported for Colombian isolates L. sphaericus OT4b.31, CBAM5 and OT4b.25, with the inclusion of L. sphaericus C3-41, which has been used as a reference genome for most previous genome sequencing projects. We concluded that L. sphaericus III(3)7 is highly similar to strain OT4b.25 and shares high levels of synteny with isolates CBAM5 and C3-41. PMID:27419068

  20. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    DOE PAGES

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  1. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    SciTech Connect

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations; such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  3. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

  4. Comparative DNA Sequence Analysis of Wheat and Rice Genomes

    PubMed Central

    Sorrells, Mark E.; La Rota, Mauricio; Bermudez-Kandianis, Catherine E.; Greene, Robert A.; Kantety, Ramesh; Munkvold, Jesse D.; Miftahudin; Mahmoud, Ahmed; Ma, Xuefeng; Gustafson, Perry J.; Qi, Lili L.; Echalier, Benjamin; Gill, Bikram S.; Matthews, David E.; Lazo, Gerard R.; Chao, Shiaoman; Anderson, Olin D.; Edwards, Hugh; Linkiewicz, Anna M.; Dubcovsky, Jorge; Akhunov, Eduard D.; Dvorak, Jan; Zhang, Deshui; Nguyen, Henry T.; Peng, Junhua; Lapitan, Nora L.V.; Gonzalez-Hernandez, Jose L.; Anderson, James A.; Hossain, Khwaja; Kalavacharla, Venu; Kianian, Shahryar F.; Choi, Dong-Woog; Close, Timothy J.; Dilbirligi, Muharrem; Gill, Kulvinder S.; Steber, Camille; Walker-Simmons, Mary K.; McGuire, Patrick E.; Qualset, Calvin O.

    2003-01-01

    The use of DNA sequence-based comparative genomics for evolutionary studies and for transferring information from model species to crop species has revolutionized molecular genetics and crop improvement strategies. This study compared 4485 expressed sequence tags (ESTs) that were physically mapped in wheat chromosome bins, to the public rice genome sequence data from 2251 ordered BAC/PAC clones using BLAST. A rice genome view of homologous wheat genome locations based on comparative sequence analysis revealed numerous chromosomal rearrangements that will significantly complicate the use of rice as a model for cross-species transfer of information in nonconserved regions. PMID:12902377

  5. Bioinformatic tools for using whole genome sequencing as a rapid high resolution diagnostic typing tool when tracing bioterror organisms in the food and feed chain.

    PubMed

    Segerman, Bo; De Medici, Dario; Ehling Schulz, Monika; Fach, Patrick; Fenicia, Lucia; Fricker, Martina; Wielinga, Peter; Van Rotterdam, Bart; Knutsson, Rickard

    2011-03-01

    The rapid technological development in the field of parallel sequencing offers new opportunities when tracing and tracking microorganisms in the food and feed chain. If a bioterror organism is deliberately spread, it is of crucial importance to get as much information as possible regarding the strain as fast as possible to aid the decision process and select suitable controls, tracing and tracking tools. Considerable effort has been made to sequence multiple strains of potential bioterror organisms, so there is a relatively large set of reference genomes available. This study is focused on how to use parallel sequencing for rapid phylogenomic analysis and to screen for genetic modifications. A bioinformatic methodology has been developed to rapidly analyze sequence data with minimal post-processing. Instead of assembling the genome, defining genes, defining orthologous relations and calculating distances, the present method can achieve a similarly high resolution directly from the raw sequence data. The method defines orthologous sequence reads instead of orthologous genes and the average similarity of the core genome (ASC) is calculated. The sequence reads from the core and from the non-conserved genomic regions can also be separated for further analysis. Finally, the comparison algorithm is used to visualize the phylogenomic diversity of the bacterial bioterror organisms Bacillus anthracis and Clostridium botulinum using heat plot diagrams.

  6. Analysis of inserts in prokaryote genomes

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan; Tuduce, Rodica Aurora

    2008-02-01

    Nucleotide genomic signals satisfy regularities that reveal restrictions in the distribution of nucleotides and pairs of nucleotides along DNA sequences. Structurally, a chromosome appears to be more than a plain text, by satisfying symmetry constraints that evoke the rhythm and rhyme in poems. These regularities make it easy to identify exogenous inserts in the genomes of prokaryotes, because such inserts obey different regularities than the background sequence. The paper presents instances of inserts found in the genomes of Bacillus subtilis, Mycobacterium tuberculosis and other prokaryotes. Inserts of exogenous material are frequently accompanied by complementary inserts tending to restore the original constraints.

  7. Software tool for the analysis and visualization of whole genome alignments

    2011-08-01

    GenomeVISTA is a tool that performs and displays pairwise and multiple whole genome DNA alignments. The tool provides a graphical user interface by which users can navigate alignments at multiple levels of resolution and get information about individual aligned regions. Users can load their own sequences into GenomeVISTA or view pre-computed alignments for genomes in the VISTA database.

  8. Supercomputing for the parallelization of whole genome analysis

    PubMed Central

    Puckelwartz, Megan J.; Pesce, Lorenzo L.; Nelakuditi, Viswateja; Dellefave-Castillo, Lisa; Golbus, Jessica R.; Day, Sharlene M.; Cappola, Thomas P.; Dorn, Gerald W.; Foster, Ian T.; McNally, Elizabeth M.

    2014-01-01

    Motivation: The declining cost of generating DNA sequence is promoting an increase in whole genome sequencing, especially as applied to the human genome. Whole genome analysis requires the alignment and comparison of raw sequence data, and results in a computational bottleneck because of limited ability to analyze multiple genomes simultaneously. Results: We have now adapted a Cray XE6 supercomputer to achieve the parallelization required for concurrent multiple genome analysis. This approach not only markedly speeds computational time but also results in increased usable sequence per genome. Relying on publicly available software, the Cray XE6 has the capacity to align and call variants on 240 whole genomes in ∼50 h. Multisample variant calling is also accelerated. Availability and implementation: The MegaSeq workflow is designed to harness the size and memory of the Cray XE6, housed at Argonne National Laboratory, for whole genome analysis in a platform designed to better match current and emerging sequencing volume. Contact: emcnally@uchicago.edu PMID:24526712

  9. Whole Genome Amplification in Genomic Analysis of Single Circulating Tumor Cells.

    PubMed

    Gasch, Christin; Pantel, Klaus; Riethdorf, Sabine

    2015-01-01

    Investigation of the genome of organisms is one of the major basics in molecular biology to understand the complex organization of cells. While genomic DNA can easily be isolated from tissues or cell cultures of plant, animal or human origin, DNA extraction from single cells is still challenging. Here, we describe three techniques for the amplification of genomic DNA of fixed single circulating tumor cells (CTC) isolated from blood of cancer patients. This amplification is aimed to increase DNA amounts from those of one cell to yields sufficient for different DNA analyses such as mutational analysis including next-generation sequencing, array-comparative genome hybridization (CGH), and quantitative measurement of gene amplifications. Molecular analysis of CTC as liquid biopsy can be used to identify therapeutic targets in personalized medicine directed, e.g. against human epidermal growth factor receptor 2 (HER2) or epidermal growth factor receptor (EGFR) and to stratify the patients to those therapies.

  10. Structural and functional genome analysis using extended chromatin

    SciTech Connect

    Heaf, T.; Ward, D.C.

    1994-09-01

    Highly extended linear chromatin fibers (ECFs) produced by detergent and high-salt lysis and stretching of nuclear chromatin across the surface of a glass slide can be hybridized over physical distances of at least several Mb. This allows long-range FISH analysis of the human genome with excellent DNA resolution (<10 kb/μm). The insertion of Alu elements which are more than 50-fold underrepresented in centromeres can be seen within and near long tandem arrays of alpha-satellite DNA. Long tracts of trinucleotide repeats, i.e. (CCA)n, can be localized within larger genomic regions. The combined application of BrdU incorporation and ECFs allows one to study the spatio-temporal distribution of DNA replication sites in finer detail. DNA synthesis occurs at multiple discrete sites within Mb arrays of alpha-satellite. Replicating DNA is tightly associated with the nuclear matrix and highly resistant to stretching out, while ECFs containing newly replicated DNA are easily released. Asynchrony in replication timing is accompanied by differences in condensation of homologous DNA segments. Extended chromatin reveals differential packaging of active and inactive DNA. Upon transcriptional inactivation by AMD, the normally compact rRNA genes become much more susceptible to decondensation procedures. By extending the chromatin from pachytene spermatocytes, meiotic pairing and genetic exchange between homologs can be visualized directly. Histone depletion by high salt and detergent produces loop chromatin surrounding the nuclear matrix in a halo-like fashion. DNA halos can be used to map nuclear matrix attachment sites in somatic cells and in mature sperm. Alpha-satellite containing DNA loops appear to be attached to the sperm-cell matrix by CENP-B boxes, short 17 bp sequences found in a subset of alpha satellite monomers. Sperm telomeres almost always appear as hybridization doublets, suggesting the presence of already replicated chromosome ends.

  11. Genomic Analysis of wig-1 Pathways

    PubMed Central

    Sedaghat, Yalda; Mazur, Curt; Sabripour, Mahyar; Hung, Gene; Monia, Brett P.

    2012-01-01

    Background Wig-1 is a transcription factor regulated by p53 that can interact with hnRNP A2/B1, RNA Helicase A, and dsRNAs, which plays an important role in RNA and protein stabilization. In vitro studies have shown that wig-1 binds p53 mRNA and stabilizes it by protecting it from deadenylation. Furthermore, p53 has been implicated as a causal factor in neurodegenerative diseases based in part on its selective regulatory function on gene expression, including genes which, in turn, also possess regulatory functions on gene expression. In this study we focused on the wig-1 transcription factor as a downstream p53 regulated gene and characterized the effects of wig-1 down regulation on gene expression in mouse liver and brain. Methods and Results Antisense oligonucleotides (ASOs) were identified that specifically target mouse wig-1 mRNA and produce a dose-dependent reduction in wig-1 mRNA levels in cell culture. These wig-1 ASOs produced marked reductions in wig-1 levels in liver following intraperitoneal administration and in brain tissue following ASO administration through a single striatal bolus injection in FVB and BACHD mice. Wig-1 suppression was well tolerated and resulted in the reduction of mutant Htt protein levels in BACHD mouse brain but had no effect on normal Htt protein levels nor p53 mRNA or protein levels. Expression microarray analysis was employed to determine the effects of wig-1 suppression on genome-wide expression in mouse liver and brain. Reduction of wig-1 caused both down regulation and up regulation of several genes, and a number of wig-1 regulated genes were identified that potentially link wig-1 to various signaling pathways and diseases. Conclusion Antisense oligonucleotides can effectively reduce wig-1 levels in mouse liver and brain, which results in specific changes in gene expression for pathways relevant to both the nervous system and cancer. PMID:22347364

  12. Mechanisms of assembly and genome packaging in an RNA virus revealed by high-resolution cryo-EM.

    PubMed

    Hesketh, Emma L; Meshcheriakova, Yulia; Dent, Kyle C; Saxena, Pooja; Thompson, Rebecca F; Cockburn, Joseph J; Lomonossoff, George P; Ranson, Neil A

    2015-01-01

    Cowpea mosaic virus is a plant-infecting member of the Picornavirales and is of major interest in the development of biotechnology applications. Despite the availability of >100 crystal structures of Picornavirales capsids, relatively little is known about the mechanisms of capsid assembly and genome encapsidation. Here we have determined cryo-electron microscopy reconstructions for the wild-type virus and an empty virus-like particle, to 3.4 Å and 3.0 Å resolution, respectively, and built de novo atomic models of their capsids. These new structures reveal the C-terminal region of the small coat protein subunit, which is essential for virus assembly and which was missing from previously determined crystal structures, as well as residues that bind to the viral genome. These observations allow us to develop a new model for genome encapsidation and capsid assembly. PMID:26657148

  13. Mechanisms of assembly and genome packaging in an RNA virus revealed by high-resolution cryo-EM

    PubMed Central

    Hesketh, Emma L.; Meshcheriakova, Yulia; Dent, Kyle C.; Saxena, Pooja; Thompson, Rebecca F.; Cockburn, Joseph J.; Lomonossoff, George P.; Ranson, Neil A.

    2015-01-01

    Cowpea mosaic virus is a plant-infecting member of the Picornavirales and is of major interest in the development of biotechnology applications. Despite the availability of >100 crystal structures of Picornavirales capsids, relatively little is known about the mechanisms of capsid assembly and genome encapsidation. Here we have determined cryo-electron microscopy reconstructions for the wild-type virus and an empty virus-like particle, to 3.4 Å and 3.0 Å resolution, respectively, and built de novo atomic models of their capsids. These new structures reveal the C-terminal region of the small coat protein subunit, which is essential for virus assembly and which was missing from previously determined crystal structures, as well as residues that bind to the viral genome. These observations allow us to develop a new model for genome encapsidation and capsid assembly. PMID:26657148

  14. A Comparative Genomic Analysis of Diverse Clonal Types of Enterotoxigenic Escherichia coli Reveals Pathovar-Specific Conservation

    PubMed Central

    Sahl, Jason W.; Steinsland, Hans; Redman, Julia C.; Angiuoli, Samuel V.; Nataro, James P.; Sommerfelt, Halvor; Rasko, David A.

    2011-01-01

    Enterotoxigenic Escherichia coli (ETEC) is a major cause of diarrheal illness in children less than 5 years of age in low- and middle-income nations, whereas it is an emerging enteric pathogen in industrialized nations. Despite being an important cause of diarrhea, little is known about the genomic composition of ETEC. To address this, we sequenced the genomes of five ETEC isolates obtained from children in Guinea-Bissau with diarrhea. These five isolates represent distinct and globally dominant ETEC clonal groups. Comparative genomic analyses utilizing a gene-independent whole-genome alignment method demonstrated that sequenced ETEC strains share approximately 2.7 million bases of genomic sequence. Phylogenetic analysis of this “core genome” confirmed the diverse history of the ETEC pathovar and provides a finer resolution of the E. coli relationships than multilocus sequence typing. No identified genomic regions were conserved exclusively in all ETEC genomes; however, we identified more genomic content conserved among ETEC genomes than among non-ETEC E. coli genomes, suggesting that ETEC isolates share a genomic core. Comparisons of known virulence and of surface-exposed and colonization factor genes across all sequenced ETEC genomes not only identified variability but also indicated that some antigens are restricted to the ETEC pathovar. Overall, the generation of these five genome sequences, in addition to the two previously generated ETEC genomes, highlights the genomic diversity of ETEC. These studies increase our understanding of ETEC evolution, as well as provide insight into virulence factors and conserved proteins, which may be targets for vaccine development. PMID:21078854

  15. The Arabidopsis TAC Position Viewer: a high-resolution map of transformation-competent artificial chromosome (TAC) clones aligned with the Arabidopsis thaliana Columbia-0 genome.

    PubMed

    Hirose, Yoshitsugu; Suda, Kunihiro; Liu, Yao-Guang; Sato, Shusei; Nakamura, Yukino; Yokoyama, Koji; Yamamoto, Naoki; Hanano, Shigeru; Takita, Eiji; Sakurai, Nozomu; Suzuki, Hideyuki; Nakamura, Yasukazu; Kaneko, Takakazu; Yano, Kentaro; Tabata, Satoshi; Shibata, Daisuke

    2015-09-01

    We present a high-resolution map of genomic transformation-competent artificial chromosome (TAC) clones extending over all Arabidopsis thaliana (Arabidopsis) chromosomes. The Arabidopsis genomic TAC clones have been valuable genetic tools. Previously, we constructed an Arabidopsis genomic TAC library consisting of more than 10,000 TAC clones harboring large genomic DNA fragments extending over the whole Arabidopsis genome. Here, we determined 13,577 end sequences from 6987 Arabidopsis TAC clones and mapped 5937 TAC clones to precise locations, covering approximately 90% of the Arabidopsis chromosomes. We present the large-scale data set of TAC clones with high-resolution mapping information as a Java application tool, the Arabidopsis TAC Position Viewer, which provides ready-to-go transformable genomic DNA clones corresponding to certain loci on Arabidopsis chromosomes. The TAC clone resources will accelerate genomic DNA cloning, positional walking, complementation of mutants and DNA transformation for heterologous gene expression. PMID:26227242

  16. High-resolution physical mapping in Pennisetum squamulatum reveals extensive chromosomal heteromorphism of the genomic region associated with apomixis.

    PubMed

    Akiyama, Yukio; Conner, Joann A; Goel, Shailendra; Morishige, Daryl T; Mullet, John E; Hanna, Wayne W; Ozias-Akins, Peggy

    2004-04-01

    Gametophytic apomixis is asexual reproduction as a consequence of parthenogenetic development of a chromosomally unreduced egg. The trait leads to the production of embryos with a maternal genotype, i.e. progeny are clones of the maternal plant. The application of the trait in agriculture could be a tremendous tool for crop improvement through conventional and nonconventional breeding methods. Unfortunately, there are no major crops that reproduce by apomixis, and interspecific hybridization with wild relatives has not yet resulted in commercially viable germplasm. Pennisetum squamulatum is an aposporous apomict from which the gene(s) for apomixis has been transferred to sexual pearl millet by backcrossing. Twelve molecular markers that are linked with apomixis coexist in a tight linkage block called the apospory-specific genomic region (ASGR), and several of these markers have been shown to be hemizygous in the polyploid genome of P. squamulatum. High resolution genetic mapping of these markers has not been possible because of low recombination in this region of the genome. We now show the physical arrangement of bacterial artificial chromosomes containing apomixis-linked molecular markers by high resolution fluorescence in situ hybridization on pachytene chromosomes. The size of the ASGR, currently defined as the entire hemizygous region that hybridizes with apomixis-linked bacterial artificial chromosomes, was estimated on pachytene and mitotic chromosomes to be approximately 50 Mbp (a quarter of the chromosome). The ASGR includes highly repetitive sequences from an Opie-2-like retrotransposon family that are particularly abundant in this region of the genome.

  17. Identification of novel non-coding small RNAs from Streptococcus pneumoniae TIGR4 using high-resolution genome tiling arrays

    PubMed Central

    2010-01-01

    Background The identification of non-coding transcripts in human, mouse, and Escherichia coli has revealed their widespread occurrence and functional importance in both eukaryotic and prokaryotic life. In prokaryotes, studies have shown that non-coding transcripts participate in a broad range of cellular functions such as gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Streptococcus pneumoniae (pneumococcus), an obligate human respiratory pathogen responsible for significant worldwide morbidity and mortality. Tiling microarrays enable genome-wide mRNA profiling as well as identification of novel transcripts at high resolution. Results Here, we describe a high-resolution transcription map of the S. pneumoniae clinical isolate TIGR4 using genomic tiling arrays. Our results indicate that approximately 66% of the genome is expressed under our experimental conditions. We identified a total of 50 non-coding small RNAs (sRNAs) from the intergenic regions, of which 36 had no predicted function. Half of the identified sRNA sequences were found to be unique to the S. pneumoniae genome. We identified eight overrepresented sequence motifs among sRNA sequences that correspond to sRNAs in different functional categories. Tiling arrays also identified approximately 202 operon structures in the genome. Conclusions In summary, the pneumococcal operon structures and novel sRNAs identified in this study enhance our understanding of the complexity and extent of the pneumococcal 'expressed' genome. Furthermore, the results of this study open up new avenues of research for understanding the complex RNA regulatory network governing S. pneumoniae physiology and virulence. PMID:20525227

  18. High-resolution Brillouin analysis of composite materials beams

    NASA Astrophysics Data System (ADS)

    London, Yosef; Antman, Yair; Silbiger, Maayan; Efraim, Liel; Froochzad, Avihay; Adler, Gadi; Levenberg, Eyal; Zadok, Avi

    2015-09-01

    High-resolution Brillouin optical correlation domain analysis of fibers embedded within beams of composite materials is performed with 4 cm resolution and 0.5 MHz sensitivity. Two new contributions are presented. First, analysis was carried out continuously over 30 hours following the production of a beam, observing heating during exothermal curing and buildup of residual strains. Second, the bending stiffness and Young's modulus of the composite beam were extracted based on distributed strain measurements, taken during a static three-point bending experiment. The calculated parameters were used to forecast the beam deflections. The latter were favorably compared against external displacement measurements.

  19. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
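    The N50 statistic mentioned in the abstract above can be illustrated with a short sketch. It uses the common definition (the length of the contig at which the cumulative sum of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length); the record does not say how the authors computed it. The coverage helper divides sequenced bases by genome size, and the 3.0 Gbp genome size used in the demo is an assumed round figure that does not come from the record. Both function names and the toy contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from longest to shortest and summed until the
    running total reaches at least half of the assembly length; the
    length of the contig that crosses that threshold is the N50.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


def mean_coverage(sequenced_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size


if __name__ == "__main__":
    # Toy contig lengths (hypothetical), total 32 bp, half is 16 bp.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))  # 8

    # 181 Gbp of reads is from the abstract; 3.0 Gbp is an ASSUMED genome size.
    print(round(mean_coverage(181e9, 3.0e9), 1))  # 60.3
```

    A doubling of N50, as reported, means that the contig at the halfway point of the sorted assembly is twice as long as before; it says nothing by itself about the correctness of the joins.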

  20. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  1. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  2. Mycobacterial species as case-study of comparative genome analysis.

    PubMed

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostic tools for controlling Mycobacterial diseases. In the present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose a comparison has been done based on their genome length, GC content, and number of genes in different databases (GenBank, RefSeq, and Prodigal). A BLAST matrix of these genomes was constructed to show the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genome have been defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree has been constructed from the 16S rRNA gene for tuberculosis and non-tuberculosis Mycobacteria to understand the evolutionary events of these species.
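    Two of the quantities named in this abstract, GC content and the core and pan genome, have simple textbook definitions that a small sketch can make concrete. The code below is an illustration under stated assumptions, not the authors' pipeline: it assumes genes have already been grouped into shared gene-family identifiers (real analyses derive these from sequence similarity, for example via BLAST), and the function names, strain labels and family identifiers are all hypothetical.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def core_and_pan(gene_families_by_genome):
    """Return (core, pan) gene-family sets for a dict of genome -> families.

    core: families present in every genome (set intersection)
    pan:  families present in at least one genome (set union)
    """
    family_sets = [set(families) for families in gene_families_by_genome.values()]
    if not family_sets:
        raise ValueError("need at least one genome")
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan


if __name__ == "__main__":
    print(gc_content("ATGCNN"))  # 0.5 (the two N bases are ignored)

    # Hypothetical toy input: three genomes with made-up family identifiers.
    toy = {
        "strain_A": {"fam1", "fam2", "fam3"},
        "strain_B": {"fam2", "fam3", "fam4"},
        "strain_C": {"fam3", "fam5"},
    }
    core, pan = core_and_pan(toy)
    print(sorted(core))  # ['fam3']
    print(len(pan))      # 5
```

    As more genomes are added, the core set can only shrink or stay the same and the pan set can only grow or stay the same, which is why both are usually reported together with the number of genomes compared.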

  3. Power, resolution and bias: recent advances in insect phylogeny driven by the genomic revolution.

    PubMed

    Yeates, David K; Meusemann, Karen; Trautwein, Michelle; Wiegmann, Brian; Zwick, Andreas

    2016-02-01

    Our understanding of the phylogenetic relationships of insects has been revolutionised in the last decade by the proliferation of next generation sequencing technologies (NGS). NGS has allowed insect systematists to assemble very large molecular datasets that include both model and non-model organisms. Such datasets often include a large proportion of the total number of protein coding sequences available for phylogenetic comparison. We review some early entomological phylogenomic studies that employ a range of different data sampling protocols and analysis strategies, illustrating a fundamental renaissance in our understanding of insect evolution all driven by the genomic revolution. The analysis of phylogenomic datasets is challenging because of their size and complexity, and it is obvious that the increasing size alone does not ensure that phylogenetic signal overcomes systematic biases in the data. Biases can be due to various factors such as the method of data generation and assembly, or intrinsic biological features of the data per se, such as similarities due to saturation or compositional heterogeneity. Such biases often cause violations in the underlying assumptions of phylogenetic models. We review some of the bioinformatics tools available and being developed to detect and minimise systematic biases in phylogenomic datasets. Phylogenomic-scale data coupled with sophisticated analyses will revolutionise our understanding of insect functional genomics. This will illuminate the relationship between the vast range of insect phenotypic diversity and underlying genetic diversity. In combination with rapidly developing methods to estimate divergence times, these analyses will also provide a compelling view of the rates and patterns of lineagenesis (birth of lineages) over the half billion years of insect evolution.

  4. Power, resolution and bias: recent advances in insect phylogeny driven by the genomic revolution.

    PubMed

    Yeates, David K; Meusemann, Karen; Trautwein, Michelle; Wiegmann, Brian; Zwick, Andreas

    2016-02-01

    Our understanding of the phylogenetic relationships of insects has been revolutionised in the last decade by the proliferation of next generation sequencing technologies (NGS). NGS has allowed insect systematists to assemble very large molecular datasets that include both model and non-model organisms. Such datasets often include a large proportion of the total number of protein coding sequences available for phylogenetic comparison. We review some early entomological phylogenomic studies that employ a range of different data sampling protocols and analysis strategies, illustrating a fundamental renaissance in our understanding of insect evolution all driven by the genomic revolution. The analysis of phylogenomic datasets is challenging because of their size and complexity, and it is obvious that the increasing size alone does not ensure that phylogenetic signal overcomes systematic biases in the data. Biases can be due to various factors such as the method of data generation and assembly, or intrinsic biological features of the data per se, such as similarities due to saturation or compositional heterogeneity. Such biases often cause violations in the underlying assumptions of phylogenetic models. We review some of the bioinformatics tools available and being developed to detect and minimise systematic biases in phylogenomic datasets. Phylogenomic-scale data coupled with sophisticated analyses will revolutionise our understanding of insect functional genomics. This will illuminate the relationship between the vast range of insect phenotypic diversity and underlying genetic diversity. In combination with rapidly developing methods to estimate divergence times, these analyses will also provide a compelling view of the rates and patterns of lineagenesis (birth of lineages) over the half billion years of insect evolution. PMID:27436549

  5. Tiling resolution array CGH and high density expression profiling of urothelial carcinomas delineate genomic amplicons and candidate target genes specific for advanced tumors

    PubMed Central

    Heidenblad, Markus; Lindgren, David; Jonson, Tord; Liedberg, Fredrik; Veerla, Srinivas; Chebil, Gunilla; Gudjonsson, Sigurdur; Borg, Åke; Månsson, Wiking; Höglund, Mattias

    2008-01-01

    Background Urothelial carcinoma (UC) is characterized by nonrandom chromosomal aberrations, varying from one or a few changes in early-stage and low-grade tumors, to highly rearranged karyotypes in muscle-invasive lesions. Recent array-CGH analyses have shed further light on the genomic changes underlying the neoplastic development of UC, and have facilitated the molecular delineation of amplified and deleted regions to the level of specific candidate genes. In the present investigation we combine detailed genomic information with expression information to identify putative target genes for genomic amplifications. Methods We analyzed 38 urothelial carcinomas by whole-genome tiling resolution array-CGH and high density expression profiling to identify putative target genes in common genomic amplifications. When necessary, expression profiling was complemented with Q-PCR of individual genes. Results Three genomic segments were frequently and exclusively amplified in high grade tumors: 1q23, 6p22 and 8q22. Detailed mapping of the 1q23 segment showed a heterogeneous amplification pattern and no obvious commonly amplified region. The 6p22 amplicon was defined by a 1.8 Mb core region present in all amplifications, flanked both distally and proximally by segments amplified to a lesser extent. By combining genomic profiles with expression profiles we could show that amplification of E2F3, CDKAL1, SOX4, and MBOAT1 as well as NUP153, AOF1, FAM8A1 and DEK in 6p22 was associated with increased gene expression. Amplification of the 8q22 segment was primarily associated with YWHAZ (14-3-3-zeta) and POLR2K overexpression. The possible importance of the YWHA genes in the development of urothelial carcinomas was supported by another recurrent amplicon paralogous to 8q22, in 2p25, where increased copy numbers lead to enhanced expression of YWHAQ (14-3-3-theta). Homozygous deletions were identified at 10 different genomic locations, most frequently affecting CDKN2A/CDKN2B

  6. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  7. Somatic alterations in the melanoma genome: a high-resolution array-based comparative genomic hybridization study.

    PubMed

    Gast, Andreas; Scherer, Dominique; Chen, Bowang; Bloethner, Sandra; Melchert, Stephanie; Sucker, Antje; Hemminki, Kari; Schadendorf, Dirk; Kumar, Rajiv

    2010-08-01

    We performed DNA microarray-based comparative genomic hybridization to identify somatic alterations specific to melanoma genome in 60 human cell lines from metastasized melanoma and from 44 corresponding peripheral blood mononuclear cells. Our data showed gross but nonrandom somatic changes specific to the tumor genome. Although the CDKN2A (78%) and PTEN (70%) loci were the major targets of mono-allelic and bi-allelic deletions, amplifications affected loci with BRAF (53%) and NRAS (12%) as well as EGFR (52%), MITF (40%), NOTCH2 (35%), CCND1 (18%), MDM2 (18%), CCNE1 (10%), and CDK4 (8%). The amplified loci carried additional genes, many of which could potentially play a role in melanoma. Distinct patterns of copy number changes showed that alterations in CDKN2A tended to be more clustered in cell lines with mutations in the BRAF and NRAS genes; the PTEN locus was targeted mainly in conjunction with BRAF mutations. Amplification of CCND1, CDK4, and other loci was significantly increased in cell lines without BRAF-NRAS mutations and so was the loss of chromosome arms 13q and 16q. Our data suggest involvement of distinct genetic pathways that are driven either through oncogenic BRAF and NRAS mutations complemented by aberrations in the CDKN2A and PTEN genes or involve amplification of oncogenic genomic loci and loss of 13q and 16q. It also emerges that each tumor besides being affected by major and most common somatic genetic alterations also acquires additional genetic alterations that could be crucial in determining response to small molecular inhibitors that are being currently pursued. PMID:20544847

  8. Nucleotide-Resolution Profiling of RNA Recombination in the Encapsidated Genome of a Eukaryotic RNA Virus by Next-Generation Sequencing

    PubMed Central

    Routh, Andrew; Ordoukhanian, Phillip; Johnson, John E.

    2012-01-01

    Next-Generation Sequencing has been used in numerous investigations to characterize and quantify the genetic diversity of a virus sample through the mapping of polymorphisms and measurement of mutation frequencies. Next-Generation Sequencing has also been employed to identify recombination events occurring within the genomes of higher organisms, for example, detecting alternative RNA splicing events and oncogenic chromosomal rearrangements. Here, we combine these two approaches to profile RNA recombination within the encapsidated genome of a eukaryotic RNA virus, Flock House Virus. We detect hundreds of thousands of recombination events, with single-nucleotide resolution, which result in diversity in the encapsidated genome rivaling that due to mismatch mutation. We detect previously identified Defective-RNAs as well as many other abundant and novel Defective-RNAs. Our approach is exceptionally sensitive, unbiased, and requires no prior knowledge beyond the virus genome sequence. RNA recombination is a powerful driving force behind the evolution and adaptation of RNA viruses. The strategy implemented here is widely applicable and provides a highly detailed description of the complex mutational landscape of the transmissible viral genome. PMID:23069247

  9. Copy Number Variation Analysis by Array Analysis of Single Cells Following Whole Genome Amplification.

    PubMed

    Dimitriadou, Eftychia; Zamani Esteki, Masoud; Vermeesch, Joris Robert

    2015-01-01

    Whole genome amplification is required to ensure the availability of sufficient material for copy number variation analysis of a genome deriving from an individual cell. Here, we describe the protocols we use for copy number variation analysis of non-fixed single cells by array-based approaches following single-cell isolation and whole genome amplification. We are focusing on two alternative protocols, an isothermal and a PCR-based whole genome amplification method, followed by either comparative genome hybridization (aCGH) or SNP array analysis, respectively.

  10. Microarray Comparative Genomic Hybridisation Analysis Incorporating Genomic Organisation, and Application to Enterobacterial Plant Pathogens

    PubMed Central

    Pritchard, Leighton; Liu, Hui; Booth, Clare; Douglas, Emma; François, Patrice; Schrenzel, Jacques; Hedley, Peter E.; Birch, Paul R. J.; Toth, Ian K.

    2009-01-01

    Microarray comparative genomic hybridisation (aCGH) provides an estimate of the relative abundance of genomic DNA (gDNA) taken from comparator and reference organisms by hybridisation to a microarray containing probes that represent sequences from the reference organism. The experimental method is used in a number of biological applications, including the detection of human chromosomal aberrations, and in comparative genomic analysis of bacterial strains, but optimisation of the analysis is desirable in each problem domain. We present a method for analysis of bacterial aCGH data that encodes spatial information from the reference genome in a hidden Markov model. This technique is the first such method to be validated in comparisons of sequenced bacteria that diverge at the strain and at the genus level: Pectobacterium atrosepticum SCRI1043 (Pba1043) and Dickeya dadantii 3937 (Dda3937); and Lactococcus lactis subsp. lactis IL1403 and L. lactis subsp. cremoris MG1363. In all cases our method is found to outperform common and widely used aCGH analysis methods that do not incorporate spatial information. This analysis is applied to comparisons between commercially important plant pathogenic soft-rotting enterobacteria (SRE) Pba1043, P. atrosepticum SCRI1039, P. carotovorum 193, and Dda3937. Our analysis indicates that it should not be assumed that hybridisation strength is a reliable proxy for sequence identity in aCGH experiments, and robustly extends the applicability of aCGH to bacterial comparisons at the genus level. Our results in the SRE further provide evidence for a dynamic, plastic ‘accessory’ genome, revealing major genomic islands encoding gene products that provide insight into, and may play a direct role in determining, variation amongst the SRE in terms of their environmental survival, host range and aetiology, such as phytotoxin synthesis, multidrug resistance, and nitrogen fixation. PMID:19696881

  11. A high resolution genetic map anchoring scaffolds of the sequenced watermelon genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high-density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of ...

  12. Evacuee Compliance Behavior Analysis using High Resolution Demographic Information

    SciTech Connect

    Lu, Wei; Han, Lee; Liu, Cheng; Tuttle, Mark A; Bhaduri, Budhendra L

    2014-01-01

    The purpose of this study is to examine whether evacuee compliance behavior with route assignments from different resolutions of demographic data would impact the evacuation performance. Most existing evacuation strategies assume that travelers will follow evacuation instructions, while in reality a certain percentage of evacuees do not comply with prescribed instructions. In this paper, a comparison study of evacuation assignment based on Traffic Analysis Zones (TAZ) and high resolution LandScan USA Population Cells (LPC) was conducted for the detailed road network representing Alexandria, Virginia. A revised platform for evacuation modeling built on high resolution demographic data and activity-based microscopic traffic simulation is proposed. The results indicate that evacuee compliance behavior affects evacuation efficiency with traditional TAZ assignment, but it does not significantly compromise the efficiency with high resolution LPC assignment. The TAZ assignment also underestimates the real travel time during evacuation, especially for high compliance simulations. This suggests that conventional evacuation studies based on TAZ assignment might not be effective at providing efficient guidance to evacuees. From the high resolution data perspective, traveler compliance behavior is an important factor but it does not impact the system performance significantly. Evacuee compliance behavior analysis should emphasize individual evacuee-level route/shelter assignments rather than whole-system performance.

  13. Genomic analysis of dairy starter culture Streptococcus thermophilus MTCC 5461.

    PubMed

    Prajapati, Jashbhai B; Nathani, Neelam M; Patel, Amrutlal K; Senan, Suja; Joshi, Chaitanya G

    2013-04-01

    The lactic acid bacterium Streptococcus thermophilus is widely used as a starter culture for the production of dairy products. Whole-genome sequencing is expected to utilize the genetic basis behind the metabolic functioning of lactic acid bacteria (LAB), for development of their use in biotechnological and probiotic applications. We sequenced the whole genome of Streptococcus thermophilus MTCC 5461, a strain isolated from a curd source, by 454 GS-FLX titanium and Ion Torrent PGM. We performed comparative genome analysis using local BLAST and RDP for 16S rDNA comparison and the RAST server for functional comparison against the published genome sequence of Streptococcus thermophilus CNRZ 1066. The whole genome of S. thermophilus MTCC 5461 is 1.73 Mb in size with a GC content of 39.3%. Streptococcal virulence-related genes are either inactivated or absent in the strain. The genome possesses coding sequences for features important for a probiotic organism such as adhesion, acid tolerance, bacteriocin production, and lactose utilization, which were found to be conserved among the strains MTCC 5461 and CNRZ 1066. Biochemical analysis revealed the utilization of 17 sugars by the bacterium, and the presence of genes encoding enzymes involved in the metabolism of 16 of these 17 sugars was confirmed in the genome. This study supports the view that the strain MTCC 5461 is nonpathogenic and harbors essential features that can be exploited for its probiotic potential.

  14. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  15. Methodological challenges of genome-wide association analysis in Africa

    PubMed Central

    Teo, Yik-Ying; Small, Kerrin S.; Kwiatkowski, Dominic P.

    2013-01-01

    Medical research in Africa has yet to benefit from the advent of genome-wide association (GWA) analysis, partly because the genotyping tools and statistical methods that have been developed for European and Asian populations struggle to deal with the high levels of genome diversity and population structure in Africa. However, the haplotypic diversity of African populations might help to overcome one of the major roadblocks in GWA research, the fine mapping of causal variants. We review the methodological challenges and consider how GWA studies in Africa will be transformed by new approaches in statistical imputation and large-scale genome sequencing. PMID:20084087

  16. Meta-analysis of genome-wide linkage scans for renal function traits

    PubMed Central

    Rao, Madhumathi; Mottl, Amy K.; Cole, Shelley A.; Umans, Jason G.; Freedman, Barry I.; Bowden, Donald W.; Langefeld, Carl D.; Fox, Caroline S.; Yang, Qiong; Cupples, Adrienne; Iyengar, Sudha K.; Hunt, Steven C.

    2012-01-01

    Background. Several genome scans have explored the linkage of chronic kidney disease phenotypes to chromosomic regions with disparate results. Genome scan meta-analysis (GSMA) is a quantitative method to synthesize linkage results from independent studies and assess their concordance. Methods. We searched PubMed to identify genome linkage analyses of renal function traits in humans, such as estimated glomerular filtration rate (GFR), albuminuria, serum creatinine concentration and creatinine clearance. We contacted authors for numerical data and extracted information from individual studies. We applied the GSMA nonparametric approach to combine results across 14 linkage studies for GFR, 11 linkage studies for albumin creatinine ratio, 11 linkage studies for serum creatinine and 4 linkage studies for creatinine clearance. Results. No chromosomal region reached genome-wide statistical significance in the main analysis which included all scans under each phenotype; however, regions on Chromosomes 7, 10 and 16 reached suggestive significance for linkage to two or more phenotypes. Subgroup analyses by disease status or ethnicity did not yield additional information. Conclusions. While heterogeneity across populations, methodologies and study designs likely explain this lack of agreement, it is possible that linkage scan methodologies lack the resolution for investigating complex traits. Combining family-based linkage studies with genome-wide association studies may be a powerful approach to detect private mutations contributing to complex renal phenotypes. PMID:21622988
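    The rank-based idea behind genome scan meta-analysis can be illustrated with a minimal Python sketch. It assumes the commonly described form of GSMA: the genome is split into bins, each study contributes one linkage statistic per bin, bins are ranked within each study, and ranks are summed across studies and compared with a permutation null. The function names, the toy inputs and the permutation settings are illustrative assumptions and are not taken from the record above; study weighting and bin definition are not shown.

```python
import random


def rank_bins(stats):
    """Rank bins within one study: largest statistic gets the highest rank.

    Tied statistics share the average of the ranks they occupy.
    """
    order = sorted(range(len(stats)), key=lambda i: stats[i])
    ranks = [0.0] * len(stats)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and stats[order[j + 1]] == stats[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def summed_ranks(studies):
    """Sum the within-study ranks of each bin across all studies."""
    per_study = [rank_bins(s) for s in studies]
    return [sum(column) for column in zip(*per_study)]


def permutation_p_values(studies, n_perm=2000, seed=0):
    """Monte Carlo p-value per bin: how often a shuffled summed rank
    is at least as large as the observed one (hypothetical settings)."""
    rng = random.Random(seed)
    per_study = [rank_bins(s) for s in studies]
    observed = [sum(column) for column in zip(*per_study)]
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = [rng.sample(r, len(r)) for r in per_study]
        null = [sum(column) for column in zip(*shuffled)]
        for b, obs in enumerate(observed):
            if null[b] >= obs:
                exceed[b] += 1
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

    A bin with a high summed rank and a small permutation p-value is one where independent scans agree; real analyses typically also weight studies and test ordered ranks, which this sketch omits.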

  17. Comparative Genomics via Wavelet Analysis for Closely Related Bacteria

    NASA Astrophysics Data System (ADS)

    Song, Jiuzhou; Ware, Tony; Liu, Shu-Lin; Surette, M.

    2004-12-01

    Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The efficiency of traditional methods depends strongly on the software used. To overcome this problem, we propose using wavelet analysis to perform comparative genomics. First, global comparison using wavelet analysis gives the difference at a quantitative level. Then local comparison using keto-excess or purine-excess plots shows precise positions of inversions, translocations, and horizontally transferred DNA fragments. We firstly found that the level of energy spectra difference is related to the similarity of bacteria strains; it could be a quantitative index to describe the similarities of genomes. The strategy is described in detail by comparisons of closely related strains: S.typhi CT18, S.typhi Ty2, S.typhimurium LT2, H.pylori 26695, and H.pylori J99.
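    The keto-excess and purine-excess curves mentioned above can be sketched in a few lines of Python. This assumes the usual definitions (purine excess is the running count of A and G minus C and T; keto excess is the running count of G and T minus A and C); the function names are illustrative, and the wavelet step of the record is not shown.

```python
from itertools import accumulate

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
KETO = frozenset("GT")
AMINO = frozenset("AC")


def cumulative_excess(seq, plus, minus):
    """Running total along the sequence: +1 for a base in `plus`,
    -1 for a base in `minus`, 0 for anything else (e.g. N)."""
    steps = (1 if base in plus else -1 if base in minus else 0
             for base in seq.upper())
    return list(accumulate(steps))


def purine_excess(seq):
    """Cumulative (A + G) minus (C + T), one value per position."""
    return cumulative_excess(seq, PURINES, PYRIMIDINES)


def keto_excess(seq):
    """Cumulative (G + T) minus (A + C), one value per position."""
    return cumulative_excess(seq, KETO, AMINO)
```

    Plotted against position, such a curve tends to change slope where strand composition changes, which is why abrupt turns are read as candidate rearrangement or insertion boundaries.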

  18. A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes

    PubMed Central

    Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M.

    2016-01-01

    The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea. PMID:27756915

  19. Toward a Comprehensive Genomic Analysis of Cancer - TCGA

    Cancer.gov

    The National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI) convened a "Toward a Comprehensive Genomic Analysis of Cancer" workshop in Washington, D.C. This workshop brought together physicians, basic scientists and other members of the U.S. and international cancer communities to assist in outlining the most effective strategies for the development of a successful project. Information about this workshop is reported in the Executive Summary.

  20. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    PubMed Central

    Gil, Rosario; Silva, Francisco J.; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C. H. J.; Gross, Roy; Moya, Andrés

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  1. The genome sequence of Blochmannia floridanus: comparative analysis of reduced genomes.

    PubMed

    Gil, Rosario; Silva, Francisco J; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C H J; Gross, Roy; Moya, Andrés

    2003-08-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  2. Using the Saccharomyces Genome Database (SGD) for analysis of genomic information.

    PubMed

    Skrzypek, Marek S; Hirschman, Jodi

    2011-09-01

    Analysis of genomic data requires access to software tools that place the sequence-derived information in the context of biology. The Saccharomyces Genome Database (SGD) integrates functional information about budding yeast genes and their products with a set of analysis tools that facilitate exploring their biological details. This unit describes how the various types of functional data available at SGD can be searched, retrieved, and analyzed. Starting with the guided tour of the SGD Home page and Locus Summary page, this unit highlights how to retrieve data using YeastMine, how to visualize genomic information with GBrowse, how to explore gene expression patterns with SPELL, and how to use Gene Ontology tools to characterize large-scale datasets.

  3. Private genome analysis through homomorphic encryption

    PubMed Central

    2015-01-01

    Background The rapid development of genome sequencing technology allows researchers to access large genome datasets. However, outsourcing the data processing to the cloud poses high risks for personal privacy. The aim of this paper is to give a practical solution for this problem using homomorphic encryption. In our approach, all the computations can be performed in an untrusted cloud without requiring the decryption key or any interaction with the data owner, which preserves the privacy of genome data. Methods We present evaluation algorithms for secure computation of the minor allele frequencies and χ2 statistic in a genome-wide association studies setting. We also describe how to privately compute the Hamming distance and approximate Edit distance between encrypted DNA sequences. Finally, we compare performance details of using two practical homomorphic encryption schemes - the BGV scheme by Gentry, Halevi and Smart and the YASHE scheme by Bos, Lauter, Loftus and Naehrig. Results The approach with the YASHE scheme analyzes data from 400 people within about 2 seconds and picks a variant associated with disease from 311 spots. For another task, using the BGV scheme, it took about 65 seconds to securely compute the approximate Edit distance for DNA sequences of size 5K and figure out the differences between them. Conclusions The performance numbers for BGV are better than YASHE when homomorphically evaluating deep circuits (like the Hamming distance algorithm or approximate Edit distance algorithm). On the other hand, it is more efficient to use the YASHE scheme for a low-degree computation, such as minor allele frequencies or χ2 test statistic in a case-control study. PMID:26733152
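    For orientation, the quantities named in this abstract can be written down in plaintext Python. This is only a sketch of what is being computed, not of the encrypted evaluation: no homomorphic encryption is involved, genotypes are assumed to be coded 0/1/2 (copies of the alternate allele), and the chi-square shown is the standard allelic 2x2 test, which may differ from the exact statistic used in the paper. All function names are illustrative.

```python
def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele; genotypes are 0, 1 or 2 copies
    of the alternate allele per diploid individual (assumed coding)."""
    genotypes = list(genotypes)
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square (1 degree of freedom) on the 2x2 table of
    alternate/reference allele counts in cases versus controls."""
    a = sum(case_genotypes)                 # alternate alleles, cases
    b = 2 * len(case_genotypes) - a         # reference alleles, cases
    c = sum(control_genotypes)              # alternate alleles, controls
    d = 2 * len(control_genotypes) - c      # reference alleles, controls
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a margin is empty, so the table carries no signal
    return n * (a * d - b * c) ** 2 / denominator


def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(s, t))
```

    The allele-frequency and chi-square formulas involve only a few multiplications of counts, whereas position-by-position sequence comparison requires deeper arithmetic circuits under encryption, which is consistent with the low-degree versus deep-circuit contrast drawn in the Conclusions.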

  4. Single-Cell Analysis in Cancer Genomics.

    PubMed

    Saadatpour, Assieh; Lai, Shujing; Guo, Guoji; Yuan, Guo-Cheng

    2015-10-01

    Genetic changes and environmental differences result in cellular heterogeneity among cancer cells within the same tumor, thereby complicating treatment outcomes. Recent advances in single-cell technologies have opened new avenues to characterize the intra-tumor cellular heterogeneity, identify rare cell types, measure mutation rates, and, ultimately, guide diagnosis and treatment. In this paper we review the recent single-cell technological and computational advances at the genomic, transcriptomic, and proteomic levels, and discuss their applications in cancer research. PMID:26450340

  5. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    SciTech Connect

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; Hauser, Loren John; Wanchai, Visanu; Land, Miriam L.; Timm, Collin M.; Lu, Tse-Yuan S.; Schadt, Christopher Warren; Doktycz, Mitchel John; Pelletier, Dale A; Ussery, David W

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this

  6. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGES

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; Hauser, Loren John; Wanchai, Visanu; Land, Miriam L.; Timm, Collin M.; Lu, Tse-Yuan S.; Schadt, Christopher Warren; Doktycz, Mitchel John; et al

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants, influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which agreed with the clades defined by NCBI and the cliques defined by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa was clearly distinct in its genomic relatedness from other Pseudomonas species groups based on the pan and core genome analysis. Nineteen of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profile analysis, while two isolates were more closely related to P. chlororaphis and P. putida. Genes specific to the Populus-associated subgroups were identified: genes specific to subgroup 1 include several sensory systems, such as proteins that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; genes specific to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into the high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but

  7. A novel statistic for genome-wide interaction analysis.

    PubMed

    Wu, Xuesen; Dong, Hua; Luo, Li; Zhu, Yun; Peng, Gang; Reveille, John D; Xiong, Momiao

    2010-09-23

    Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001 [...] genome-wide interaction analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search for significant interaction between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.
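
    The abstract reports SNP pairs at false discovery rate (FDR) thresholds but does not say how the FDR was controlled. As a minimal sketch only, the widely used Benjamini-Hochberg adjustment is shown below; treating it as the procedure behind these results is an assumption, and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for four SNP pairs (not taken from the study).
pair_p_values = [0.0001, 0.004, 0.03, 0.5]
q_values = benjamini_hochberg(pair_p_values)

# Indices of pairs passing an illustrative FDR cutoff of 0.01.
significant = [i for i, q in enumerate(q_values) if q < 0.01]
print(q_values, significant)
```

    A genome-wide scan would feed in one p-value per tested SNP pair; the cutoff is then chosen to match the FDR level being reported.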

  8. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis

    PubMed Central

    Jun, Se-Ran; Wassenaar, Trudy M.; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M.; Lu, Tse-Yuan S.; Schadt, Christopher W.; Doktycz, Mitchel J.; Pelletier, Dale A.

    2015-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants. PMID:26519390

  9. Genome sequence and comparative genome analysis of Lactobacillus casei: insights into their niche-associated evolution.

    PubMed

    Cai, Hui; Thompson, Rebecca; Budinich, Mateo F; Broadbent, Jeff R; Steele, James L

    2009-01-01

    Lactobacillus casei is remarkably adaptable to diverse habitats and widely used in the food industry. To reveal the genomic features that contribute to its broad ecological adaptability and examine the evolution of the species, the genome sequence of L. casei ATCC 334 is analyzed and compared with other sequenced lactobacilli. This analysis reveals that ATCC 334 contains a high number of coding sequences involved in carbohydrate utilization and transcriptional regulation, reflecting its requirement for dealing with diverse environmental conditions. A comparison of the genome sequences of ATCC 334 to L. casei BL23 reveals 12 and 19 genomic islands, respectively. For a broader assessment of the genetic variability within L. casei, gene content of 21 L. casei strains isolated from various habitats (cheeses, n = 7; plant materials, n = 8; and human sources, n = 6) was examined by comparative genome hybridization with an ATCC 334-based microarray. This analysis resulted in identification of 25 hypervariable regions. One of these regions contains an overrepresentation of genes involved in carbohydrate utilization and transcriptional regulation and was thus proposed as a lifestyle adaptation island. Differences in L. casei genome inventory reveal both gene gain and gene decay. Gene gain, via acquisition of genomic islands, likely confers a fitness benefit in specific habitats. Gene decay, that is, loss of unnecessary ancestral traits, is observed in the cheese isolates and likely results in enhanced fitness in the dairy niche. This study gives the first picture of the stable versus variable regions in L. casei and provides valuable insights into evolution, lifestyle adaptation, and metabolic diversity of L. casei. PMID:20333194

  10. Genome-Wide DNA Methylation Patterns and Transcription Analysis in Sheep Muscle

    PubMed Central

    Couldrey, Christine; Brauning, Rudiger; Bracegirdle, Jeremy; Maclean, Paul; Henderson, Harold V.; McEwan, John C.

    2014-01-01

    DNA methylation plays a central role in regulating many aspects of growth and development in mammals through regulating gene expression. The development of next-generation sequencing technologies has paved the way for genome-wide, high-resolution analysis of DNA methylation landscapes using a methodology known as reduced representation bisulfite sequencing (RRBS). While RRBS has proven to be effective in understanding DNA methylation landscapes in humans, mice, and rats, to date, few studies have utilised this powerful method for investigating DNA methylation in agricultural animals. Here we describe the utilisation of RRBS to investigate DNA methylation in sheep Longissimus dorsi muscles. RRBS analysis of ∼1% of the genome from Longissimus dorsi muscles provided data of suitably high precision and accuracy for DNA methylation analysis, at all levels of resolution from genome-wide to individual nucleotides. Combining RRBS data with mRNAseq data allowed the sheep Longissimus dorsi muscle methylome to be compared with methylomes from other species. While some species differences were identified, many similarities were observed between DNA methylation patterns in sheep and other more commonly studied species. The RRBS data presented here highlight the complexity of epigenetic regulation of genes. However, the similarities observed across species are promising, in that knowledge gained from epigenetic studies in humans and mice may be applied, with caution, to agricultural species. The ability to accurately measure DNA methylation in agricultural animals will contribute an additional layer of information to the genetic analyses currently being used to maximise production gains in these species. PMID:25010796
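
    For readers unfamiliar with how bisulfite sequencing data resolve to individual nucleotides, a per-CpG methylation level is commonly computed as the fraction of reads that remain methylated at that site. The sketch below illustrates that arithmetic only; the function name, the coverage cutoff of 10 reads, and the read counts are all hypothetical and not taken from the study.

```python
def methylation_levels(cpg_counts, min_coverage=10):
    """Map each CpG position to methylated / total reads.

    cpg_counts maps a position to (methylated_reads, unmethylated_reads).
    Sites with fewer than min_coverage reads are dropped (assumed cutoff).
    """
    levels = {}
    for position, (meth, unmeth) in cpg_counts.items():
        coverage = meth + unmeth
        if coverage >= min_coverage:
            levels[position] = meth / coverage
    return levels


# Invented read counts for three CpG sites.
example_counts = {
    ("chr1", 1000): (18, 2),   # mostly methylated
    ("chr1", 1050): (3, 27),   # mostly unmethylated
    ("chr1", 1100): (2, 1),    # too few reads, filtered out
}
levels = methylation_levels(example_counts)
print(levels)
```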

  11. Sensitive and specific KRAS somatic mutation analysis on whole-genome amplified DNA from archival tissues.

    PubMed

    van Eijk, Ronald; van Puijenbroek, Marjo; Chhatta, Amiet R; Gupta, Nisha; Vossen, Rolf H A M; Lips, Esther H; Cleton-Jansen, Anne-Marie; Morreau, Hans; van Wezel, Tom

    2010-01-01

    Kirsten RAS (KRAS) is a small GTPase that plays a key role in Ras/mitogen-activated protein kinase signaling; somatic mutations in KRAS are frequently found in many cancers. The most common KRAS mutations result in a constitutively active protein. Accurate detection of KRAS mutations is pivotal to the molecular diagnosis of cancer and may guide proper treatment selection. Here, we describe a two-step KRAS mutation screening protocol that combines whole-genome amplification (WGA), high-resolution melting analysis (HRM) as a prescreen method for mutation carrying samples, and direct Sanger sequencing of DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, from which limited amounts of DNA are available. We developed target-specific primers, thereby avoiding amplification of homologous KRAS sequences. The addition of herring sperm DNA facilitated WGA in DNA samples isolated from as few as 100 cells. KRAS mutation screening using high-resolution melting analysis on wgaDNA from formalin-fixed, paraffin-embedded tissue is highly sensitive and specific; additionally, this method is feasible for screening of clinical specimens, as illustrated by our analysis of pancreatic cancers. Furthermore, PCR on wgaDNA does not introduce genotypic changes, as opposed to unamplified genomic DNA. This method can, after validation, be applied to virtually any potentially mutated region in the genome.

  12. Analysis of the Core Genome and Pan-Genome of Autotrophic Acetogenic Bacteria

    PubMed Central

    Shin, Jongoh; Song, Yoseb; Jeong, Yujin; Cho, Byung-Kwan

    2016-01-01

    Acetogens are obligate anaerobic bacteria capable of reducing carbon dioxide (CO2) to multicarbon compounds coupled to the oxidation of inorganic substrates, such as hydrogen (H2) or carbon monoxide (CO), via the Wood-Ljungdahl pathway. Owing to the metabolic capability of CO2 fixation, much attention has been focused on understanding the unique pathways associated with acetogens, particularly their metabolic coupling of CO2 fixation to energy conservation. Most known acetogens are phylogenetically and metabolically diverse bacteria present in 23 different bacterial genera. With the increased volume of available genome information, acetogenic bacterial genomes can be analyzed by comparative genome analysis. Even with the genetic diversity that exists among acetogens, the Wood-Ljungdahl pathway, a central metabolic pathway, and cofactor biosynthetic pathways are highly conserved for autotrophic growth. Additionally, comparative genome analysis revealed that most genes in the acetogen-specific core genome were associated with the Wood-Ljungdahl pathway. The conserved enzymes and those predicted as missing can provide insight into biological differences between acetogens and allow for the discovery of promising candidates for industrial applications. PMID:27733845

  13. Enhancing genomic laboratory reports: A qualitative analysis of provider review

    PubMed Central

    Rahm, Alanna Kulchak; Stuckey, Heather; Green, Jamie; Feldman, Lynn; Zallen, Doris T.; Bonhag, Michele; Segal, Michael M.; Fan, Audrey L.; Williams, Marc S.

    2016-01-01

    This study reports on the responses of physicians who reviewed provider and patient versions of a genomic laboratory report designed to communicate results of whole genome sequencing. Semi‐structured interviews addressed concept communication, elements, and format of example genome reports. Analysis of the coded transcripts resulted in recognition of three constructs around communication of genome sequencing results: (1) Providers agreed that whole genomic sequencing results are complex and they welcomed a report that provided supportive interpretation information to accompany sequencing results; (2) Providers strongly endorsed a report that included active clinical guidance, such as reference to practice guidelines, if available; and (3) Providers valued the genomic report as a resource that would serve as the basis to facilitate communication of genome sequencing results with their patients and families. Providers valued both versions of the report, though they affirmed the need for a provider‐oriented report. Critical elements of the report included clear language to explain the result, as well as consolidated yet comprehensive prognostic information with clear guidance over time for the clinical care of the patient. Most importantly, it appears a report with this design has the potential not only to return results but also to serve as a communication tool to help providers and patients discuss and coordinate care over time. © 2016 The Authors. American Journal of Medical Genetics Part A published by Wiley Periodicals, Inc. PMID:26842872

  14. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes

    PubMed Central

    Zhuang, Jiali; Weng, Zhiping

    2015-01-01

    Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recurrently mutated by single nucleotide alterations differed from the genes recurrently mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs. PMID:26283183

  15. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data

    PubMed Central

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. PMID:25398900

  16. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information.

  17. Cytogenetic analysis from DNA by comparative genomic hybridization.

    PubMed

    Tachdjian, G; Aboura, A; Lapierre, J M; Viguié, F

    2000-01-01

    Comparative genomic hybridization (CGH) is a modified in situ hybridization technique which allows detection and mapping of DNA sequence copy differences between two genomes in a single experiment. In CGH analysis, two differentially labelled genomic DNA (study and reference) are co-hybridized to normal metaphase spreads. Chromosomal locations of copy number changes in the DNA segments of the study genome are revealed by a variable fluorescence intensity ratio along each target chromosome. Since its development, CGH has been applied mostly as a research tool in the field of cancer cytogenetics to identify genetic changes in many previously unknown regions. CGH may also have a role in clinical cytogenetics for detection and identification of unbalanced chromosomal abnormalities.
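
    The ratio-based reading of CGH described above can be sketched as a simple thresholding step: divide the study-DNA fluorescence by the reference-DNA fluorescence at each position and label departures from 1.0. The cutoffs of 1.25 and 0.75 and the intensity values below are illustrative assumptions, not values stated in the abstract.

```python
def call_copy_changes(study_signal, reference_signal, gain=1.25, loss=0.75):
    """Label each position 'gain', 'loss', or 'balanced' from the study/reference ratio."""
    calls = []
    for study, reference in zip(study_signal, reference_signal):
        ratio = study / reference
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("balanced")
    return calls


# Invented fluorescence intensities at four positions along one chromosome.
study = [100.0, 150.0, 50.0, 98.0]
reference = [100.0, 100.0, 100.0, 100.0]
calls = call_copy_changes(study, reference)
print(calls)
```

    In practice the ratio profile is averaged over several metaphase spreads before thresholds are applied; this sketch skips that step.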

  18. Genome wide copy number analysis of single cells

    PubMed Central

    Baslan, Timour; Kendall, Jude; Rodgers, Linda; Cox, Hilary; Riggs, Mike; Stepansky, Asya; Troge, Jennifer; Ravi, Kandasamy; Esposito, Diane; Lakshmi, B.; Wigler, Michael; Navin, Nicholas; Hicks, James

    2016-01-01

    Copy number variation (CNV) is increasingly recognized as an important contributor to phenotypic variation in health and disease. Most methods for determining CNV rely on admixtures of cells, where information regarding genetic heterogeneity is lost. Here, we present a protocol that allows for the genome-wide copy number analysis of single nuclei isolated from mixed populations of cells. Single nucleus sequencing (SNS) combines flow sorting of single nuclei based on DNA content and whole genome amplification (WGA), followed by next-generation sequencing, to quantize genomic intervals in a genome-wide manner. Multiplexing of single cells is discussed. Additionally, we outline informatic approaches that correct for biases inherent in the WGA procedure and allow for accurate determination of copy number profiles. Altogether, the protocol takes ~3 days from flow cytometry to sequence-ready DNA libraries. PMID:22555242
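
    To make the idea of quantizing genomic intervals concrete, the sketch below scales read counts per genomic bin by their median and rounds to an integer copy number. This is a deliberately simplified illustration under the assumption of a mostly diploid cell; it is not the protocol's pipeline and omits the bias corrections for whole genome amplification that the abstract mentions. The bin counts are invented.

```python
from statistics import median


def estimate_copy_number(bin_counts, ploidy=2):
    """Scale per-bin read counts by the median bin and round to integers.

    Assumes most bins sit at the baseline ploidy (hypothetical simplification).
    """
    center = median(bin_counts)
    return [round(ploidy * count / center) for count in bin_counts]


# Invented read counts for seven consecutive genomic bins of one nucleus.
counts = [100, 102, 98, 150, 151, 50, 100]
copy_numbers = estimate_copy_number(counts)
print(copy_numbers)
```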

  19. Differential DNA Methylation Analysis without a Reference Genome.

    PubMed

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-12-22

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.

  1. Differential DNA Methylation Analysis without a Reference Genome

    PubMed Central

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C.; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-01-01

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome. PMID:26673328

  2. Complete genomic sequence analysis of norovirus isolated from South Korea.

    PubMed

    Lee, Gyu-Cheol; Jung, Gyoo Seung; Lee, Chan Hee

    2012-10-01

    The complete nucleotide and deduced amino acid sequences of the RNA genome of a recently isolated norovirus (NoV) from Korea, designated Hu/GII-4/CBNU2/2007/KR (CBNU2), were determined and characterized by phylogenetic comparison with several genetically diverse NoV sequences. The RNA genome of CBNU2 is 7,560 nucleotides in length, excluding the 3′ poly(A) tract. It includes three open reading frames (ORFs): ORF1, which encodes the nonstructural polyprotein (5-5,104); ORF2, which encodes VP1 (5,085-6,707); and ORF3, which encodes VP2 (6,707-7,513). ORF2-based phylogenetic analysis revealed that CBNU2 belonged to the GII.4 genotype, the most prevalent genotype, and formed a cluster with NoVs isolated from Asian regions between 2006 and 2008. Comparative analysis with the consensus sequence of 207 completely sequenced NoV genomes showed 47 mismatched nucleotides: 26 in ORF1, 14 in ORF2, and 7 in ORF3, resulting in 8 amino acid changes: 3 in ORF1, 2 in ORF2, and 3 in ORF3. Phylogenetic analysis with full genome ORF1, ORF2, and ORF3 nucleotide sequences obtained from CBNU2 and each of the other representative NoV genomes suggested that CBNU2 had not undergone recombination with any of the other NoVs. A SimPlot analysis further supported this finding.
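
    The ORF coordinates quoted above can be checked with a little arithmetic. The sketch below assumes the coordinates are 1-based, inclusive, and include the stop codon (the abstract does not say so explicitly); under that assumption every ORF length is a multiple of three and neighbouring ORFs overlap slightly.

```python
# ORF coordinates as given in the abstract (assumed 1-based, inclusive).
orfs = {
    "ORF1": (5, 5104),
    "ORF2": (5085, 6707),
    "ORF3": (6707, 7513),
}


def orf_length(start, end):
    """Length in nucleotides of an inclusive interval."""
    return end - start + 1


def overlap(a, b):
    """Number of nucleotides shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


lengths = {name: orf_length(*coords) for name, coords in orfs.items()}
in_frame = {name: n % 3 == 0 for name, n in lengths.items()}
overlap_1_2 = overlap(orfs["ORF1"], orfs["ORF2"])
overlap_2_3 = overlap(orfs["ORF2"], orfs["ORF3"])

print(lengths)       # nucleotides per ORF
print(in_frame)      # whether each length is a whole number of codons
print(overlap_1_2, overlap_2_3)
```

    With these coordinates ORF1 and ORF2 share 20 nucleotides and ORF2 and ORF3 share a single nucleotide.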

  3. A Novel Approach to Helicobacter pylori Pan-Genome Analysis for Identification of Genomic Islands.

    PubMed

    Uchiyama, Ikuo; Albritton, Jacob; Fukuyo, Masaki; Kojima, Kenji K; Yahara, Koji; Kobayashi, Ichizo

    2016-01-01

    Genomes of a given bacterial species can show great variation in gene content and thus systematic analysis of the entire gene repertoire, termed the pan-genome, is important for understanding bacterial intra-species diversity, population genetics, and evolution. Here, we analyzed the pan-genome from 30 completely sequenced strains of the human gastric pathogen Helicobacter pylori belonging to various phylogeographic groups, focusing on 991 accessory (not fully conserved) orthologous groups (OGs). We developed a method to evaluate the mobility of genes within a genome, using the gene order in the syntenically conserved regions as a reference, and classified the 991 accessory OGs into five classes: Core, Stable, Intermediate, Mobile, and Unique. Phylogenetic networks based on the gene content of Core and Stable classes are highly congruent with that created from the concatenated alignment of fully conserved core genes, in contrast to those of Intermediate and Mobile classes, which show quite different topologies. By clustering the accessory OGs on the basis of phylogenetic pattern similarity and chromosomal proximity, we identified 60 co-occurring gene clusters (CGCs). In addition to known genomic islands, including cag pathogenicity island, bacteriophages, and integrating conjugative elements, we identified some novel ones. One island encodes TerY-phosphorylation triad, which includes the eukaryote-type protein kinase/phosphatase gene pair, and components of type VII secretion system. Another one contains a reverse-transcriptase homolog, which may be involved in the defense against phage infection through altruistic suicide. Many of the CGCs contained restriction-modification (RM) genes. Different RM systems sometimes occupied the same (orthologous) locus in the strains. We anticipate that our method will facilitate pan-genome studies in general and help identify novel genomic islands in various bacterial species. PMID:27504980
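
    As a rough illustration of the pan-genome vocabulary used above, the sketch below splits ortholog groups into core (present in every strain), accessory (not fully conserved), and strain-unique sets from a presence/absence table. It does not reproduce the authors' mobility-based classification of accessory groups into Core, Stable, Intermediate, Mobile, and Unique classes, which relies on gene order; the strain names and ortholog group labels are hypothetical.

```python
def split_pan_genome(presence):
    """Split ortholog groups into pan, core, accessory, and strain-unique sets.

    presence maps a strain name to the set of ortholog groups it carries.
    """
    strains = list(presence.values())
    pan = set().union(*strains)
    core = set.intersection(*strains)
    accessory = pan - core
    # Groups carried by exactly one strain.
    unique = {og for og in accessory if sum(og in s for s in strains) == 1}
    return pan, core, accessory, unique


# Hypothetical presence/absence data for three strains.
presence = {
    "strainA": {"OG1", "OG2", "OG3"},
    "strainB": {"OG1", "OG2", "OG4"},
    "strainC": {"OG1", "OG3", "OG5"},
}
pan, core, accessory, unique = split_pan_genome(presence)
print(sorted(pan), sorted(core), sorted(accessory), sorted(unique))
```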

  4. A Novel Approach to Helicobacter pylori Pan-Genome Analysis for Identification of Genomic Islands

    PubMed Central

    Uchiyama, Ikuo; Albritton, Jacob; Fukuyo, Masaki; Kojima, Kenji K.; Yahara, Koji; Kobayashi, Ichizo

    2016-01-01

    Genomes of a given bacterial species can show great variation in gene content and thus systematic analysis of the entire gene repertoire, termed the pan-genome, is important for understanding bacterial intra-species diversity, population genetics, and evolution. Here, we analyzed the pan-genome from 30 completely sequenced strains of the human gastric pathogen Helicobacter pylori belonging to various phylogeographic groups, focusing on 991 accessory (not fully conserved) orthologous groups (OGs). We developed a method to evaluate the mobility of genes within a genome, using the gene order in the syntenically conserved regions as a reference, and classified the 991 accessory OGs into five classes: Core, Stable, Intermediate, Mobile, and Unique. Phylogenetic networks based on the gene content of Core and Stable classes are highly congruent with that created from the concatenated alignment of fully conserved core genes, in contrast to those of Intermediate and Mobile classes, which show quite different topologies. By clustering the accessory OGs on the basis of phylogenetic pattern similarity and chromosomal proximity, we identified 60 co-occurring gene clusters (CGCs). In addition to known genomic islands, including cag pathogenicity island, bacteriophages, and integrating conjugative elements, we identified some novel ones. One island encodes TerY-phosphorylation triad, which includes the eukaryote-type protein kinase/phosphatase gene pair, and components of type VII secretion system. Another one contains a reverse-transcriptase homolog, which may be involved in the defense against phage infection through altruistic suicide. Many of the CGCs contained restriction-modification (RM) genes. Different RM systems sometimes occupied the same (orthologous) locus in the strains. We anticipate that our method will facilitate pan-genome studies in general and help identify novel genomic islands in various bacterial species. PMID:27504980

  5. Genomic Analysis of Companion Rabbit Staphylococcus aureus

    PubMed Central

    Holmes, Mark A.; Harrison, Ewan M.; Fisher, Elizabeth A.; Graham, Elizabeth M.; Parkhill, Julian; Foster, Geoffrey; Paterson, Gavin K.

    2016-01-01

    In addition to being an important human pathogen, Staphylococcus aureus is able to cause a variety of infections in numerous other host species. While the S. aureus strains causing infection in several of these hosts have been well characterised, this is not the case for companion rabbits (Oryctolagus cuniculus), where little data are available on S. aureus strains from this host. To address this deficiency we have performed antimicrobial susceptibility testing and genome sequencing on a collection of S. aureus isolates from companion rabbits. The findings show a diverse S. aureus population is able to cause infection in this host, and while antimicrobial resistance was uncommon, the isolates possess a range of known and putative virulence factors consistent with a diverse clinical presentation in companion rabbits including severe abscesses. We additionally show that companion rabbit isolates carry polymorphisms within dltB that have been described as underlying host adaptation of S. aureus to farmed rabbits. The availability of S. aureus genome sequences from companion rabbits provides an important aid to understanding the pathogenesis of disease in this host and in the clinical management and surveillance of these infections. PMID:26963381

  6. Genomic Analysis of Companion Rabbit Staphylococcus aureus.

    PubMed

    Holmes, Mark A; Harrison, Ewan M; Fisher, Elizabeth A; Graham, Elizabeth M; Parkhill, Julian; Foster, Geoffrey; Paterson, Gavin K

    2016-01-01

    In addition to being an important human pathogen, Staphylococcus aureus is able to cause a variety of infections in numerous other host species. While the S. aureus strains causing infection in several of these hosts have been well characterised, this is not the case for companion rabbits (Oryctolagus cuniculus), where little data are available on S. aureus strains from this host. To address this deficiency we have performed antimicrobial susceptibility testing and genome sequencing on a collection of S. aureus isolates from companion rabbits. The findings show a diverse S. aureus population is able to cause infection in this host, and while antimicrobial resistance was uncommon, the isolates possess a range of known and putative virulence factors consistent with a diverse clinical presentation in companion rabbits including severe abscesses. We additionally show that companion rabbit isolates carry polymorphisms within dltB that have been described as underlying host adaptation of S. aureus to farmed rabbits. The availability of S. aureus genome sequences from companion rabbits provides an important aid to understanding the pathogenesis of disease in this host and in the clinical management and surveillance of these infections. PMID:26963381

  8. Savant Genome Browser 2: visualization and analysis for population-scale genomics.

    PubMed

    Fiume, Marc; Smith, Eric J M; Brook, Andrew; Strbenac, Dario; Turner, Brian; Mezlini, Aziz M; Robinson, Mark D; Wodak, Shoshana J; Brudno, Michael

    2012-07-01

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com.

  9. What’s in the genome of a filamentous fungus? Analysis of the Neurospora genome sequence

    PubMed Central

    Mannhaupt, Gertrud; Montrone, Corinna; Haase, Dirk; Mewes, H. Werner; Aign, Verena; Hoheisel, Jörg D.; Fartmann, Berthold; Nyakatura, Gerald; Kempken, Frank; Maier, Josef; Schulte, Ulrich

    2003-01-01

    The German Neurospora Genome Project has assembled sequences from ordered cosmid and BAC clones of linkage groups II and V of the genome of Neurospora crassa in 13 and 12 contigs, respectively. Including additional sequences located on other linkage groups, a total of 12 Mb were subjected to a manual gene extraction and annotation process. The genome comprises a small number of repetitive elements, a low degree of segmental duplications and very few paralogous genes. The analysis of the 3218 identified open reading frames provides a first overview of the protein equipment of a filamentous fungus. Significantly, N. crassa possesses a large variety of metabolic enzymes including a substantial number of enzymes involved in the degradation of complex substrates as well as secondary metabolism. While several of these enzymes are specific for filamentous fungi, many are shared exclusively with prokaryotes. PMID:12655011

  10. High-resolution analysis of polyprenols by supercritical fluid chromatography.

    PubMed

    Bamba, T; Fukasaki, W; Kajiyama, S; Ute, K; Kitayama, T; Kobayashi, A

    2001-03-01

    A high-resolution analysis of polyprenol mixtures was achieved by supercritical fluid chromatography (SFC). The separation of polyprenols was examined on an octadecylsilane-packed column with liquid carbon dioxide as the mobile phase and ethanol as modifier. Using this chromatography system, the resolution of separation (Rs) between octadecaprenol (prenol 18) and nonadecaprenol (prenol 19) was two times higher than that using conventional reversed-phase high-performance liquid chromatography. Our SFC technique allows the advantage of baseline separation of polyprenol samples containing hydrophobic components such as terpenes or fatty acids that are unfavorable for good separation. This method is very useful for the analysis of structurally close polyprenol analogues of rubber plant metabolites.

  11. Population-based 3D genome structure analysis reveals driving forces in spatial genome organization

    PubMed Central

    Li, Wenyuan; Kalhor, Reza; Dai, Chao; Hao, Shengli; Gong, Ke; Zhou, Yonggang; Li, Haochen; Zhou, Xianghong Jasmine; Le Gros, Mark A.; Larabell, Carolyn A.; Chen, Lin; Alber, Frank

    2016-01-01

    Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization. PMID:26951677

  12. The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications

    PubMed Central

    Smith, Jeramiah J.; Keinath, Melissa C.

    2015-01-01

    It is generally accepted that many genes present in vertebrate genomes owe their origin to two whole-genome duplications that occurred deep in the ancestry of the vertebrate lineage. However, details regarding the timing and outcome of these duplications are not well resolved. We present high-density meiotic and comparative genomic maps for the sea lamprey (Petromyzon marinus), a representative of an ancient lineage that diverged from all other vertebrates ∼550 million years ago. Linkage analyses yielded a total of 95 linkage groups, similar to the estimated number of germline chromosomes (1n ∼ 99), spanning a total of 5570.25 cM. Comparative mapping data yield strong support for the hypothesis that a single whole-genome duplication occurred in the basal vertebrate lineage, but do not strongly support a hypothetical second event. Rather, these comparative maps reveal several evolutionarily independent segmental duplications occurring over the last 600+ million years of chordate evolution. This refined history of vertebrate genome duplication should permit more precise investigations of vertebrate evolution. PMID:26048246

  13. Microfluidic device for bacterial genome extraction and analysis

    NASA Astrophysics Data System (ADS)

    Galajda, Peter; Riehn, Robert; Wang, Yan-Mei; Keymer, Juan; Golding, Ido; Cox, Edward C.; Austin, Robert H.

    2006-03-01

    Although single-molecule DNA manipulation and analysis techniques are emerging, methods for whole-genome extraction from single cells and for handling and analysing genomic-length DNA are still to be developed. Here we present a microfabricated device to address some of these needs. This microfluidic chip is suitable for culturing bacteria and subsequently retrieving their genetic content. As a next step, the extracted DNA can be introduced into a nanostructured segment of the chip for precise handling, stretching and analysis. We hope that similar microdevices can be useful in studying genetic aspects of the cell lifecycle in a variety of organisms.

  14. High Resolution Continuous Flow Analysis System for Polar Ice Cores

    NASA Astrophysics Data System (ADS)

    Dallmayr, Remi; Azuma, Kumiko; Yamada, Hironobu; Kjær, Helle Astrid; Vallelonga, Paul; Azuma, Nobuhiko; Takata, Morimasa

    2014-05-01

    In the last decades, Continuous Flow Analysis (CFA) technology for ice core analyses has been developed to reconstruct the past changes of the climate system 1), 2). Compared with traditional analyses of discrete samples, a CFA system offers much faster analyses at higher depth resolution. It also generates a decontaminated sample stream without a time-consuming sample processing procedure by using the inner area of an ice-core sample. The CFA system that we have been developing is currently able to continuously measure stable water isotopes 3) and electrolytic conductivity, as well as to collect discrete samples for both the inner and outer areas with variable depth resolutions. Chemistry analyses 4) and methane-gas analysis 5) are planned to be added using the continuous water stream system 5). In order to optimize the resolution of the current system with minimal sample volumes necessary for different analyses, our CFA system typically melts an ice core at 1.6 cm/min. Instead of using a wire position encoder with a typical 1 mm positioning resolution 6), we decided to use a high-accuracy CCD laser displacement sensor (LKG-G505, Keyence). At the 1.6 cm/min melt rate, the positioning resolution was improved to 0.27 mm. Also, the mixing volume that occurs in our open split debubbler is regulated using its weight. The overflow pumping rate is smoothly PID controlled to keep the weight as low as possible, while keeping a safety buffer of water to avoid air bubbles downstream. To evaluate the system's depth resolution, we will present the preliminary data of electrolytic conductivity obtained by melting 12 bags of the North Greenland Eemian Ice Drilling (NEEM) ice core. The samples correspond to different climate intervals (Greenland Stadial 21, 22, Greenland Stadial 5, Greenland Interstadial 5, Greenland Interstadial 7, Greenland Stadial 8). We will present results for the Greenland Stadial 8, whose depths and ages are between 1723.7 and 1724.8 meters, and 35.520 to

  15. FVGWAS: Fast Voxelwise Genome Wide Association Analysis of Large-scale Imaging Genetic Data

    PubMed Central

    Huang, Meiyan; Nichols, Thomas; Huang, Chao; Yang, Yu; Lu, Zhaohua; Feng, Qianjing; Knickmeyer, Rebecca C; Zhu, Hongtu

    2015-01-01

    More and more large-scale imaging genetic studies are being conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide (N_C > 12 million known variants) associations with signals at millions of locations (N_V ~ 10^6) in the brain from thousands of subjects (n ~ 10^3). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components including a heteroscedastic linear model, a global sure independence screening (G-SIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is O(n N_V N_C) for the voxelwise genome wide association analysis (VGWAS) method compared with O((N_C + N_V) n^2) for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 seconds for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing. PMID:26025292
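
    To give a feel for the two complexity terms quoted in the abstract, the short calculation below plugs in the ADNI dimensions it reports (708 subjects, 193,275 voxels, 501,584 SNPs). It is only an indicative sketch: big-O notation hides constant factors, so the ratio is an order-of-magnitude guide rather than a measured speed-up, and the variable names are illustrative.

```python
# Dimensions reported for the ADNI analysis in the abstract.
n = 708          # subjects
n_v = 193_275    # voxels (N_V)
n_c = 501_584    # SNPs (N_C)

# Leading-order operation counts implied by the two complexity terms.
vgwas_ops = n * n_v * n_c             # O(n * N_V * N_C)
fvgwas_ops = (n_c + n_v) * n ** 2     # O((N_C + N_V) * n^2)

ratio = vgwas_ops / fvgwas_ops
print(f"VGWAS term:  {vgwas_ops:.3e}")
print(f"FVGWAS term: {fvgwas_ops:.3e}")
print(f"Ratio (constants ignored): about {ratio:.0f}x")
```

    With these dimensions the ratio simplifies to N_V N_C / ((N_C + N_V) n), so the advantage grows as the number of voxels and variants increases relative to the number of subjects.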

  16. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins

    PubMed Central

    Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

    2016-01-01

    A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks. PMID:27198619

  17. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins

    NASA Astrophysics Data System (ADS)

    Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

    2016-05-01

    A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.

  18. Digital microarray analysis for digital artifact genomics

    NASA Astrophysics Data System (ADS)

    Jaenisch, Holger; Handley, James; Williams, Deborah

    2013-06-01

    We implement a Spatial Voting (SV) based analogy of microarray analysis for digital gene marker identification in malware code sections. We examine a famous set of malware formally analyzed by Mandiant and code named Advanced Persistent Threat (APT1). APT1 is a Chinese organization formed with specific intent to infiltrate and exploit US resources. Mandiant provided a detailed behavior and string analysis report for the 288 malware samples available. We performed an independent analysis using a new alternative to the traditional dynamic analysis and static analysis that we call Spatial Analysis (SA). We perform unsupervised SA on the APT1 originating malware code sections and report our findings. We also show the results of SA performed on some members of the families associated by Mandiant. We conclude that SV based SA is a practical, fast alternative to dynamic analysis and static analysis.

  19. Resolution and noise trade-off analysis for volumetric CT

    SciTech Connect

    Li Baojun; Avinash, Gopal B.; Hsieh, Jiang

    2007-10-15

    Until recently, most studies addressing the trade-off between spatial resolution and quantum noise were performed in the context of single-slice CT. In this study, we extend the theoretical framework of previous works to volumetric CT and further extend it by taking into account the actual shapes of the preferred reconstruction kernels. In the experimental study, we also attempt to explore a three-dimensional approach for spatial resolution measurement, as opposed to the conventional two-dimensional approaches that were widely adopted in previously published studies. By scanning a finite-sized sphere phantom, the MTF was measured from the edge profile along the spherical surface. Cases of different resolutions (and noise levels) were generated by adjusting the reconstruction kernel. To reduce bias, the total photon fluxes were matched: 120 kVp, 200 mA, and 1 s per gantry rotation. All data sets were reconstructed using a modified FDK algorithm under the same condition: Scan field-of-view (SFOV)=10 cm, and slice thickness=0.625 mm. The theoretical analysis indicated that the variance of noise is proportional to >4th power of the spatial resolution. Our experimental results supported this conclusion by showing the relationship is 4.6th (helical) or 5th (axial) power.

  20. Resolution and noise trade-off analysis for volumetric CT.

    PubMed

    Li, Baojun; Avinash, Gopal B; Hsieh, Jiang

    2007-10-01

    Until recently, most studies addressing the trade-off between spatial resolution and quantum noise were performed in the context of single-slice CT. In this study, we extend the theoretical framework of previous works to volumetric CT and further extend it by taking into account the actual shapes of the preferred reconstruction kernels. In the experimental study, we also attempt to explore a three-dimensional approach for spatial resolution measurement, as opposed to the conventional two-dimensional approaches that were widely adopted in previously published studies. By scanning a finite-sized sphere phantom, the MTF was measured from the edge profile along the spherical surface. Cases of different resolutions (and noise levels) were generated by adjusting the reconstruction kernel. To reduce bias, the total photon fluxes were matched: 120 kVp, 200 mA, and 1 s per gantry rotation. All data sets were reconstructed using a modified FDK algorithm under the same condition: Scan field-of-view (SFOV) = 10 cm, and slice thickness = 0.625 mm. The theoretical analysis indicated that the variance of noise is proportional to > 4th power of the spatial resolution. Our experimental results supported this conclusion by showing the relationship is 4.6th (helical) or 5th (axial) power.
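
    The power-law relationship reported in the abstract can be turned into a quick worked example. The sketch below assumes that "spatial resolution" is expressed as a quantity that grows as detail gets finer (for example a spatial frequency at a fixed MTF level); that interpretation, the function name, and the 20% figure are illustrative assumptions, not values from the study.

```python
def noise_variance_factor(resolution_gain: float, exponent: float) -> float:
    """Multiplicative change in noise variance when resolution is scaled
    by `resolution_gain`, assuming variance ~ resolution ** exponent."""
    return resolution_gain ** exponent


# Exponents reported in the abstract: 4.6 (helical) and 5 (axial).
helical = noise_variance_factor(1.2, 4.6)   # 20% finer resolution
axial = noise_variance_factor(1.2, 5.0)

print(f"helical: variance x{helical:.2f}")
print(f"axial:   variance x{axial:.2f}")
```

    Under these assumptions a modest 20% gain in resolution more than doubles the noise variance in both scan modes, which is why the trade-off has to be managed through kernel selection.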

  1. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis

    PubMed Central

    Sun, Zhifu; Cunningham, Julie; Slager, Susan; Kocher, Jean-Pierre

    2015-01-01

    Bisulfite treatment-based methylation microarray (mainly Illumina 450K Infinium array) and next-generation sequencing (reduced representation bisulfite sequencing, Agilent SureSelect Human Methyl-Seq, NimbleGen SeqCap Epi CpGiant or whole-genome bisulfite sequencing) are commonly used for base resolution DNA methylome research. Although multiple tools and methods have been developed and used for the data preprocessing and analysis, confusion remains for these platforms, including how and whether the 450K array should be normalized; which platform should be used to better fit researchers’ needs; and which statistical models would be more appropriate for differential methylation analysis. This review presents the commonly used platforms and compares the pros and cons of each in methylome profiling. We then discuss approaches to study design, data normalization, bias correction and model selection for differentially methylated individual CpGs and regions. PMID:26366945

  2. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis.

    PubMed

    Sun, Zhifu; Cunningham, Julie; Slager, Susan; Kocher, Jean-Pierre

    2015-08-01

    Bisulfite treatment-based methylation microarray (mainly Illumina 450K Infinium array) and next-generation sequencing (reduced representation bisulfite sequencing, Agilent SureSelect Human Methyl-Seq, NimbleGen SeqCap Epi CpGiant or whole-genome bisulfite sequencing) are commonly used for base resolution DNA methylome research. Although multiple tools and methods have been developed and used for the data preprocessing and analysis, confusion remains for these platforms, including how and whether the 450K array should be normalized; which platform should be used to better fit researchers' needs; and which statistical models would be more appropriate for differential methylation analysis. This review presents the commonly used platforms and compares the pros and cons of each in methylome profiling. We then discuss approaches to study design, data normalization, bias correction and model selection for differentially methylated individual CpGs and regions.

  3. MGcV: the microbial genomic context viewer for comparative genome analysis

    PubMed Central

    2013-01-01

    Background: Conserved gene context is used in many types of comparative genome analyses. It is used to provide leads on gene function, to guide the discovery of regulatory sequences, but also to aid in the reconstruction of metabolic networks. We present the Microbial Genomic context Viewer (MGcV), an interactive, web-based application tailored to strengthen the practice of manual comparative genome context analysis for bacteria. Results: MGcV is a versatile, easy-to-use tool that renders a visualization of the genomic context of any set of selected genes, genes within a phylogenetic tree, genomic segments, or regulatory elements. It is tailored to facilitate laborious tasks such as the interactive annotation of gene function, the discovery of regulatory elements, or the sequence-based reconstruction of gene regulatory networks. We illustrate that MGcV can be used in gene function annotation by visually integrating information on prokaryotic genes, like their annotation as available from NCBI with other annotation data such as Pfam domains, sub-cellular location predictions and gene-sequence characteristics such as GC content. We also illustrate the usefulness of the interactive features that allow the graphical selection of genes to facilitate data gathering (e.g. upstream regions, IDs or annotation), in the analysis and reconstruction of transcription regulation. Moreover, putative regulatory elements and their corresponding scores or data from RNA-seq and microarray experiments can be uploaded, visualized and interpreted in (ranked-) comparative context maps. The ranked maps allow the interpretation of predicted regulatory elements and experimental data in light of each other. Conclusion: MGcV advances the manual comparative analysis of genes and regulatory elements by providing fast and flexible integration of gene-related data combined with straightforward data retrieval. MGcV is available at http://mgcv.cmbi.ru.nl. PMID:23547764

  4. Stacks: an analysis tool set for population genomics

    PubMed Central

    Catchen, Julian; Hohenlohe, Paul A.; Bassham, Susan; Amores, Angel; Cresko, William A.

    2014-01-01

    Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics. PMID:23701397
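
    The idea of analysing a per-SNP statistic "across a reference genome using a smoothed sliding window" can be illustrated with a small kernel-smoothing sketch. This is not the Stacks implementation: the Gaussian kernel, the 3-sigma truncation, the function name, and the toy positions and values are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def kernel_smooth(snps: List[Tuple[int, float]],
                  centre: int, sigma: float) -> float:
    """Gaussian-weighted mean of per-SNP values around `centre`.

    `snps` holds (position in bp, statistic) pairs; SNPs further than
    3 * sigma from the centre are ignored (the window edge).
    """
    num = den = 0.0
    for pos, value in snps:
        if abs(pos - centre) > 3 * sigma:
            continue
        weight = math.exp(-((pos - centre) ** 2) / (2 * sigma ** 2))
        num += weight * value
        den += weight
    return num / den if den > 0 else float("nan")


# Toy per-SNP statistic along one chromosome (hypothetical values).
snps = [(1_000, 0.10), (1_500, 0.30), (2_000, 0.20), (50_000, 0.90)]
smoothed = [kernel_smooth(snps, pos, sigma=500.0) for pos, _ in snps]
print([round(s, 3) for s in smoothed])
```

    Nearby SNPs pull each smoothed value towards the local average, while the isolated SNP at 50,000 bp keeps its own value because no neighbour falls inside its window.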

  5. Probabilistic latent semantic analysis applied to whole bacterial genomes identifies common genomic features.

    PubMed

    Rusakovica, Julija; Hallinan, Jennifer; Wipat, Anil; Zuliani, Paolo

    2014-01-01

    The spread of drug resistance amongst clinically-important bacteria is a serious, and growing, problem [1]. However, the analysis of entire genomes requires considerable computational effort, usually including the assembly of the genome and subsequent identification of genes known to be important in pathology. An alternative approach is to use computational algorithms to identify genomic differences between pathogenic and non-pathogenic bacteria, even without knowing the biological meaning of those differences. To overcome this problem, a range of techniques for dimensionality reduction have been developed. One such approach is known as latent-variable models [2]. In latent-variable models dimensionality reduction is achieved by representing high-dimensional data by a few hidden or latent variables, which are not directly observed but inferred from the observed variables present in the model. Probabilistic Latent Semantic Analysis (PLSA) is an extension of LSA [3]. PLSA is based on a mixture decomposition derived from a latent class model. The main objective of the algorithm, as in LSA, is to represent high-dimensional co-occurrence information in a lower-dimensional way in order to discover the hidden semantic structure of the data using a probabilistic framework. In this work we applied the PLSA approach to analyse the common genomic features in methicillin resistant Staphylococcus aureus, using tokens derived from amino acid sequences rather than DNA. We characterised genome-scale amino acid sequences in terms of their components, and then investigated the relationships between genomes and tokens and the phenotypes they generated. As a control we used the non-pathogenic model Gram-positive bacterium Bacillus subtilis. PMID:24980693

  6. Probabilistic latent semantic analysis applied to whole bacterial genomes identifies common genomic features.

    PubMed

    Rusakovica, Julija; Hallinan, Jennifer; Wipat, Anil; Zuliani, Paolo

    2014-06-30

    The spread of drug resistance amongst clinically-important bacteria is a serious, and growing, problem [1]. However, the analysis of entire genomes requires considerable computational effort, usually including the assembly of the genome and subsequent identification of genes known to be important in pathology. An alternative approach is to use computational algorithms to identify genomic differences between pathogenic and non-pathogenic bacteria, even without knowing the biological meaning of those differences. To overcome this problem, a range of techniques for dimensionality reduction have been developed. One such approach is known as latent-variable models [2]. In latent-variable models dimensionality reduction is achieved by representing high-dimensional data by a few hidden or latent variables, which are not directly observed but inferred from the observed variables present in the model. Probabilistic Latent Semantic Analysis (PLSA) is an extension of LSA [3]. PLSA is based on a mixture decomposition derived from a latent class model. The main objective of the algorithm, as in LSA, is to represent high-dimensional co-occurrence information in a lower-dimensional way in order to discover the hidden semantic structure of the data using a probabilistic framework. In this work we applied the PLSA approach to analyse the common genomic features in methicillin resistant Staphylococcus aureus, using tokens derived from amino acid sequences rather than DNA. We characterised genome-scale amino acid sequences in terms of their components, and then investigated the relationships between genomes and tokens and the phenotypes they generated. As a control we used the non-pathogenic model Gram-positive bacterium Bacillus subtilis.

  7. Evolution Analysis of Simple Sequence Repeats in Plant Genome.

    PubMed

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genome sequences. First, in the twelve studied plant genomes, the main SSRs were those which contain repeats of 1-3 nucleotide combinations. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number the percentage of A/T in C. reinhardtii had no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to the AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T were in a rising trend along with the evolution of the plant kingdom; meanwhile with the increase of SSR repeat number in plant species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed their specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only had the general pattern in the evolution of the plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution.

  8. Evolution Analysis of Simple Sequence Repeats in Plant Genome

    PubMed Central

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genome sequences. First, in the twelve studied plant genomes, the main SSRs were those which contain repeats of 1–3 nucleotides combination. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSRs repeat number the percentage of A/T in C. reinhardtii had no significant change, while the percentage of A/T in terrestrial plants species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of plant kingdom and the repeat number increased in terrestrial plants species. This trend was more obvious in dicotyledon than monocotyledon. The percentage of CG/GC showed the opposite pattern to the AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T were in a rising trend along with the evolution of plant kingdom; meanwhile with the increase of SSRs repeat number in plants species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed their specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only had the general pattern in the evolution of plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of the plant genome evolution. PMID:26630570
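
    The abstract above tallies mono-, di- and trinucleotide SSRs but does not describe how they were detected. The following is a minimal sketch of one way such perfect repeats could be located in a sequence. The regular-expression approach, the helper name find_ssrs and the min_repeats=4 cutoff are assumptions made for illustration, not the authors' method.

```python
import re

def find_ssrs(seq, min_repeats=4):
    """Return (start, unit, copies) for perfect SSRs with units of 1-3 bases.

    Hypothetical helper: the cutoff and the regex approach are assumptions,
    not the method used in the study.
    """
    seq = seq.upper()
    hits = []
    for k in (1, 2, 3):
        # For k > 1, reject units made of a single repeated base (e.g. "AA"),
        # since those are mononucleotide repeats already found with k = 1.
        guard = r"(?!(?P<b>[ACGT])(?P=b){%d})" % (k - 1) if k > 1 else ""
        # One unit of k bases followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(
            guard + r"(?P<unit>[ACGT]{%d})(?P=unit){%d,}" % (k, min_repeats - 1)
        )
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.group("unit"), len(m.group(0)) // k))
    return hits

# Toy sequence with one repeat of each unit length (0-based start positions).
example = "GG" + "A" * 6 + "CC" + "AT" * 4 + "GT" + "CAG" * 4 + "TT"
for start, unit, copies in find_ssrs(example):
    print(start, unit, copies)
```

    Grouping the returned units by length and base composition would give A/T or AT/TA percentages of the kind the record reports, although the study's own counting rules may differ.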

  9. Revealing misassembled segments in the bovine reference genome by high resolution linkage disequilibrium scan

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Misassembly signatures, created by shuffling the order of sequences while assembling a genome, can be easily seen by analyzing the unexpected behaviour of the linkage disequilibrium (LD) decay. A heuristic process was proposed to identify those misassembly signatures and presented the ones found in ...

  10. NGSmethDB: an updated genome resource for high quality, single-cytosine resolution methylomes

    PubMed Central

    Geisen, Stefanie; Barturen, Guillermo; Alganza, Ángel M.; Hackenberg, Michael; Oliver, José L.

    2014-01-01

    The updated release of ‘NGSmethDB’ (http://bioinfo2.ugr.es/NGSmethDB) is a repository for single-base whole-genome methylome maps for the best-assembled eukaryotic genomes. Short-read data sets from NGS bisulfite-sequencing projects of cell lines, fresh and pathological tissues are first pre-processed and aligned to the corresponding reference genome, and then the cytosine methylation levels are profiled. One major improvement is the application of a unique bioinformatics protocol to all data sets, thereby assuring the comparability of all values with each other. We implemented stringent quality controls to minimize important error sources, such as sequencing errors, bisulfite failures, clonal reads or single nucleotide variants (SNVs). This leads to reliable and high-quality methylomes, all obtained under uniform settings. Another significant improvement is the detection in parallel of SNVs, which might be crucial for many downstream analyses (e.g. SNVs and differential-methylation relationships). A next-generation methylation browser allows fast and smooth scrolling and zooming, thus speeding data download/upload, at the same time requiring fewer server resources. Several data mining tools allow the comparison/retrieval of methylation levels in different tissues or genome regions. NGSmethDB methylomes are also available as native tracks through a UCSC hub, which allows comparison with a wide range of third-party annotations, in particular phenotype or disease annotations. PMID:24271385

  11. High-resolution genetic mapping of maize pan-genome sequence anchors

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In addition to single-nucleotide polymorphisms, structural variation is abundant in many plant genomes. The structural variation across a species can be represented by a ‘pan-genome’, which is essential to fully understand the genetic control of phenotypes. However, the pan-genome’s complexity hinde...

  12. Computational analysis of high resolution unsteady airloads for rotor aeroacoustics

    NASA Technical Reports Server (NTRS)

    Quackenbush, Todd R.; Lam, C.-M. Gordon; Wachspress, Daniel A.; Bliss, Donald B.

    1994-01-01

    The study of helicopter aerodynamic loading for acoustics applications requires the application of efficient yet accurate simulations of the velocity field induced by the rotor's vortex wake. This report summarizes work to date on the development of such an analysis, which builds on the Constant Vorticity Contour (CVC) free wake model, previously implemented for the study of vibratory loading in the RotorCRAFT computer code. The present effort has focused on implementation of an airload reconstruction approach that computes high resolution airload solutions of rotor/rotor-wake interactions required for acoustics computations. Supplementary efforts on the development of improved vortex core modeling, unsteady aerodynamic effects, higher spatial resolution of rotor loading, and fast vortex wake implementations have substantially enhanced the capabilities of the resulting software, denoted RotorCRAFT/AA (AeroAcoustics). Results of validation calculations using recently acquired model rotor data show that by employing airload reconstruction it is possible to apply the CVC wake analysis with temporal and spatial resolution suitable for acoustics applications while reducing the computation time required by one to two orders of magnitude relative to that required by direct calculations. Promising correlation with this body of airload and noise data has been obtained for a variety of rotor configurations and operating conditions.

  13. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    PubMed

    Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  14. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    PubMed Central

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  15. Universal multifractal analysis of high-resolution snowfall data

    NASA Astrophysics Data System (ADS)

    Raupach, Timothy; Gires, Auguste; Tchiguirinskaia, Ioulia; Schertzer, Daniel; Berne, Alexis

    2016-04-01

    Universal multifractal analysis offers useful insights into the scaling properties of precipitation data. While much work has been done on the scaling properties of rainfall fields, less is known about the scaling properties of solid precipitation such as snowfall, especially at high resolution. We present results of a universal multifractal (UM) analysis of high-resolution solid precipitation data. The data were recorded using a 2D-video-disdrometer (2DVD) situated in the Swiss Alps. Analysis was performed on a one-hour period of snowfall, during which time the mean wind speed was zero, temperatures were low, and no hail was detected. The 2DVD recorded information on individual particles, from which we calculated snow mass. Three "cuts" of the spatio-temporal snowfall process were analysed using the UM framework. First, high-resolution timeseries of precipitation intensity at 100 ms temporal resolution were analysed. These results show two scaling regimes with a transition area between them. Second, we analysed reconstructed vertical columns of particle concentration and snow mass, assuming no horizontal wind and constant vertical velocity (equal to the one recorded on the ground). Strong scaling was observed in the particle concentration fields, with the influence of large (and therefore rare) snowflakes degrading the quality of the scaling observed for higher moments of the particle distribution. There was a clear difference between the measured fields and fields in which the vertical distribution of particles was made homogeneous, indicating that the measured snowfall fields contained non-homogeneous fields. Scaling behaviour was observed down to vertical scales of about 0.5 m, which is similar to published results using rain data. Finally, we used the UM framework to investigate the scaling properties of 2D maps of snow accumulation over a subset of the instrument collection area of 5.12 x 5.12 cm^2. As expected from the vertical column analysis, given that

  16. High-Resolution Genome Screen for Bone Mineral Density in Heterogeneous Stock Rat

    PubMed Central

    Alam, Imranul; Koller, Daniel L.; Cañete, Toni; Blázquez, Gloria; López-Aumatell, Regina; Martínez-Membrives, Esther; Díaz-Morán, Sira; Tobeña, Adolf; Fernández-Teruel, Alberto; Stridh, Pernilla; Diez, Margarita; Olsson, Tomas; Johannesson, Martina; Baud, Amelie; Econs, Michael J.; Foroud, Tatiana

    2014-01-01

    We previously demonstrated that skeletal mass, structure and biomechanical properties vary considerably in heterogeneous stock (HS) rat strains. In addition, we observed strong heritability for several of these skeletal phenotypes in the HS rat model, suggesting that it represents a unique genetic resource for dissecting the complex genetics underlying bone fragility. The purpose of this study was to identify and localize genes associated with bone mineral density in HS rats. We measured bone phenotypes from 1524 adult male and female HS rats between 17 and 20 weeks of age. Phenotypes included DXA measurements for bone mineral content and areal bone mineral density for femur and lumbar spine (L3-5), and volumetric BMD measurements by CT for the midshaft and distal femur, femur neck and 5th lumbar vertebra. A total of 70,000 polymorphic SNPs distributed throughout the genome were selected from genotypes obtained from the Affymetrix rat custom SNPs array for the HS rat population. These SNPs spanned the HS rat genome with a mean linkage disequilibrium coefficient between neighboring SNPs of 0.95. Haplotypes were estimated across the entire genome for each rat using a multipoint haplotype reconstruction method, which calculates the probability of descent for each genotyped locus from each of the 8 founder HS strains. The haplotypes were tested for association with each bone density phenotype via a mixed model with covariate adjustment. We identified quantitative trait loci (QTLs) for bone mineral density phenotypes on chromosomes 2, 9, 10 and 13 meeting a conservative genome-wide empiric significance threshold (FDR = 5%; P < 3 × 10⁻⁶). Importantly, most QTLs were localized to very small genomic regions (1-3 Mb), allowing us to identify a narrow set of potential candidate genes including both novel genes and genes previously shown to have roles in skeletal development and homeostasis. PMID:24643965

  17. From amplification to gene in thyroid cancer: A high-resolution mapped bacterial-artificial-chromosome resource for cancer chromosome aberrations guides gene discovery after comparative genome hybridization

    SciTech Connect

    Chen, X.N.; Gonsky, R.; Korenberg, J.R.; Knauf, J.A.; Fagin, J.A.; Chissoe, S.

    1998-08-01

    Chromosome rearrangements associated with neoplasms provide a rich resource for definition of the pathways of tumorigenesis. The power of comparative genome hybridization (CGH) to identify novel genes depends on the existence of suitable markers, which are lacking throughout most of the genome. The authors now report a general approach that translates CGH data into higher-resolution genomic-clone data that are then used to define the genes located in aneuploid regions. They used CGH to study 33 thyroid-tumor DNAs and two tumor-cell-line DNAs. The results revealed amplifications of chromosome band 2p21, with less-intense amplification on 2p13, 19q13.1, and 1p36 and with least-intense amplification on 1p34, 1q42, 5q31, 5q33-34, 9q32-34, and 14q32. To define the 2p21 region amplified, a dense array of 373 FISH-mapped chromosome 2 bacterial artificial chromosomes (BACs) was constructed, and 87 of these were hybridized to a tumor-cell line. Four BACs carried genomic DNA that was amplified in these cells. The maximum amplified region was narrowed to 3-6 Mb by multicolor FISH with the flanking BACs, and the minimum amplicon size was defined by a contig of 420 kb. Sequence analysis of the amplified BAC 1D9 revealed a fragment of the gene, encoding protein kinase C epsilon (PKCε), that was then shown to be amplified and rearranged in tumor cells. In summary, CGH combined with a dense mapped resource of BACs and large-scale sequencing has led directly to the definition of PKCε as a previously unmapped candidate gene involved in thyroid tumorigenesis.

  18. Genome-wide high-resolution mapping of UV-induced mitotic recombination events in Saccharomyces cerevisiae.

    PubMed

    Yin, Yi; Petes, Thomas D

    2013-10-01

    In the yeast Saccharomyces cerevisiae and most other eukaryotes, mitotic recombination is important for the repair of double-stranded DNA breaks (DSBs). Mitotic recombination between homologous chromosomes can result in loss of heterozygosity (LOH). In this study, LOH events induced by ultraviolet (UV) light are mapped throughout the genome to a resolution of about 1 kb using single-nucleotide polymorphism (SNP) microarrays. UV doses that have little effect on the viability of diploid cells stimulate crossovers more than 1000-fold in wild-type cells. In addition, UV stimulates recombination in G1-synchronized cells about 10-fold more efficiently than in G2-synchronized cells. Importantly, at high doses of UV, most conversion events reflect the repair of two sister chromatids that are broken at approximately the same position whereas at low doses, most conversion events reflect the repair of a single broken chromatid. Genome-wide mapping of about 380 unselected crossovers, break-induced replication (BIR) events, and gene conversions shows that UV-induced recombination events occur throughout the genome without pronounced hotspots, although the ribosomal RNA gene cluster has a significantly lower frequency of crossovers.

  19. High-resolution profiling of γH2AX around DNA double strand breaks in the mammalian genome

    PubMed Central

    Iacovoni, Jason S; Caron, Pierre; Lassadi, Imen; Nicolas, Estelle; Massip, Laurent; Trouche, Didier; Legube, Gaëlle

    2010-01-01

    Chromatin acts as a key regulator of DNA-related processes such as DNA damage repair. Although ChIP-chip is a powerful technique to provide high-resolution maps of protein–genome interactions, its use to study DNA double strand break (DSB) repair has been hindered by the limitations of the available damage induction methods. We have developed a human cell line that permits induction of multiple DSBs randomly distributed and unambiguously positioned within the genome. Using this system, we have generated the first genome-wide mapping of γH2AX around DSBs. We found that all DSBs trigger large γH2AX domains, which spread out from the DSB in a bidirectional, discontinuous and not necessarily symmetrical manner. The distribution of γH2AX within domains is influenced by gene transcription, as parallel mappings of RNA Polymerase II and strand-specific expression showed that γH2AX does not propagate on active genes. In addition, we showed that transcription is accurately maintained within γH2AX domains, indicating that mechanisms may exist to protect gene transcription from γH2AX spreading and from the chromatin rearrangements induced by DSBs. PMID:20360682

  20. Thyroid insufficiency in developing rat brain: A genomic analysis.

    EPA Science Inventory

    Thyroid Insufficiency in the Developing Rat Brain: A Genomic Analysis. JE Royland and ME Gilbert, Neurotox. Div., U.S. EPA, RTP, NC, USA. Endocrine disruption (ED) is an area of major concern in environmental neurotoxicity. Severe deficits in thyroid hormone (TH) levels have bee...

  1. GENOMIC ANALYSIS OF THE TESTICULAR TOXICITY OF HALOACETIC ACIDS

    EPA Science Inventory

    Genomic analysis of the testicular toxicity of haloacetic acids

    David J. Dix and John C. Rockett
    Reproductive Toxicology Division, National Health and Environmental Effects Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, R...

  2. Genomic Analysis at the Single-Cell Level

    PubMed Central

    Kalisky, Tomer; Blainey, Paul; Quake, Stephen R.

    2013-01-01

    Studying complex biological systems such as a developing embryo, a tumor, or a microbial ecosystem often involves understanding the behavior and heterogeneity of the individual cells that constitute the system and their interactions. In this review, we discuss a variety of approaches to single-cell genomic analysis. PMID:21942365

  3. Integrated translational genomics for analysis of complex traits in sorghum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  4. Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data

    PubMed Central

    Duforet-Frebourg, Nicolas; Luu, Keurcien; Laval, Guillaume; Bazin, Eric; Blum, Michael G.B.

    2016-01-01

    To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1), comprising 850 individuals from Africa, Asia, and Europe. The number of genetic variants is of the order of 36 million, obtained with a low-coverage sequencing depth (3×). The correlations between genetic variation and each principal component provide well-known targets for positive selection (EDAR, SLC24A5, SLC45A2, DARC), and also new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult. PMID:26715629

  5. Sequencing and annotated analysis of an Estonian human genome.

    PubMed

    Lilleoja, Rutt; Sarapik, Aili; Reimann, Ene; Reemann, Paula; Jaakma, Ülle; Vasar, Eero; Kõks, Sulev

    2012-02-01

    In the present study we describe the sequencing and annotated analysis of the individual genome of an Estonian. Using SOLiD technology we generated 2,449,441,916 50-bp reads. Bioscope version 1.3 was used for mapping and pairing of reads to the NCBI human genome reference (build 36, hg18). Bioscope also enables the annotation of the results of variant (tertiary) analysis. The average mapping of reads was 75.5%, with total coverage of 107.72 Gb, resulting in mean fold coverage of 34.6. We found 3,482,975 SNPs, of which 352,492 were novel. A total of 21,222 SNPs were in coding regions: 10,649 were synonymous SNPs, 10,360 were nonsynonymous missense SNPs, 155 were nonsynonymous nonsense SNPs and 58 were nonsynonymous frameshifts. We identified 219 CNVs with total base pair coverage of 37,326,300 bp and 87,451 large insertion/deletion polymorphisms covering 10,152,256 bp of the genome. In addition, we found 285,864 small insertion/deletion polymorphisms, of which 133,969 were novel. Finally, we identified 53 inversions, 19 overlapped genes and 2 overlapped exons. Interestingly, we found a region in chromosome 6 to be enriched with coding SNPs and CNVs. This study confirms previous findings that our genomes are more complex and variable than previously thought. Therefore, sequencing of personal genomes followed by annotation would improve the analysis of heritability of phenotypes and our understanding of the functions of the genome.
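
    The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers stated above; the function names are hypothetical, and the relation "mean fold coverage = total covered bases / reference length" is a standard definition assumed here, since the abstract does not say how the value was computed.

```python
def raw_yield_gb(read_count, read_length_bp):
    """Total sequenced bases in Gb (1 Gb = 1e9 bases)."""
    return read_count * read_length_bp / 1e9

def implied_reference_gb(total_coverage_gb, mean_fold_coverage):
    """Reference length implied by fold coverage = covered bases / reference length."""
    return total_coverage_gb / mean_fold_coverage

# Values taken from the abstract.
reads = 2_449_441_916
read_length = 50
total_coverage_gb = 107.72
mean_fold = 34.6
coding_snps = {
    "synonymous": 10_649,
    "missense": 10_360,
    "nonsense": 155,
    "frameshift": 58,
}

print("raw yield (Gb):", raw_yield_gb(reads, read_length))
print("implied reference length (Gb):",
      round(implied_reference_gb(total_coverage_gb, mean_fold), 3))
print("coding SNP classes sum to:", sum(coding_snps.values()))
```

    The four coding SNP classes add up to the 21,222 coding SNPs reported, and the implied reference length of roughly 3.1 Gb is in line with the size of a human genome build.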

  6. The Cancer Genome Atlas Pan-Cancer analysis project.

    PubMed

    Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M

    2013-10-01

    The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.

  7. BEDTools: the Swiss-army tool for genome feature analysis

    PubMed Central

    Quinlan, Aaron R.

    2014-01-01

    Technological advances have enabled the use of DNA sequencing as a flexible tool to characterize genetic variation and to measure the activity of diverse cellular phenomena such as gene isoform expression and transcription factor binding. Extracting biological insight from the experiments enabled by these advances demands the analysis of large, multi-dimensional datasets. This unit describes the use of the BEDTools toolkit for the exploration of high-throughput genomics datasets. I present several protocols for common genomic analyses and demonstrate how simple BEDTools operations may be combined to create bespoke pipelines addressing complex questions. PMID:25199790
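
    To make concrete the kind of interval question this toolkit addresses, here is a minimal pure-Python sketch of intersecting two sets of genomic features given as BED-style 0-based, half-open (chrom, start, end) tuples. It is a conceptual illustration only: the function names are hypothetical, the all-pairs loop is quadratic, and nothing here reflects how BEDTools itself is implemented.

```python
def overlap(a, b):
    """Return the shared interval of two (chrom, start, end) features, or None.

    Coordinates follow the BED convention: 0-based start, exclusive end.
    """
    if a[0] != b[0]:
        return None
    start = max(a[1], b[1])
    end = min(a[2], b[2])
    # Adjacent intervals (end == start) share no bases under half-open coordinates.
    return (a[0], start, end) if start < end else None

def intersect(features_a, features_b):
    """Collect every overlapping portion between features in A and features in B."""
    shared = []
    for a in features_a:
        for b in features_b:
            hit = overlap(a, b)
            if hit is not None:
                shared.append(hit)
    return shared

genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
peaks = [("chr1", 150, 550), ("chr2", 50, 80)]
for feature in intersect(genes, peaks):
    print(*feature, sep="\t")
```

    For real datasets a sorted sweep or an interval tree would replace the nested loop; the sketch only shows the coordinate logic.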

  8. The Cancer Genome Atlas Pan-Cancer analysis project.

    PubMed

    Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M

    2013-10-01

    The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile. PMID:24071849

  9. Genomic-Wide Analysis with Microarrays in Human Oncology

    PubMed Central

    Inaoka, Kenichi; Inokawa, Yoshikuni; Nomoto, Shuji

    2015-01-01

    DNA microarray technologies have advanced rapidly and had a profound impact on examining gene expression on a genomic scale in research. This review discusses the history and development of microarray and DNA chip devices, and specific microarrays are described along with their methods and applications. In particular, microarrays have detected many novel cancer-related genes by comparing cancer tissues and non-cancerous tissues in oncological research. Recently, new methods have been in development, such as the double-combination array and triple-combination array, which allow more effective analysis of gene expression and epigenetic changes. Analysis of gene expression alterations in precancerous regions compared with normal regions and array analysis in drug-resistance cancer tissues are also successfully performed. Compared with next-generation sequencing, a similar method of genome analysis, several important differences distinguish these techniques and their applications. Development of novel microarray technologies is expected to contribute to further cancer research.

  10. Sequencing and Analysis of Neanderthal Genomic DNA

    SciTech Connect

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  11. Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate bayesian computation.

    PubMed

    Arenas, Miguel

    2015-04-01

    NGS technologies present a fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not so straightforward due to complex evolutionary processes acting on this material such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Recently, models of genome evolution that accommodate such complex genomic events are emerging. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.

  12. Comparative Analysis of Genome Diversity in Bullmastiff Dogs.

    PubMed

    Mortlock, Sally-Anne; Khatkar, Mehar S; Williamson, Peter

    2016-01-01

    Management and preservation of genomic diversity in dog breeds is a major objective for maintaining health. The present study was undertaken to characterise genomic diversity in Bullmastiff dogs using both genealogical and molecular analysis. Genealogical analysis of diversity was conducted using a database consisting of 16,378 Bullmastiff pedigrees from year 1980 to 2013. Additionally, a total of 188 Bullmastiff dogs were genotyped using the 170,000 SNP Illumina CanineHD Beadchip. Genealogical parameters revealed a mean inbreeding coefficient of 0.047; 142 total founders (f); an effective number of founders (fe) of 79; an effective number of ancestors (fa) of 62; and an effective population size of the reference population of 41. Genetic diversity and the degree of genome-wide homogeneity within the breed were also investigated using molecular data. Multiple-locus heterozygosity (MLH) was equal to 0.206; runs of homozygosity (ROH) as proportion of the genome, averaged 16.44%; effective population size was 29.1, with an average inbreeding coefficient of 0.035, all estimated using SNP Data. Fine-scale population structure was analysed using NETVIEW, a population analysis pipeline. Visualisation of the high definition network captured relationships among individuals within and between subpopulations. Effects of unequal founder use, and ancestral inbreeding and selection, were evident. While current levels of Bullmastiff heterozygosity, inbreeding and homozygosity are not unusual, a relatively small effective population size indicates that a breeding strategy to reduce the inbreeding rate may be beneficial. PMID:26824579

  14. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    SciTech Connect

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David

    2011-03-14

    Fungi of the phylum Basidiomycota make up some 37 percent of the described fungi and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a 'core proteome' based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  15. Genome analysis of the platypus reveals unique signatures of evolution

    PubMed Central

    Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

    2009-01-01

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734

  16. Whole-genome CNV analysis: advances in computational approaches

    PubMed Central

    Pirooznia, Mehdi; Goes, Fernando S.; Zandi, Peter P.

    2015-01-01

    Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development. PMID:25918519

  17. Genome-wide proximal promoter analysis and interpretation.

    PubMed

    Guruceaga, Elizabeth; Segura, Victor; Corrales, Fernando J; Rubio, Angel

    2010-01-01

    High-throughput gene expression technologies based on DNA microarrays allow the examination of biological systems. However, the interpretation of the complex molecular descriptions generated by these approaches is still challenging. The development of new methodologies to identify common regulatory mechanisms involved in the control of the expression of a set of co-expressed genes might enhance our capacity to extract functional information from genomic data sets. In this chapter, we describe a method that integrates different sources of information: gene expression data, genome sequence information, described transcription factor binding sites (TFBSs), functional information, and bibliographic data. The starting point of the analysis is the extraction of promoter sequences from a whole genome and the detection of TFBSs in each gene promoter. This information allows the identification of enriched TFBSs in the proximal promoter of differentially expressed genes. The functional and bibliographic interpretation of the results improves our biological insight into the regulatory mechanisms involved in a microarray experiment. PMID:19957149
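
    One common way to decide whether a transcription factor binding site is enriched in the promoters of differentially expressed genes is a one-sided hypergeometric test. The sketch below assumes that test purely for illustration; the abstract does not say which statistic the chapter uses, and the function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genes, n_with_site, n_selected, n_selected_with_site):
    """P(X >= n_selected_with_site) when n_selected genes are drawn without
    replacement from n_genes, of which n_with_site carry the binding site."""
    upper = min(n_with_site, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_with_site, k) * comb(n_genes - n_with_site, n_selected - k)
        for k in range(n_selected_with_site, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 promoters, 1,500 with the site,
# 200 differentially expressed genes, 30 of which carry the site.
p_value = hypergeom_enrichment_pvalue(20000, 1500, 200, 30)
print(f"enrichment p-value: {p_value:.3g}")
```

    When many binding sites are tested at once, the resulting p-values would normally need a multiple-testing correction.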

  18. Development and Characterization of Simple Sequence Repeat Markers Providing Genome-Wide Coverage and High Resolution in Maize

    PubMed Central

    Xu, Jie; Liu, Ling; Xu, Yunbi; Chen, Churun; Rong, Tingzhao; Ali, Farhan; Zhou, Shufeng; Wu, Fengkai; Liu, Yaxi; Wang, Jing; Cao, Moju; Lu, Yanli

    2013-01-01

    Simple sequence repeats (SSRs) have been widely used in maize genetics and breeding, because they are co-dominant, easy to score, and highly abundant. In this study, we used whole-genome sequences from 16 maize inbreds and 1 wild relative to determine SSR abundance and to develop a set of high-density polymorphic SSR markers. A total of 264 658 SSRs were identified across the 17 genomes, with an average of 135 693 SSRs per genome. Marker density was one SSR every 15.48 kb. (C/G)n, (AT)n, (CAG/CTG)n, and (AAAT/ATTT)n were the most frequent motifs for mono-, di-, tri-, and tetra-nucleotide SSRs, respectively. SSRs were most abundant in intergenic regions and least frequent in untranslated regions, as revealed by comparing SSR distributions of three representative resequenced genomes. Comparing SSR sequences and e-polymerase chain reaction analysis among the 17 tested genomes created a new database, including 111 887 SSRs, that could be developed as polymorphic markers in silico. Among these markers, 58.00, 26.09, 7.20, 3.00, 3.93, and 1.78% had mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs, respectively. Polymorphic information content for 35 573 polymorphic SSRs out of 111 887 loci varied from 0.05 to 0.83, with an average of 0.31 in the 17 tested genomes. Experimental validation of polymorphic SSR markers showed that over 70% of the primer pairs could generate the target bands with length polymorphism, and these markers would be very powerful when they are used for genetic populations derived from various types of maize germplasms that were sampled for this study. PMID:23804557
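
    Polymorphic information content (PIC) is commonly computed from allele frequencies with the formula of Botstein and colleagues, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The sketch below assumes that definition; the abstract does not spell out the formula, and the function name and example frequencies are illustrative.

```python
def polymorphic_information_content(allele_freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 for one locus."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    squares = [p * p for p in allele_freqs]
    homozygosity = sum(squares)
    cross_term = sum(
        2 * squares[i] * squares[j]
        for i in range(len(squares))
        for j in range(i + 1, len(squares))
    )
    return 1.0 - homozygosity - cross_term


# A locus with two equally frequent alleles has PIC = 0.375.
print(polymorphic_information_content([0.5, 0.5]))
# A monomorphic locus carries no information.
print(polymorphic_information_content([1.0]))
```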

  19. Increasing microscopy resolution with photobleaching and intensity cumulant analysis.

    PubMed

    Brutkowski, Wojtek; Dziob, Daniel; Bernas, Tytus

    2015-11-01

    Super-resolution fluorescence microscopy and its applications for analysis of biological structures are a rapidly evolving field. A number of approaches aimed at overcoming the fundamental limit imposed by diffraction have been proposed in recent years. Here we present a modification of super-resolution optical fluctuation imaging (SOFI), a technique based on spatio-temporal evaluation of the optical signal from independently fluctuating emitters. Instead of rapid, reversible photoswitching, photobleaching is used to produce irreversible transitions between emitting and nonemitting states of the fluorochrome molecules. Simulated images are used to demonstrate that, in the absence of noise, the proposed SOFI modification increases the efficiency of transfer of high spatial frequencies in a fluorescence microscope. Correspondingly, a decrease of the point spread function (PSF) width is obtained. Moreover, the modified SOFI algorithm is capable of resolving point emitters in the presence of simulated noise. Using real biological images we demonstrate that an increase of resolution is obtained in 2D optical sections through densely packed chromatin in cell nuclei and lamin layer at the nuclear envelope. Finally, the approach is extended to 3D wide-field microscopy, allowing reduction of out-of-focus image blurring.
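
    The PSF narrowing can be seen in a small sketch of standard second-order SOFI, which is not the photobleaching modification described in this record. For a single fluctuating emitter, the zero-lag second-order cumulant of each pixel's time trace is its temporal variance, which scales with the square of the PSF, so a Gaussian PSF narrows by a factor of sqrt(2). The 1D geometry, noise-free signal, and all names are assumptions made for illustration.

```python
import math


def second_order_cumulant(frames):
    """Per-pixel temporal variance (second-order cumulant at zero time lag)."""
    n_frames = len(frames)
    n_pixels = len(frames[0])
    means = [sum(frame[i] for frame in frames) / n_frames for i in range(n_pixels)]
    return [
        sum((frame[i] - means[i]) ** 2 for frame in frames) / n_frames
        for i in range(n_pixels)
    ]


# One emitter at pixel 10, imaged through a Gaussian PSF (sigma = 2 pixels).
sigma = 2.0
psf = [math.exp(-((x - 10) ** 2) / (2 * sigma**2)) for x in range(21)]
blink = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical on/off states per frame
frames = [[state * value for value in psf] for state in blink]

sofi = second_order_cumulant(frames)
# Normalised profile: equals psf squared, i.e. a Gaussian with sigma / sqrt(2).
profile = [value / sofi[10] for value in sofi]
print([round(v, 3) for v in profile[8:13]])
```

    With real data the zero-lag cumulant also picks up shot noise, which is one reason practical SOFI implementations often use non-zero time lags or cross-cumulants between neighbouring pixels.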

  20. Genome-Assisted Analysis of Dissimilatory Metal-Reducing Bacteria

    SciTech Connect

    Fredrickson, Jim K.; Romine, Margaret F.

    2005-06-01

    Whole genome sequence for Shewanella oneidensis and Geobacter sulfurreducens has provided numerous new biological insights into the function of these model dissimilatory metal-reducing bacteria. Many of the discoveries, including the identification of a high number of c-type cytochromes in both organisms, have been the result of comparative genomic analyses including several that were experimentally confirmed. Genome sequence has also aided the identification of genes important for the reduction of metal ions and other electron acceptors utilized by these organisms during anaerobic growth by facilitating the identification of genes disrupted by random insertions. Technologies for assaying global expression patterns for genes (mRNA) and proteins have also been enabled by the availability of genome sequence but their application has been limited mainly to the analysis of the role of global regulatory genes and to identifying genes expressed or repressed in response to specific electron acceptors. It is anticipated that details regarding the mechanisms of metal ion respiration, and metabolism in general, will eventually be revealed by comprehensive, systems-level analyses enabled by functional genomic analyses.

  1. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.
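
    For readers unfamiliar with the assembly statistics quoted here, the sketch below computes N50 under its usual definition (the length of the shortest sequence in the smallest set of longest sequences that together span at least half of the total assembly) and reproduces the 91% coverage figure from the two lengths given in the abstract. The example scaffold lengths are made up; only the two genome-size numbers come from the record.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up scaffold lengths for illustration.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))

# Coverage of the estimated genome, using the figures quoted in the abstract.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000
coverage = assembled_bp / estimated_genome_bp
print(f"coverage: {coverage:.1%}")
```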

  2. The Chlamydia psittaci Genome: A Comparative Analysis of Intracellular Pathogens

    PubMed Central

    Saluz, Hans Peter

    2012-01-01

    Background: Chlamydiaceae are a family of obligate intracellular pathogens causing a wide range of diseases in animals and humans, and facing unique evolutionary constraints not encountered by free-living prokaryotes. To investigate genomic aspects of infection, virulence and host preference we have sequenced Chlamydia psittaci, the pathogenic agent of ornithosis. Results: A comparison of the genome of the avian Chlamydia psittaci isolate 6BC with the genomes of other chlamydial species, C. trachomatis, C. muridarum, C. pneumoniae, C. abortus, C. felis and C. caviae, revealed a high level of sequence conservation and synteny across taxa, with the major exception of the human pathogen C. trachomatis. Important differences manifest in the polymorphic membrane protein family specific for the Chlamydiae and in the highly variable chlamydial plasticity zone. We identified a number of psittaci-specific polymorphic membrane proteins of the G family that may be related to differences in host-range and/or virulence as compared to closely related Chlamydiaceae. We calculated non-synonymous to synonymous substitution rate ratios for pairs of orthologous genes to identify putative targets of adaptive evolution and predicted type III secreted effector proteins. Conclusions: This study is the first detailed analysis of the Chlamydia psittaci genome sequence. It provides insights into the genome architecture of C. psittaci and proposes a number of novel candidate genes mostly of yet unknown function that may be important for pathogen-host interactions. PMID:22506068
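
    The non-synonymous to synonymous rate ratio mentioned in the Results can be sketched as the final step of a counting method: proportions of differing sites are corrected for multiple hits with the Jukes-Cantor formula and then divided. The abstract does not say which estimator the authors used, so this is an assumed, simplified approach; the site and difference counts are taken as given and all names and numbers are hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor correction d = -(3/4) * ln(1 - 4p/3) for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds_ratio(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """dN/dS for one pair of orthologous genes from pre-counted sites and differences."""
    dn = jukes_cantor_distance(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor_distance(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


# Hypothetical gene pair: few non-synonymous changes relative to synonymous ones.
omega = dn_ds_ratio(nonsyn_diffs=5, nonsyn_sites=300, syn_diffs=20, syn_sites=100)
print(f"dN/dS = {omega:.3f}")
```

    Ratios well below 1 are usually read as purifying selection, and ratios above 1 as candidate signals of adaptive evolution.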

  3. Integrative prescreening in analysis of multiple cancer genomic studies

    PubMed Central

    2012-01-01

    Background: In high throughput cancer genomic studies, results from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis can effectively pool and analyze multiple datasets and provides a cost effective way to improve reproducibility. In integrative analysis, simultaneously analyzing all genes profiled may incur high computational cost. A computationally affordable remedy is prescreening, which fits marginal models, can be conducted in a parallel manner, and has low computational cost. Results: An integrative prescreening approach is developed for the analysis of multiple cancer genomic datasets. Simulation shows that the proposed integrative prescreening has better performance than alternatives, particularly including prescreening with individual datasets, an intensity approach and meta-analysis. We also analyze multiple microarray gene profiling studies on liver and pancreatic cancers using the proposed approach. Conclusions: The proposed integrative prescreening provides an effective way to reduce the dimensionality in cancer genomic studies. It can be coupled with existing analysis methods to identify cancer markers. PMID:22799431

  4. Integrated Analysis of Whole Genome and Transcriptome Sequencing Reveals Diverse Transcriptomic Aberrations Driven by Somatic Genomic Changes in Liver Cancers

    PubMed Central

    Shiraishi, Yuichi; Fujimoto, Akihiro; Furuta, Mayuko; Tanaka, Hiroko; Chiba, Ken-ichi; Boroevich, Keith A.; Abe, Tetsuo; Kawakami, Yoshiiku; Ueno, Masaki; Gotoh, Kunihito; Ariizumi, Shun-ichi; Shibuya, Tetsuo; Nakano, Kaoru; Sasaki, Aya; Maejima, Kazuhiro; Kitada, Rina; Hayami, Shinya; Shigekawa, Yoshinobu; Marubashi, Shigeru; Yamada, Terumasa; Kubo, Michiaki; Ishikawa, Osamu; Aikata, Hiroshi; Arihiro, Koji; Ohdan, Hideki; Yamamoto, Masakazu; Yamaue, Hiroki; Chayama, Kazuaki; Tsunoda, Tatsuhiko; Miyano, Satoru; Nakagawa, Hidewaki

    2014-01-01

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, the transcriptional consequences of these genomic alterations in cancer genomes remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on the affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate that genomic alterations in cancer genomes have diverse transcriptomic effects, and that integrated analysis of WGS and RNA-Seq can facilitate the interpretation of the large number of genomic alterations detected in cancer genomes. PMID:25526364

  5. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly

    PubMed Central

    Scheinin, Ilari; Sie, Daoud; Bengtsson, Henrik; van de Wiel, Mark A.; Olshen, Adam B.; van Thuijl, Hinke F.; van Essen, Hendrik F.; Eijk, Paul P.; Rustenburg, François; Meijer, Gerrit A.; Reijneveld, Jaap C.; Wesseling, Pieter; Pinkel, Daniel; Albertson, Donna G.

    2014-01-01

    Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ∼0.1× genome coverage. We improve on previous methods by first implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches and better copy number data than high-resolution microarrays at a substantially lower cost. PMID:25236618
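
    Two ideas from this abstract can be sketched briefly: correcting binned read counts for GC content, and the noise floor set by read counting. The sketch is a deliberately crude stand-in, not the QDNAseq procedure. It divides each bin's count by the median count of bins with similar GC content and ignores mappability, whereas the abstract describes a combined correction for both. The Poisson noise model, function names, and example numbers are assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import median


def gc_correct(counts, gc_fractions, n_strata=10):
    """Divide each bin's read count by the median count of its GC stratum."""
    keys = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    strata = defaultdict(list)
    for key, count in zip(keys, counts):
        strata[key].append(count)
    medians = {key: median(values) for key, values in strata.items()}
    return [
        count / medians[key] if medians[key] > 0 else float("nan")
        for key, count in zip(keys, counts)
    ]


def poisson_relative_noise(expected_reads_per_bin):
    """Relative standard deviation of a Poisson read count: 1 / sqrt(N)."""
    return 1.0 / math.sqrt(expected_reads_per_bin)


# Hypothetical bins: GC-rich bins attract about twice as many reads.
counts = [100, 110, 90, 200, 220, 180]
gc = [0.35, 0.36, 0.37, 0.55, 0.56, 0.57]
corrected = gc_correct(counts, gc)
print([round(value, 2) for value in corrected])
print(poisson_relative_noise(100))
```

    After correction, both groups of bins centre on 1.0, so the remaining spread can be compared against the Poisson expectation for the number of reads per bin.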

  6. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly.

    PubMed

    Scheinin, Ilari; Sie, Daoud; Bengtsson, Henrik; van de Wiel, Mark A; Olshen, Adam B; van Thuijl, Hinke F; van Essen, Hendrik F; Eijk, Paul P; Rustenburg, François; Meijer, Gerrit A; Reijneveld, Jaap C; Wesseling, Pieter; Pinkel, Daniel; Albertson, Donna G; Ylstra, Bauke

    2014-12-01

    Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ∼0.1× genome coverage. We improve on previous methods by first implementing a combined correction for sequence mappability and GC content, and second, by applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1000 samples, most of which were obtained from the fixed tissue archives of more than 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches and better copy number data than high-resolution microarrays at a substantially lower cost.

  7. Emerging pathogens of gilthead seabream: characterisation and genomic analysis of novel intracellular β-proteobacteria

    PubMed Central

    Seth-Smith, Helena M.B.; Dourala, Nancy; Fehr, Alexander; Qi, Weihong; Katharios, Pantelis; Ruetten, Maja; Mateos, José M.; Nufer, Lisbeth; Weilenmann, Roseline; Ziegler, Urs; Thomson, Nicholas R; Schlapbach, Ralph; Vaughan, Lloyd

    2015-01-01

    New and emerging environmental pathogens pose some of the greatest threats to modern aquaculture, a critical source of food protein globally. As with other intensive farming practices, increasing our understanding of the biology of infections is important to improve animal welfare and husbandry. The gill infection epitheliocystis is increasingly problematic in gilthead seabream (Sparus aurata), a major Mediterranean aquaculture species. Epitheliocystis is generally associated with chlamydial bacteria, yet we were not able to localise chlamydial targets within the major gilthead seabream lesions. Two previously unidentified species within a novel β-proteobacterial genus were instead identified. These co-infecting intracellular bacteria have been characterised using high resolution imaging and genomics, presenting the most comprehensive study on epitheliocystis agents to date. The genomes of the two uncultured species, Ca. Ichthyocystis hellenicum and Ca. Ichthyocystis sparus, have been de novo sequenced and annotated from preserved material. Analysis of the genomes shows a compact core indicating a metabolic dependency on the host, and an accessory genome with an unprecedented number of tandemly arrayed gene families. This study represents a critical insight into novel, emerging fish pathogens and will be used to underpin future investigations into the bacterial origins, and to develop diagnostic and treatment strategies. PMID:26849311

  9. Viral genome analysis and knowledge management.

    PubMed

    Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette

    2013-01-01

    One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and the databases that followed in its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences that are stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that results in ever better new alignments. Annotation of epidemiological and clinical information is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based upon user requests. Vital for its success, the database staff are heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov, http://hcv.lanl.gov, and http://hfv.lanl.gov. PMID:23192551

  10. Genome sequencing and analysis conference grant

    SciTech Connect

    Venter, J.C.

    1995-10-01

    The 14 plenary session presentations focused on nematode; yeast; fruit fly; plants; mycobacteria; and man. In addition there were presentations on a variety of technical innovations including database developments and refinements, bioelectronic genesensors, computer-assisted multiplex techniques, and hybridization analysis with DNA chip technology. This document includes a list of exhibitors and abstracts of sessions.

  11. Improved protocol for rapid identification of certain spa types using high resolution melting curve analysis.

    PubMed

    Mayerhofer, Benjamin; Stöger, Anna; Pietzka, Ariane T; Fernandez, Haizpea Lasa; Prewein, Bernhard; Sorschag, Sieglinde; Kunert, Renate; Allerberger, Franz; Ruppitsch, Werner

    2015-01-01

    Methicillin-resistant Staphylococcus aureus is one of the most significant pathogens associated with health care. For efficient surveillance, control and outbreak investigation, S. aureus typing is essential. A high resolution melting curve analysis was developed and evaluated for rapid identification of the most frequent spa types found in an Austrian hospital consortium covering 2,435 beds. Among 557 methicillin-resistant Staphylococcus aureus isolates 38 different spa types were identified by sequence analysis of the hypervariable region X of the protein A gene (spa). Identification of spa types through their characteristic high resolution melting curve profiles was considerably improved by double spiking with genomic DNA from spa type t030 and spa type t003 and allowed unambiguous and fast identification of the ten most frequent spa types t001 (58%), t003 (12%), t190 (9%), t041 (5%), t022 (2%), t032 (2%), t008 (2%), t002 (1%), t5712 (1%) and t2203 (1%), representing 93% of all isolates within this hospital consortium. The performance of the assay was evaluated by testing samples with unknown spa types from the daily routine and by testing three different high resolution melting curve analysis real-time PCR instruments. The ten most frequent spa types were identified from all samples and on all instruments with 100% specificity and 100% sensitivity. Compared to classical spa typing by sequence analysis, this gene scanning assay is faster, cheaper and can be performed in a single closed tube assay format. Therefore it is an optimal screening tool to detect the most frequent endemic spa types and to exclude non-endemic spa types within a hospital. PMID:25768007

  12. Improved protocol for rapid identification of certain spa types using high resolution melting curve analysis.

    PubMed

    Mayerhofer, Benjamin; Stöger, Anna; Pietzka, Ariane T; Fernandez, Haizpea Lasa; Prewein, Bernhard; Sorschag, Sieglinde; Kunert, Renate; Allerberger, Franz; Ruppitsch, Werner

    2015-01-01

    Methicillin-resistant Staphylococcus aureus is one of the most significant pathogens associated with health care. For efficient surveillance, control and outbreak investigation, S. aureus typing is essential. A high resolution melting curve analysis was developed and evaluated for rapid identification of the most frequent spa types found in an Austrian hospital consortium covering 2,435 beds. Among 557 methicillin-resistant Staphylococcus aureus isolates 38 different spa types were identified by sequence analysis of the hypervariable region X of the protein A gene (spa). Identification of spa types through their characteristic high resolution melting curve profiles was considerably improved by double spiking with genomic DNA from spa type t030 and spa type t003 and allowed unambiguous and fast identification of the ten most frequent spa types t001 (58%), t003 (12%), t190 (9%), t041 (5%), t022 (2%), t032 (2%), t008 (2%), t002 (1%), t5712 (1%) and t2203 (1%), representing 93% of all isolates within this hospital consortium. The performance of the assay was evaluated by testing samples with unknown spa types from the daily routine and by testing three different high resolution melting curve analysis real-time PCR instruments. The ten most frequent spa types were identified from all samples and on all instruments with 100% specificity and 100% sensitivity. Compared to classical spa typing by sequence analysis, this gene scanning assay is faster, cheaper and can be performed in a single closed tube assay format. Therefore it is an optimal screening tool to detect the most frequent endemic spa types and to exclude non-endemic spa types within a hospital.

  13. Meta-analysis of genome-wide association from genomic prediction models.

    PubMed

    Bernal Rubio, Y L; Gualdrón Duarte, J L; Bates, R O; Ernst, C W; Nonneman, D; Rohrer, G A; King, A; Shackelford, S D; Wheeler, T L; Cantet, R J C; Steibel, J P

    2016-02-01

    Genome-wide association (GWA) studies based on GBLUP models are a common practice in animal breeding. However, effect sizes of GWA tests are small, requiring larger sample sizes to enhance power of detection of rare variants. Because of difficulties in increasing sample size in animal populations, one alternative is to implement a meta-analysis (MA), combining information and results from independent GWA studies. Although this methodology has been used widely in human genetics, implementation in animal breeding has been limited. Thus, we present methods to implement a MA of GWA, describing the proper approach to compute weights derived from multiple genomic evaluations based on animal-centric GBLUP models. Application to real datasets shows that MA increases power of detection of associations in comparison with population-level GWA, allowing for population structure and heterogeneity of variance components across populations to be accounted for. Another advantage of MA is that it does not require access to genotype data that is required for a joint analysis. Scripts related to the implementation of this approach, which consider the strength of association as well as the sign, are distributed and thus account for heterogeneity in association phase between QTL and SNPs. Thus, MA of GWA is an attractive alternative to summarizing results from multiple genomic studies, avoiding restrictions with genotype data sharing, definition of fixed effects and different scales of measurement of evaluated traits.

  14. Elemental Analysis of Glass Optical Fibres with High Spatial Resolution.

    NASA Astrophysics Data System (ADS)

    Pugh, Andrew

    Available from UMI in association with The British Library. The properties of glass optical fibres are very strongly dependent on the elemental concentration profiles of the fibre cores. Core dopants such as germanium define the core refractive index, which in turn defines the manner in which the light is transmitted through the fibre. Erbium in fibre cores can facilitate the operation of fibre lasers and aluminium in turn can control the erbium distribution. The aim of the project described in this thesis was to measure the elemental concentration profiles in a variety of fibre cores using X-ray microanalysis in an electron microscope. Conventional X-ray microanalysis of bulk samples has an analytical resolution of the order of a micron. With monomode optical fibres having cores typically three microns in diameter, the resolution of the conventional technique is plainly inadequate. An experimental technique has been developed for the preparation of thin cross-sectional samples of glass optical fibres. Application of this technique has facilitated the preparation and analysis of thin film specimens with an average thickness of 400 microns. This approach has allowed analysis to be performed with an effective spatial resolution of 100-300 nm. The technique has been applied to the determination of germanium concentration in Raman fibres, to the investigation of erbium confinement in erbium doped fibres and to the investigation of inter-ionic diffusion in semiconductor doped fibres. It has been shown that the germanium, and hence refractive index, profile of germanium doped fibres is not changed by the process of fibre drawing. Evidence has been gathered supporting the theory of erbium confinement by aluminium and an important degree of elemental diffusion has been shown to take place during the drawing of semiconductor doped fibres.

  15. Massively expedited genome-wide heritability analysis (MEGHA).

    PubMed

    Ge, Tian; Nichols, Thomas E; Lee, Phil H; Holmes, Avram J; Roffman, Joshua L; Buckner, Randy L; Sabuncu, Mert R; Smoller, Jordan W

    2015-02-24

    The discovery and prioritization of heritable phenotypes is a computational challenge in a variety of settings, including neuroimaging genetics and analyses of the vast phenotypic repositories in electronic health record systems and population-based biobanks. Classical estimates of heritability require twin or pedigree data, which can be costly and difficult to acquire. Genome-wide complex trait analysis is an alternative tool to compute heritability estimates from unrelated individuals, using genome-wide data that are increasingly ubiquitous, but is computationally demanding and becomes difficult to apply in evaluating very large numbers of phenotypes. Here we present a fast and accurate statistical method for high-dimensional heritability analysis using genome-wide SNP data from unrelated individuals, termed massively expedited genome-wide heritability analysis (MEGHA) and accompanying nonparametric sampling techniques that enable flexible inferences for arbitrary statistics of interest. MEGHA produces estimates and significance measures of heritability with several orders of magnitude less computational time than existing methods, making heritability-based prioritization of millions of phenotypes based on data from unrelated individuals tractable for the first time to our knowledge. As a demonstration of application, we conducted heritability analyses on global and local morphometric measurements derived from brain structural MRI scans, using genome-wide SNP data from 1,320 unrelated young healthy adults of non-Hispanic European ancestry. We also computed surface maps of heritability for cortical thickness measures and empirically localized cortical regions where thickness measures were significantly heritable. Our analyses demonstrate the unique capability of MEGHA for large-scale heritability-based screening and high-dimensional heritability profile construction.

  16. Objective high Resolution Analysis over Complex Terrain with VERA

    NASA Astrophysics Data System (ADS)

    Mayer, D.; Steinacker, R.; Steiner, A.

    2012-04-01

    VERA (Vienna Enhanced Resolution Analysis) is a model independent, high resolution objective analysis of meteorological fields over complex terrain. This system consists of a specially developed quality control procedure and a combination of an interpolation and a downscaling technique. Whereas the so-called VERA-QC is presented at this conference in the contribution titled "VERA-QC, an approved Data Quality Control based on Self-Consistency" by Andrea Steiner, this presentation will focus on the method and the characteristics of the VERA interpolation scheme, which enables one to compute grid point values of a meteorological field based on irregularly distributed observations and topography-related a priori knowledge. Over a complex topography meteorological fields are not smooth in general. The roughness which is induced by the topography can be explained physically. The knowledge about this behavior is used to define the so-called Fingerprints (e.g. a thermal Fingerprint reproducing heating or cooling over mountainous terrain or a dynamical Fingerprint reproducing positive pressure perturbation on the windward side of a ridge) under idealized conditions. If the VERA algorithm recognizes patterns of one or more Fingerprints at a few observation points, the corresponding patterns are used to downscale the meteorological information in a greater surrounding. This technique makes it possible to achieve an analysis with a resolution much higher than that of the observational network. The interpolation of irregularly distributed stations to a regular grid (in space and time) is based on a variational principle applied to first and second order spatial and temporal derivatives. Mathematically, this can be formulated as a cost function that is equivalent to the penalty function of a thin plate smoothing spline. After the analysis field has been divided into the Fingerprint components and the unexplained part respectively, the requirement of a smooth distribution is applied to the

  17. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins was found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of

  18. Genome-wide association interaction analysis for Alzheimer's disease.

    PubMed

    Gusareva, Elena S; Carrasquillo, Minerva M; Bellenguez, Céline; Cuyvers, Elise; Colon, Samuel; Graff-Radford, Neill R; Petersen, Ronald C; Dickson, Dennis W; Mahachie John, Jestinah M; Bessonov, Kyrylo; Van Broeckhoven, Christine; Harold, Denise; Williams, Julie; Amouyel, Philippe; Sleegers, Kristel; Ertekin-Taner, Nilüfer; Lambert, Jean-Charles; Van Steen, Kristel; Ramirez, Alfredo

    2014-11-01

    We propose a minimal protocol for exhaustive genome-wide association interaction analysis that involves screening for epistasis over large-scale genomic data combining strengths of different methods and statistical tools. The different steps of this protocol are illustrated on a real-life data application for Alzheimer's disease (AD) (2259 patients and 6017 controls from France). Particularly, in the exhaustive genome-wide epistasis screening we identified AD-associated interacting SNPs-pair from chromosome 6q11.1 (rs6455128, the KHDRBS2 gene) and 13q12.11 (rs7989332, the CRYL1 gene) (p = 0.006, corrected for multiple testing). A replication analysis in the independent AD cohort from Germany (555 patients and 824 controls) confirmed the discovered epistasis signal (p = 0.036). This signal was also supported by a meta-analysis approach in 5 independent AD cohorts that was applied in the context of epistasis for the first time. Transcriptome analysis revealed negative correlation between expression levels of KHDRBS2 and CRYL1 in both the temporal cortex (β = -0.19, p = 0.0006) and cerebellum (β = -0.23, p < 0.0001) brain regions. This is the first time a replicable epistasis associated with AD was identified using a hypothesis free screening approach.

  19. High-resolution genomic profiling of thyroid lesions uncovers preferential copy number gains affecting mitochondrial biogenesis loci in the oncocytic variants

    PubMed Central

    Kurelac, Ivana; de Biase, Dario; Calabrese, Claudia; Ceccarelli, Claudio; Ng, Charlotte KY; Lim, Raymond; MacKay, Alan; Weigelt, Britta; Porcelli, Anna Maria; Reis-Filho, Jorge S; Tallini, Giovanni; Gasparre, Giuseppe

    2015-01-01

    Oncocytic change is the result of aberrant mitochondrial hyperplasia, which may occur in both neoplastic and non-neoplastic cells and is not infrequent in the thyroid. Despite being a well-characterized histologic phenotype, the molecular causes underlying such a distinctive cellular change are poorly understood. To identify potential genetic causes for the oncocytic phenotype in thyroid, we analyzed copy number alterations in a set of oncocytic (n=21) and non-oncocytic (n=20) thyroid lesions by high-resolution microarray-based comparative genomic hybridization (aCGH). Each group comprised lesions of diverse histologic types, including hyperplastic nodules, adenomas and carcinomas. Unsupervised hierarchical clustering of categorical aCGH data resulted in two distinct branches, one of which was significantly enriched for samples with the oncocytic phenotype, regardless of histologic type. Analysis of aCGH events showed that the oncocytic group harbored a significantly higher number of genes involved in copy number gains, when compared to that of conventional thyroid lesions. Functional annotation demonstrated an enrichment for copy number gains that affect genes encoding activators of mitochondrial biogenesis in oncocytic cases but not in their non-oncocytic counterparts. Taken together, our data suggest that genomic alterations may represent additional/alternative mechanisms underlying the development of the oncocytic phenotype in the thyroid. PMID:26269756

  20. Rapid High Resolution Genotyping of Francisella tularensis by Whole Genome Sequence Comparison of Annotated Genes (“MLST+”)

    PubMed Central

    Mellmann, Alexander; Höppner, Sebastian; Splettstoesser, Wolf D.; Harmsen, Dag

    2015-01-01

    The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered as a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA-sequence like canSNP-typing or MLVA has become the accepted standard for this organism. Due to the organism’s highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks. PMID:25856198

  1. Comparative Genome Analysis of Basidiomycete Fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor

    2012-03-19

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that the profile of an organism in lignocellulose-targeting genes can be used to predict its nutritional mode, and predict Dacryopinax sp. as a brown rot; Botryobasidium botryosum and Jaapia argillacea as white rots.

  2. Genome wide characterization of simple sequence repeats in watermelon genome and their application in comparative mapping and genetic diversity analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Simple sequence repeats (SSR) or microsatellite markers are one of the most informative and versatile DNA-based markers. The use of next-generation sequencing technologies allows whole genome sequencing and makes it possible to develop large numbers of SSRs through bioinformatic analysis of genome da...

  3. Progress toward accurate high spatial resolution actinide analysis by EPMA

    NASA Astrophysics Data System (ADS)

    Jercinovic, M. J.; Allaz, J. M.; Williams, M. L.

    2010-12-01

    High precision, high spatial resolution EPMA of actinides is a significant issue for geochronology, resource geochemistry, and studies involving the nuclear fuel cycle. Particular interest focuses on understanding of the behavior of Th and U in the growth and breakdown reactions relevant to actinide-bearing phases (monazite, zircon, thorite, allanite, etc.), and geochemical fractionation processes involving Th and U in fluid interactions. Unfortunately, the measurement of minor and trace concentrations of U in the presence of major concentrations of Th and/or REEs is particularly problematic, especially in complexly zoned phases with large compositional variation on the micro or nanoscale - spatial resolutions now accessible with modern instruments. Sub-micron, high precision compositional analysis of minor components is feasible in very high Z phases where scattering is limited at lower kV (15kV or less) and where the beam diameter can be kept below 400nm at high current (e.g. 200-500nA). High collection efficiency spectrometers and high performance electron optics in EPMA now allow the use of lower overvoltage through an exceptional range in beam current, facilitating higher spatial resolution quantitative analysis. The U LIII edge at 17.2 kV precludes L-series analysis at low kV (high spatial resolution), requiring careful measurements of the actinide M series. Also, U-La detection (wavelength = 0.9A) requires the use of LiF (220) or (420), not generally available on most instruments. Strong peak overlaps of Th on U make highly accurate interference correction mandatory, with problems compounded by the ThMIV and ThMV absorption edges affecting peak, background, and interference calibration measurements (especially the interference of the Th M line family on UMb). Complex REE bearing phases such as monazite, zircon, and allanite have particularly complex interference issues due to multiple peak and background overlaps from elements present in the activation

  4. Analysis of the impact of spatial resolution on land/water classifications using high-resolution aerial imagery

    USGS Publications Warehouse

    Enwright, Nicholas M.; Jones, William R.; Garber, Adrienne L.; Keller, Matthew J.

    2014-01-01

    Long-term monitoring efforts often use remote sensing to track trends in habitat or landscape conditions over time. To most appropriately compare observations over time, long-term monitoring efforts strive for consistency in methods. Thus, advances and changes in technology over time can present a challenge. For instance, modern camera technology has led to an increasing availability of very high-resolution imagery (i.e. submetre and metre) and a shift from analogue to digital photography. While numerous studies have shown that image resolution can impact the accuracy of classifications, most of these studies have focused on the impacts of comparing spatial resolution changes greater than 2 m. Thus, a knowledge gap exists on the impacts of minor changes in spatial resolution (i.e. submetre to about 1.5 m) in very high-resolution aerial imagery (i.e. 2 m resolution or less). This study compared the impact of spatial resolution on land/water classifications of an area dominated by coastal marsh vegetation in Louisiana, USA, using 1:12,000 scale colour-infrared analogue aerial photography (AAP) scanned at four different dot-per-inch resolutions simulating ground sample distances (GSDs) of 0.33, 0.54, 1, and 2 m. Analysis of the impact of spatial resolution on land/water classifications was conducted by exploring various spatial aspects of the classifications including density of waterbodies and frequency distributions in waterbody sizes. This study found that a small-magnitude change (1–1.5 m) in spatial resolution had little to no impact on the amount of water classified (i.e. percentage mapped was less than 1.5%), but had a significant impact on the mapping of very small waterbodies (i.e. waterbodies ≤ 250 m²). These findings should interest those using temporal image classifications derived from very high-resolution aerial photography as a component of long-term monitoring programs.

  5. Transcription-coupled and global genome repair in the Saccharomyces cerevisiae RPB2 gene at nucleotide resolution.

    PubMed Central

    Tijsterman, M; Tasseron-de Jong, J G; van de Putte, P; Brouwer, J

    1996-01-01

    Repair of UV-induced cyclobutane pyrimidine dimers (CPDs) was examined at single nucleotide resolution in the yeast Saccharomyces cerevisiae, using an improved protocol for genomic end-labelling. To obtain the sensitivity required for adduct detection in yeast, an oligonucleotide-directed enrichment step was introduced into the current methodology developed for adduct detection in Escherichia coli. With this method, heterogeneous repair of CPDs within the RPB2 locus is observed. Individual CPDs positioned in the transcribed strand are removed very efficiently with identical kinetics. This fast repair starts within 23 bases downstream of the transcription initiation site. The non-transcribed strand of the active gene exhibits slow repair without detectable repair variations between individual lesions. In contrast, CPDs positioned in the promoter region show profound repair heterogeneity. Here, CPDs at specific sites are removed very quickly, with comparable rates to CPDs positioned in the transcribed strand, while at other positions lesions are not repaired at all during the period studied. Interestingly, the fast repair in the promoter region is dependent on the RAD7 and RAD16 genes, as are the slowly repaired CPDs in this region and in the non-transcribed strand. This indicates that the global genome repair pathway is not intrinsically slow and at specific positions can be as efficient as the transcription-coupled repair pathway. PMID:8836174

  6. Integrated genome-wide analysis of genomic changes and gene regulation in human adrenocortical tissue samples

    PubMed Central

    Gara, Sudheer Kumar; Wang, Yonghong; Patel, Dhaval; Liu-Chittenden, Yi; Jain, Meenu; Boufraqech, Myriem; Zhang, Lisa; Meltzer, Paul S.; Kebebew, Electron

    2015-01-01

    To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal-to-adenoma-to-carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analysis in human adrenocortical tissue (normal, adrenocortical adenomas and ACC) samples. A pairwise comparison of normal, adrenocortical adenomas and ACC gene expression profiles with more than four-fold expression differences and an adjusted P-value < 0.05 revealed no major differences in normal versus adrenocortical adenoma whereas there are 808 and 1085, respectively, dysregulated genes between ACC versus adrenocortical adenoma and ACC versus normal. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the gene expression of dysregulated genes, we found that there are higher alterations in ACC versus normal compared to ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways that are associated with dysregulated genes and further experimentally validated that oncostatin m signaling induces caspase 3 dependent apoptosis and suppresses cell proliferation. Finally, we propose that there is higher number of genomic changes from normal-to-adenoma-to-carcinoma and identified oncostatin m signaling as a plausible druggable pathway for therapeutics. PMID:26446994

  7. Integrated genome-wide analysis of genomic changes and gene regulation in human adrenocortical tissue samples.

    PubMed

    Gara, Sudheer Kumar; Wang, Yonghong; Patel, Dhaval; Liu-Chittenden, Yi; Jain, Meenu; Boufraqech, Myriem; Zhang, Lisa; Meltzer, Paul S; Kebebew, Electron

    2015-10-30

    To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal-to-adenoma-to-carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analysis in human adrenocortical tissue (normal, adrenocortical adenomas and ACC) samples. A pairwise comparison of normal, adrenocortical adenomas and ACC gene expression profiles with more than four-fold expression differences and an adjusted P-value < 0.05 revealed no major differences in normal versus adrenocortical adenoma whereas there are 808 and 1085, respectively, dysregulated genes between ACC versus adrenocortical adenoma and ACC versus normal. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the gene expression of dysregulated genes, we found that there are higher alterations in ACC versus normal compared to ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways that are associated with dysregulated genes and further experimentally validated that oncostatin m signaling induces caspase 3 dependent apoptosis and suppresses cell proliferation. Finally, we propose that there is higher number of genomic changes from normal-to-adenoma-to-carcinoma and identified oncostatin m signaling as a plausible druggable pathway for therapeutics.

  8. GENSTYLE: exploration and analysis of DNA sequences with genomic signature.

    PubMed

    Fertil, Bernard; Massin, Matthieu; Lespinats, Sylvain; Devic, Caroline; Dumee, Philippe; Giron, Alain

    2005-07-01

    GENSTYLE (http://Genstyle.imed.jussieu.fr) is a workspace designed for the characterization and classification of nucleotide sequences. Based on the genomic signature paradigm, GENSTYLE focuses on oligonucleotide frequencies in DNA sequences. Users can select sequences of interest in the GENSTYLE companion database, where the whole set of GenBank sequences is grouped per species, or upload their own sequences to work with. Tools for the exploration and analysis of signatures allow (i) identification of the origin of DNA segments (detection of rare species or species for which technical problems prevent fast characterization, such as micro-organisms with slow growth), (ii) analysis of the homogeneity of a genome and isolation of areas with novel functionality (horizontal transfers for example) and (iii) molecular phylogeny and taxonomy.

  9. Genome and Proteome Analysis of Industrial Fungi

    SciTech Connect

    Baker, Scott E.; Wend, Christopher F.; Martinez, Antonio D.; Magnuson, Jon K.; Panisko, Ellen A.; Dai, Ziyu; Bruno, Kenneth S.; Anderson, Kevin K.; Monroe, Matthew E.; Daly, Don S.; Lasure, Linda L.

    2007-09-06

    In order to decrease dependence on petroleum, the United States Department of Energy (USDOE) Office of the Biomass Program (OBP) is investing in research and development to enable its vision of the biorefinery. The biorefinery will decrease the use of petroleum through conversion of biomass such as crops or agricultural waste into fuels and products. How do fungi fit into the biorefinery? Analysis of the “Top Ten” study indicates that nine of the top twelve chemical building blocks are currently produced or may potentially be produced by fungal fermentation processes. However, a significant barrier to the use of bio-based products is the economic feasibility – fuels and products must be price-competitive with those derived from petroleum. An obvious way to decrease the costs of biobased products from fungi is to make fermentation strains more productive and processes more efficient. Traditional strain improvement programs typically span a time scale measured in decades and process development done through the use of batch cultures is extremely labor intensive.

  10. A Chromosome 7 Pericentric Inversion Defined at Single-Nucleotide Resolution Using Diagnostic Whole Genome Sequencing in a Patient with Hand-Foot-Genital Syndrome.

    PubMed

    Watson, Christopher M; Crinnion, Laura A; Harrison, Sally M; Lascelles, Carolina; Antanaviciute, Agne; Carr, Ian M; Bonthron, David T; Sheridan, Eamonn

    2016-01-01

    Next generation sequencing methodologies are facilitating the rapid characterisation of novel structural variants at nucleotide resolution. These approaches are particularly applicable to variants initially identified using alternative molecular methods. We report a child born with bilateral postaxial syndactyly of the feet and bilateral fifth finger clinodactyly. This was presumed to be an autosomal recessive syndrome, due to the family history of consanguinity. Karyotype analysis revealed a homozygous pericentric inversion of chromosome 7 (46,XX,inv(7)(p15q21)x2) which was confirmed to be heterozygous in both unaffected parents. Since the resolution of the karyotype was insufficient to identify any putatively causative gene, we undertook medium-coverage whole genome sequencing using paired-end reads, in order to elucidate the molecular breakpoints. In a two-step analysis, we first narrowed down the region by identifying discordant read-pairs, and then determined the precise molecular breakpoint by analysing the mapping locations of "soft-clipped" breakpoint-spanning reads. PCR and Sanger sequencing confirmed the identified breakpoints, both of which were located in intergenic regions. Significantly, the 7p15 breakpoint was located 523 kb upstream of HOXA13, the locus for hand-foot-genital syndrome. By inference from studies of HOXA locus control in the mouse, we suggest that the inversion has delocalised a HOXA13 enhancer to produce the phenotype observed in our patient. This study demonstrates how modern genetic diagnostic approach can characterise structural variants at nucleotide resolution and provide potential insights into functional regulation. PMID:27272187

  11. A Chromosome 7 Pericentric Inversion Defined at Single-Nucleotide Resolution Using Diagnostic Whole Genome Sequencing in a Patient with Hand-Foot-Genital Syndrome

    PubMed Central

    Crinnion, Laura A.; Harrison, Sally M.; Lascelles, Carolina; Antanaviciute, Agne; Carr, Ian M.; Bonthron, David T.; Sheridan, Eamonn

    2016-01-01

    Next generation sequencing methodologies are facilitating the rapid characterisation of novel structural variants at nucleotide resolution. These approaches are particularly applicable to variants initially identified using alternative molecular methods. We report a child born with bilateral postaxial syndactyly of the feet and bilateral fifth finger clinodactyly. This was presumed to be an autosomal recessive syndrome, due to the family history of consanguinity. Karyotype analysis revealed a homozygous pericentric inversion of chromosome 7 (46,XX,inv(7)(p15q21)x2) which was confirmed to be heterozygous in both unaffected parents. Since the resolution of the karyotype was insufficient to identify any putatively causative gene, we undertook medium-coverage whole genome sequencing using paired-end reads, in order to elucidate the molecular breakpoints. In a two-step analysis, we first narrowed down the region by identifying discordant read-pairs, and then determined the precise molecular breakpoint by analysing the mapping locations of “soft-clipped” breakpoint-spanning reads. PCR and Sanger sequencing confirmed the identified breakpoints, both of which were located in intergenic regions. Significantly, the 7p15 breakpoint was located 523 kb upstream of HOXA13, the locus for hand-foot-genital syndrome. By inference from studies of HOXA locus control in the mouse, we suggest that the inversion has delocalised a HOXA13 enhancer to produce the phenotype observed in our patient. This study demonstrates how a modern genetic diagnostic approach can characterise structural variants at nucleotide resolution and provide potential insights into functional regulation. PMID:27272187

  12. High-resolution genome-wide mapping of HIF-binding sites by ChIP-seq

    PubMed Central

    Schödel, Johannes; Oikonomopoulos, Spyros; Ragoussis, Jiannis; Pugh, Christopher W.; Ratcliffe, Peter J.

    2011-01-01

    Hypoxia-inducible factor (HIF) regulates the major transcriptional cascade central to the response of all mammalian cells to alterations in oxygen tension. Expression arrays indicate that many hundreds of genes are regulated by this pathway, controlling diverse processes that in turn orchestrate both oxygen delivery and utilization. However, the extent to which HIF exerts direct versus indirect control over gene expression together with the factors dictating the range of HIF-regulated genes remains unclear. Using chromatin immunoprecipitation linked to high throughput sequencing, we identify HIF-binding sites across the genome, independently of gene architecture. Using gene set enrichment analysis, we demonstrate robust associations with the regulation of gene expression by HIF, indicating that these sites operate over long genomic intervals. Analysis of HIF-binding motifs demonstrates sequence preferences outside of the core RCGTG-binding motif but does not reveal any additional absolute sequence requirements. Across the entire genome, only a small proportion of these potential binding sites are bound by HIF, although occupancy of potential sites was enhanced approximately 20-fold at normoxic DNAse1 hypersensitivity sites (irrespective of distance from promoters), suggesting that epigenetic regulation of chromatin may have an important role in defining the response to hypoxia. PMID:21447827
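
    The notion of a "potential binding site" for the core RCGTG motif can be illustrated with a simple sequence scan. This is a sketch under stated assumptions, not the study's analysis: R is read as the IUPAC purine code (A or G), both strands are scanned, and the function name and example sequence are hypothetical. It says nothing about which sites are actually occupied, which is what the ChIP-seq data measure.

```python
import re

# R is the IUPAC code for a purine (A or G), so RCGTG expands to [AG]CGTG.
# Lookaheads are used so that overlapping matches are not missed.
FORWARD = re.compile(r"(?=([AG]CGTG))")
# The reverse complement of RCGTG is CACGY, where Y is C or T.
REVERSE = re.compile(r"(?=(CACG[CT]))")


def find_rcgtg_sites(seq):
    """Return (0-based start, strand, matched sequence) for every RCGTG match."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in REVERSE.finditer(seq)]
    return sorted(hits)


# Hypothetical sequence with one match on each strand.
print(find_rcgtg_sites("TTACGTGAACACGCTT"))
```

    Because the motif is only five bases long, such a scan returns very many matches across a genome, which is consistent with the abstract's observation that only a small proportion of potential sites are bound.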

  13. High-Resolution Analysis of Atmospheric Mass Spectra: Identification, Resolution, Assignment of complex mass spectra

    NASA Astrophysics Data System (ADS)

    Stark, H.; Yatavelli, R. L. N.; Thompson, S.; Mazzoleni, L. R.; Kimmel, J.; Cubison, M.; Day, D. A.; Campuzano Jost, P.; Palm, B. B.; Chhabra, P. S.; Canagaratna, M. R.; Jayne, J. T.; Worsnop, D. R.; Jimenez, J. L.

    2015-12-01

    The troposphere can contain thousands of organic molecules with widely varying carbon numbers and levels of oxidation. Unraveling this complex molecular mixture gives new insights into key processes such as atmospheric processing, secondary aerosol formation, radiative properties as well as implications for human health. High-resolution time-of-flight chemical ionization mass spectrometry (HRToF-CIMS) is a powerful technique with the potential to provide many insights into this complex mix of molecules. We have developed new data analysis techniques to identify the most likely ions present in complex mass spectra in which individual peaks strongly overlap. New ancillary algorithms will also be presented to first develop a list of all possible formulas for this particular ion chemistry and then to automatically assign possible ions to the likely peak positions. Spectral simulation experiments confirm that bulk elemental properties such as oxidation state and carbon number can be reliably extracted from this method. Comparison of results from a CIMS operated during the 2011 BEACHON-RoMBAS campaign in the Colorado Rocky Mountains to electrospray-ultra-high-resolution mass spectrometry data from compounds measured in another forest in the Rockies allows a comparison of the compounds and compound classes measured by both techniques. We will also address the problem of quantifying ion signals from the organic molecule mix encountered in this study by a new method to calculate approximate sensitivities for acetate ionization chemistry to help quantifying concentrations of atmospheric compounds. We applied the above methods to a dataset from the micro-orifice volatilization impactor (MOVI)-CIMS collected during August 2011 as part of the BEACHON campaign. Calculated atmospheric bulk elemental parameters such as diurnal cycles of carbon number and oxidation state from both gas phase and aerosols from a pine forest environment will be presented and compared to data from

  14. TIARA: a database for accurate analysis of multiple personal genomes based on cross-technology

    PubMed Central

    Hong, Dongwan; Park, Sung-Soo; Ju, Young Seok; Kim, Sheehyun; Shin, Jong-Yeon; Kim, Sujung; Yu, Saet-Byeol; Lee, Won-Chul; Lee, Seungbok; Park, Hansoo; Kim, Jong-Il; Seo, Jeong-Sun

    2011-01-01

    High-throughput genomic technologies have been used to explore personal human genomes for the past few years. Although the integration of technologies is important for high-accuracy detection of personal genomic variations, no databases have been prepared to systematically archive genomes and to facilitate the comparison of personal genomic data sets prepared using a variety of experimental platforms. We describe here the Total Integrated Archive of Short-Read and Array (TIARA; http://tiara.gmi.ac.kr) database, which contains personal genomic information obtained from next generation sequencing (NGS) techniques and ultra-high-resolution comparative genomic hybridization (CGH) arrays. This database improves the accuracy of detecting personal genomic variations, such as SNPs, short indels and structural variants (SVs). At present, 36 individual genomes have been archived and may be displayed in the database. TIARA supports a user-friendly genome browser, which retrieves read-depths (RDs) and log2 ratios from NGS and CGH arrays, respectively. In addition, this database provides information on all genomic variants and the raw data, including short reads and feature-level CGH data, through anonymous file transfer protocol. More personal genomes will be archived as more individuals are analyzed by NGS or CGH array. TIARA provides a new approach to the accurate interpretation of personal genomes for genome research. PMID:21051338

  15. SIDEKICK: Genomic data driven analysis and decision-making framework

    PubMed Central

    2010-01-01

    Background Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to approach genomic analysis that

  16. The sequence and analysis of a Chinese pig genome

    PubMed Central

    2012-01-01

    Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models. PMID:23587058

  17. Comparative genomic analysis of seven Mycoplasma hyosynoviae strains

    PubMed Central

    Bumgardner, Eric A; Kittichotirat, Weerayuth; Bumgarner, Roger E; Lawrence, Paulraj K

    2015-01-01

    Infection with Mycoplasma hyosynoviae can result in debilitating arthritis in pigs, particularly those aged 10 weeks or older. Strategies for controlling this pathogen are becoming increasingly important due to the rise in the number of cases of arthritis that have been attributed to infection in recent years. In order to begin to develop interventions to prevent arthritis caused by M. hyosynoviae, more information regarding the specific proteins and potential virulence factors that its genome encodes was needed. However, the genome of this emerging swine pathogen had not been sequenced previously. In this report, we present a comparative analysis of the genomes of seven strains of M. hyosynoviae isolated from different locations in North America during the years 2010 to 2013. We identified several putative virulence factors that may contribute to the ability of this pathogen to adhere to host cells. Additionally, we discovered several prophage genes present within the genomes of three strains that show significant similarity to MAV1, a phage isolated from the related species, M. arthritidis. We also identified CRISPR-Cas and type III restriction and modification systems present in two strains that may contribute to their ability to defend against phage infection. PMID:25693846

  18. Genome-wide analysis of mobile genetic element insertion sites

    PubMed Central

    Rawal, Kamal; Ramaswamy, Ram

    2011-01-01

    Mobile genetic elements (MGEs) account for a significant fraction of eukaryotic genomes and are implicated in altered gene expression and disease. We present an efficient computational protocol for MGE insertion site analysis. ELAN, the suite of tools described here, uses standard techniques to identify different MGEs and their distribution on the genome. One component, DNASCANNER, analyses known insertion sites of MGEs for the presence of signals that are based on a combination of local physical and chemical properties. ISF (insertion site finder) is a machine-learning tool that incorporates information derived from DNASCANNER. ISF uses a support vector machine to classify a given DNA sequence as a potential insertion site or not. We have studied the genomes of Homo sapiens, Mus musculus, Drosophila melanogaster and Entamoeba histolytica via a protocol whereby DNASCANNER is used to identify a common set of statistically important signals flanking the insertion sites in the various genomes. These are used in ISF for insertion site prediction, and the current accuracy of the tool is over 65%. We find similar signals at gene boundaries and splice sites. Together, these data are suggestive of a common insertion mechanism that operates in a variety of eukaryotes. PMID:21609951

  19. Pan-Genome Analysis of Brazilian Lineage A Amoebal Mimiviruses

    PubMed Central

    Assis, Felipe L.; Bajrai, Leena; Abrahao, Jonatas S.; Kroon, Erna G.; Dornas, Fabio P.; Andrade, Kétyllen R.; Boratto, Paulo V. M.; Pilotto, Mariana R.; Robert, Catherine; Benamar, Samia; La Scola, Bernard; Colson, Philippe

    2015-01-01

    Since the recent discovery of Samba virus, the first representative of the family Mimiviridae from Brazil, prospecting for mimiviruses has been conducted in different environmental conditions in Brazil. Recently, using Acanthamoeba sp., we isolated three new mimiviruses, all of lineage A of amoebal mimiviruses: Kroon virus from urban lake water; Amazonia virus from the Brazilian Amazon river; and Oyster virus from farmed oysters. The aims of this work were to sequence and analyze the genomes of these new Brazilian mimiviruses (mimi-BR) and update the analysis of the Samba virus genome. The genomes of Samba virus, Amazonia virus and Oyster virus were 97%–99% similar, whereas Kroon virus had a low similarity (90%–91%) with other mimi-BR. A total of 3877 proteins encoded by mimi-BR were grouped into 974 orthologous clusters. In addition, we identified three new ORFans in the Kroon virus genome. Additional work is needed to expand our knowledge of the diversity of mimiviruses from Brazil, including if and why among amoebal mimiviruses those of lineage A predominate in the Brazilian environment. PMID:26131958

  20. Privacy-preserving GWAS analysis on federated genomic datasets

    PubMed Central

    2015-01-01

    Background The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, high-quality GWAS usually require a large number of samples, which can grow beyond the capability of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution collaboration for effective GWAS, but it raises concerns about patient privacy and medical information confidentiality (as data are being exchanged across institutional boundaries), which become an inhibiting factor for practical use. Methods We present a privacy-preserving GWAS framework on federated genomic datasets. Our method is to layer the GWAS computations on top of secure multi-party computation (MPC) systems. This approach allows two parties in a distributed system to mutually perform secure GWAS computations without exposing their private data. Results We demonstrate our technique by implementing a framework for minor allele frequency counting and χ2 statistics calculation, typical computations used in GWAS. For efficient prototyping, we use a state-of-the-art MPC framework, i.e., Portable Circuit Format (PCF) [1]. Our experimental results show promise in realizing both efficient and secure cross-institution GWAS computations. PMID:26733045
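
    The two statistics named in the abstract, minor allele frequency and the χ2 statistic, can be written out in plain (non-secure) form to show what the MPC layer would be computing. This is only a sketch: the secure multi-party protocol and the PCF framework are not modelled, the χ2 shown is the standard Pearson statistic on a 2x2 case/control allele-count table (assumed here, since the abstract does not specify the table), and all function names and genotype values are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """genotypes: copies of the alternate allele (0, 1 or 2) per diploid sample."""
    genotypes = list(genotypes)
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def allele_counts(genotypes):
    """Return (alternate allele count, reference allele count)."""
    genotypes = list(genotypes)
    alt = sum(genotypes)
    return alt, 2 * len(genotypes) - alt


def allelic_chi_square(case_genotypes, control_genotypes):
    """Pearson chi-square on the 2x2 allele-count table, without continuity correction."""
    a, b = allele_counts(case_genotypes)
    c, d = allele_counts(control_genotypes)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so the table carries no signal
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical genotypes at one SNP.
cases = [2, 1, 1, 0]
controls = [0, 0, 1, 0]
print(minor_allele_frequency(cases + controls))
print(allelic_chi_square(cases, controls))
```

    Both quantities depend on the data only through allele counts, and counts from different institutions simply add, which is one reason such computations are a natural first target for a federated setting.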

  1. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    PubMed

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E. coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298

  3. Benchmarking Undedicated Cloud Computing Providers for Analysis of Genomic Datasets

    PubMed Central

    Yazar, Seyhan; Gooden, George E. C.; Mackey, David A.; Hewitt, Alex W.

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5–78.2) for E. coli and 53.5% (95% CI: 34.4–72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5–303.1) and 173.9% (95% CI: 134.6–213.1) more expensive for E. coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE. PMID:25247298

  4. Clinical pertinence metric enables hypothesis-independent genome-phenome analysis for neurologic diagnosis.

    PubMed

    Segal, Michael M; Abdellateef, Mostafa; El-Hattab, Ayman W; Hilbush, Brian S; De La Vega, Francisco M; Tromp, Gerard; Williams, Marc S; Betensky, Rebecca A; Gleeson, Joseph

    2015-06-01

    We describe an "integrated genome-phenome analysis" that combines both genomic sequence data and clinical information for genomic diagnosis. It is novel in that it uses robust diagnostic decision support and combines the clinical differential diagnosis and the genomic variants using a "pertinence" metric. This allows the analysis to be hypothesis-independent, not requiring assumptions about mode of inheritance, number of genes involved, or which clinical findings are most relevant. Using 20 genomic trios with neurologic disease, we find that pertinence scores averaging 99.9% identify the causative variant under conditions in which a genomic trio is analyzed and family-aware variant calling is done. The analysis takes seconds, and pertinence scores can be improved by clinicians adding more findings. The core conclusion is that automated genome-phenome analysis can be accurate, rapid, and efficient. We also conclude that an automated process offers a methodology for quality improvement of many components of genomic analysis.

  5. Specific Analysis of Web Camera and High Resolution Planetary Imaging

    NASA Astrophysics Data System (ADS)

    Park, Youngsik; Lee, Dongju; Jin, Ho; Han, Wonyong; Park, Jang-Hyun

    2006-12-01

    A web camera is usually used for video communication between PCs. It has a small sensing area and cannot be used for long-exposure applications, so it is insufficient for most astronomical work. However, a web camera is suitable for bright targets such as the planets and the Moon, which do not need long exposure times, so many amateur astronomers use web cameras for planetary imaging. We used a ToUcam manufactured by Philips for planetary imaging and the commercial program Registax for combining video frames. We then measured properties of the web camera, such as linearity and gain, that are usually used to analyze CCD performance. Because the combining technique selects high-quality images from the video frames, this method can produce higher-resolution planetary images than a single-shot image taken with film, a digital camera, or a CCD. We describe a planetary observing method and a video frame combining method.

  6. Analysis of Automated Aircraft Conflict Resolution and Weather Avoidance

    NASA Technical Reports Server (NTRS)

    Love, John F.; Chan, William N.; Lee, Chu Han

    2009-01-01

    This paper describes an analysis of using trajectory-based automation to resolve both aircraft and weather constraints for near-term air traffic management decision making. The auto resolution algorithm developed and tested at NASA-Ames to resolve aircraft to aircraft conflicts has been modified to mitigate convective weather constraints. Modifications include adding information about the size of a gap between weather constraints to the routing solution. Routes that traverse gaps that are smaller than a specific size are not used. An evaluation of the performance of the modified autoresolver to resolve both conflicts with aircraft and weather was performed. Integration with the Center-TRACON Traffic Management System was completed to evaluate the effect of weather routing on schedule delays.

  7. Intact MicroRNA Analysis Using High Resolution Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Kullolli, Majlinda; Knouf, Emily; Arampatzidou, Maria; Tewari, Muneesh; Pitteri, Sharon J.

    2014-01-01

    MicroRNAs (miRNAs) are small single-stranded non-coding RNAs that post-transcriptionally regulate gene expression, and play key roles in the regulation of a variety of cellular processes and in disease. New tools to analyze miRNAs will add understanding of the physiological origins and biological functions of this class of molecules. In this study, we investigate the utility of high resolution mass spectrometry for the analysis of miRNAs through proof-of-concept experiments. We demonstrate the ability of mass spectrometry to resolve and separate miRNAs and corresponding 3' variants in mixtures. The mass accuracy of the monoisotopic deprotonated peaks from various miRNAs is in the low ppm range. We compare fragmentation of miRNA by collision-induced dissociation (CID) and by higher-energy collisional dissociation (HCD) which yields similar sequence coverage from both methods but additional fragmentation by HCD versus CID. We measure the linear dynamic range, limit of detection, and limit of quantitation of miRNA loaded onto a C18 column. Lastly, we explore the use of data-dependent acquisition of MS/MS spectra of miRNA during online LC-MS and demonstrate that multiple charge states can be fragmented, yielding nearly full sequence coverage of miRNA on a chromatographic time scale. We conclude that high resolution mass spectrometry allows the separation and measurement of miRNAs in mixtures and a standard LC-MS setup can be adapted for online analysis of these molecules.
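
    The phrase "mass accuracy ... in the low ppm range" refers to the relative deviation of a measured m/z from its theoretical value, expressed in parts per million. A minimal sketch of that calculation follows; the function name and the m/z values are hypothetical and are not taken from the study.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million, relative to the theoretical m/z."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical values: a 0.002 deviation at m/z 1000 is a 2 ppm error.
print(round(ppm_error(1000.002, 1000.0), 3))
```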

  8. A Comparative Analysis of Mitochondrial Genomes in Eustigmatophyte Algae

    PubMed Central

    Ševčíková, Tereza; Klimeš, Vladimír; Zbránková, Veronika; Strnad, Hynek; Hroudová, Miluše; Vlček, Čestmír; Eliáš, Marek

    2016-01-01

    Eustigmatophyceae (Ochrophyta, Stramenopiles) is a small algal group with species of the genus Nannochloropsis being its best studied representatives. Nuclear and organellar genomes have been recently sequenced for several Nannochloropsis spp., but phylogenetically wider genomic studies are missing for eustigmatophytes. We sequenced mitochondrial genomes (mitogenomes) of three species representing most major eustigmatophyte lineages, Monodopsis sp. MarTras21, Vischeria sp. CAUP Q 202 and Trachydiscus minutus, and carried out their comparative analysis in the context of available data from Nannochloropsis and other stramenopiles, revealing a number of noticeable findings. First, mitogenomes of most eustigmatophytes are highly collinear and similar in the gene content, but extensive rearrangements and loss of three otherwise ubiquitous genes happened in the Vischeria lineage; this correlates with an accelerated evolution of mitochondrial gene sequences in this lineage. Second, eustigmatophytes appear to be the only ochrophyte group with the Atp1 protein encoded by the mitogenome. Third, eustigmatophyte mitogenomes uniquely share a truncated nad11 gene encoding only the C-terminal part of the Nad11 protein, while the N-terminal part is encoded by a separate gene in the nuclear genome. Fourth, UGA as a termination codon and the cognate release factor mRF2 were lost from mitochondria independently by the Nannochloropsis and T. minutus lineages. Finally, the rps3 gene in the mitogenome of Vischeria sp. is interrupted by the UAG codon, but the genome includes a gene for an unusual tRNA with an extended anticodon loop that we speculate may serve as a suppressor tRNA to properly decode the rps3 gene. PMID:26872774

  9. A Comparative Analysis of Mitochondrial Genomes in Eustigmatophyte Algae.

    PubMed

    Ševčíková, Tereza; Klimeš, Vladimír; Zbránková, Veronika; Strnad, Hynek; Hroudová, Miluše; Vlček, Čestmír; Eliáš, Marek

    2016-03-01

    Eustigmatophyceae (Ochrophyta, Stramenopiles) is a small algal group with species of the genus Nannochloropsis being its best studied representatives. Nuclear and organellar genomes have been recently sequenced for several Nannochloropsis spp., but phylogenetically wider genomic studies are missing for eustigmatophytes. We sequenced mitochondrial genomes (mitogenomes) of three species representing most major eustigmatophyte lineages, Monodopsis sp. MarTras21, Vischeria sp. CAUP Q 202 and Trachydiscus minutus, and carried out their comparative analysis in the context of available data from Nannochloropsis and other stramenopiles, revealing a number of noticeable findings. First, mitogenomes of most eustigmatophytes are highly collinear and similar in the gene content, but extensive rearrangements and loss of three otherwise ubiquitous genes happened in the Vischeria lineage; this correlates with an accelerated evolution of mitochondrial gene sequences in this lineage. Second, eustigmatophytes appear to be the only ochrophyte group with the Atp1 protein encoded by the mitogenome. Third, eustigmatophyte mitogenomes uniquely share a truncated nad11 gene encoding only the C-terminal part of the Nad11 protein, while the N-terminal part is encoded by a separate gene in the nuclear genome. Fourth, UGA as a termination codon and the cognate release factor mRF2 were lost from mitochondria independently by the Nannochloropsis and T. minutus lineages. Finally, the rps3 gene in the mitogenome of Vischeria sp. is interrupted by the UAG codon, but the genome includes a gene for an unusual tRNA with an extended anticodon loop that we speculate may serve as a suppressor tRNA to properly decode the rps3 gene. PMID:26872774

  10. Comparative analysis of Acinetobacters: three genomes for three lifestyles.

    PubMed

    Vallenet, David; Nordmann, Patrice; Barbe, Valérie; Poirel, Laurent; Mangenot, Sophie; Bataille, Elodie; Dossat, Carole; Gas, Shahinaz; Kreimeyer, Annett; Lenoble, Patricia; Oztas, Sophie; Poulain, Julie; Segurens, Béatrice; Robert, Catherine; Abergel, Chantal; Claverie, Jean-Michel; Raoult, Didier; Médigue, Claudine; Weissenbach, Jean; Cruveiller, Stéphane

    2008-03-19

    Acinetobacter baumannii is the source of numerous nosocomial infections in humans and therefore deserves close attention as multidrug or even pandrug resistant strains are increasingly being identified worldwide. Here we report the comparison of two newly sequenced genomes of A. baumannii. The human isolate A. baumannii AYE is multidrug resistant whereas strain SDF, which was isolated from body lice, is antibiotic susceptible. As reference for comparison in this analysis, the genome of the soil-living bacterium A. baylyi strain ADP1 was used. The most interesting dissimilarities we observed were that i) whereas strain AYE and A. baylyi genomes harbored very few Insertion Sequence elements which could promote expression of downstream genes, strain SDF sequence contains several hundred of them that have played a crucial role in its genome reduction (gene disruptions and simple DNA loss); ii) strain SDF has low catabolic capacities compared to strain AYE. Interestingly, the latter has even higher catabolic capacities than A. baylyi which has already been reported as a very nutritionally versatile organism. This metabolic performance could explain the persistence of A. baumannii nosocomial strains in environments where nutrients are scarce; iii) several processes known to play a key role during host infection (biofilm formation, iron uptake, quorum sensing, virulence factors) were either different or absent, the best example of which is iron uptake. Indeed, strain AYE and A. baylyi use siderophore-based systems to scavenge iron from the environment whereas strain SDF uses an alternate system similar to the Haem Acquisition System (HAS). Taken together, all these observations suggest that the genome contents of the 3 Acinetobacters compared are partly shaped by life in distinct ecological niches: human (and more largely hospital environment), louse, soil.

  11. Comparative analysis of essential genes in prokaryotic genomic islands.

    PubMed

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-07-30

    Essential genes are thought to encode proteins that carry out the basic functions needed to sustain cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are not likely to be located in GIs, but a systematic analysis of essential genes in GIs has not been performed before. Here, we have analyzed the essential genes in 28 prokaryotes using statistical methods and concluded that essential genes in GIs are significantly fewer than those outside GIs. The function of 362 essential genes found in GIs has been explored further by BLAST against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of the PHAge Search Tool (PHAST). Consequently, 64 and 60 eligible essential genes are found to share sequence similarity with virulence factors and phage/prophage-related genes, respectively. Meanwhile, we find several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of prediction algorithms for essential genes, but also give a clue to the functionality of essential genes in genomic islands.

  12. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335
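    The frequency classes and P-values quoted in this abstract can be tabulated with a short script. The following is only a minimal sketch: the four variants and the 1% MAF cut-off come from the abstract, while the genome-wide significance threshold of 5e-8 is a conventional value assumed here (the abstract does not state it), and the names `REPORTED`, `frequency_class` and `summarize` are hypothetical. The abstract also describes the 3.2% variant as "low-frequency"; the sketch applies only the two-way split at 1%.

```python
# Variants reported in the abstract: (locus, trait, MAF as a fraction, P-value).
REPORTED = [
    ("SYN2", "TSH", 0.235, 6.15e-9),
    ("PDE8B", "TSH", 0.104, 5.94e-14),
    ("B4GALT6/SLC25A52", "FT4", 0.032, 1.27e-9),
    ("TTR", "FT4", 0.004, 2.14e-11),
]

COMMON_MAF = 0.01      # from the abstract: common is MAF >= 1%, rare is MAF < 1%
GENOME_WIDE_P = 5e-8   # ASSUMPTION: conventional GWAS threshold, not given in the abstract


def frequency_class(maf):
    """Return 'common' or 'rare' using the abstract's 1% MAF cut-off."""
    return "common" if maf >= COMMON_MAF else "rare"


def summarize(variants):
    """Return (locus, trait, frequency class, passes assumed threshold) per variant."""
    return [
        (locus, trait, frequency_class(maf), p < GENOME_WIDE_P)
        for locus, trait, maf, p in variants
    ]


if __name__ == "__main__":
    for row in summarize(REPORTED):
        print(row)
```

    Under these assumptions the three variants with MAF of 1% or more are labelled common, the TTR variant is labelled rare, and all four P-values fall below the assumed threshold.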

  13. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-03-06

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.

  14. Genomic Analysis of the BMP Family in Glioblastomas.

    PubMed

    Hover, Laura D; Abel, Ty W; Owens, Philip

    2015-01-01

    Glioblastoma multiforme (GBM) is a grade IV glioma with a median survival of 15 months. Recently, bone morphogenetic protein (BMP) signaling has been shown to promote survival in xenograft murine models. To gain a better understanding of the role of BMP signaling in human GBMs, we examined the genomic alterations of 90 genes associated with BMP signaling in GBM patient samples. We completed this analysis using publicly available datasets compiled through The Cancer Genome Atlas and the Glioma Molecular Diagnostic Initiative. Here we show how mRNA expression is altered in GBM samples and how that is associated with patient survival, highlighting both known and novel associations between BMP signaling and GBM biology.

  15. [Cancer Genome Atlas Pan-cancer Analysis Project].

    PubMed

    Zhang, Kun; Wang, Hong

    2015-04-01

    Cancer takes different forms depending on the site of origin, the cell type, and the kinds of genetic mutation involved, all of which also affect the response to therapy. Although changes in many genes have been shown to alter phenotype directly, the complex molecular mechanisms of many cancer lineages are still not fully elucidated. The Cancer Genome Atlas (TCGA) Research Network therefore analyzed a large number of human tumors to identify molecular changes at the DNA, RNA, protein and epigenetic levels. The resulting wealth of data offers an opportunity to describe the common features, the individual features and the new themes that run across cancer lineages as a whole. The Pan-Cancer program first compares 12 cancer types. Analysis of the molecular changes in different tumors, and of their functions, will show how an effective treatment can be applied to tumors with a similar phenotype.

  16. Genomic analysis of Skermanella stibiiresistens type strain SB22T

    PubMed Central

    Zhu, Wentao; Huang, Jing; Li, Mingshun; Li, Xiangyang; Wang, Gejiao

    2014-01-01

    Members of genus Skermanella were described as Gram-negative, motile, aerobic, rod-shaped, obligate-heterotrophic bacteria and unable to fix nitrogen. In this study, the genome sequence of Skermanella stibiiresistens SB22T is reported. Phylogenetic analysis using core proteins confirmed the phylogenetic assignment based on 16S rRNA gene sequences. Strain SB22T has all the proteins for complete glycolysis, tricarboxylic acid cycle and pentose phosphate pathway. The RuBisCO encoding genes cbbL1S1 and nitrogenase delta subunit gene anfG are absent, consistent with its inability to fix carbon and nitrogen, respectively. In addition, the genome possesses a series of flagellar assembly and chemotaxis genes to ensure its motility. PMID:25197493

  17. Functional genomic analysis of the Drosophila immune response.

    PubMed

    Valanne, Susanna

    2014-01-01

    Drosophila melanogaster has been widely used as a model organism for over a century now, and also as an immunological research model for over 20 years. With the emergence of RNA interference (RNAi) in Drosophila as a robust tool to silence genes of interest, large-scale or genome-wide functional analysis has become a popular way of studying the Drosophila immune response in cell culture. Drosophila immunity is composed of cellular and humoral immunity mechanisms, and especially the systemic, humoral response pathways have been extensively dissected using the functional genomic approach. Although most components of the main immune pathways had already been found using traditional genetic screening techniques, important findings including pathway components, positive and negative regulators and modifiers have been made with RNAi screening. Additionally, RNAi screening has produced new information on host-pathogen interactions related to the pathogenesis of many microbial species. PMID:23707784

  18. High-resolution genetic map for understanding the effect of genome-wide recombination rate, selection sweep and linkage disequilibrium on nucleotide diversity in watermelon

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genotyping by sequencing (GBS) technology was used to identify a set of 9,933 single nucleotide polymorphism (SNP) markers for constructing a high-resolution genetic map of 1,087 cM for watermelon. The genome-wide variation of recombination rate (GWRR) across the map was evaluated and a positive co...

  19. Genome-wide analysis of alternative splicing during human heart development

    PubMed Central

    Wang, He; Chen, Yanmei; Li, Xinzhong; Chen, Guojun; Zhong, Lintao; Chen, Gangbing; Liao, Yulin; Liao, Wangjun; Bin, Jianping

    2016-01-01

    Alternative splicing (AS) drives determinative changes during mouse heart development. Recent high-throughput technological advances have facilitated genome-wide analysis of AS, but AS during the transition of the human heart from the foetal to the adult stage has not been reported. Here, we present a high-resolution global analysis of AS transitions between human foetal and adult hearts. RNA-sequencing data showed that extensive AS transitions occurred between human foetal and adult hearts, and that AS events occurred more frequently in protein-coding genes than in long non-coding RNA (lncRNA). A significant difference in AS patterns was found between foetal and adult hearts. The predicted difference in AS events was further confirmed using quantitative reverse transcription-polymerase chain reaction analysis of human heart samples. Functional foetal-specific AS event analysis showed enrichment associated with cell proliferation-related pathways including cell cycle, whereas adult-specific AS events were associated with protein synthesis. Furthermore, 42.6% of foetal-specific AS events showed significant changes in gene expression levels between foetal and adult hearts. Genes exhibiting both foetal-specific AS and differential expression were highly enriched in cell cycle-associated functions. In conclusion, we provided a genome-wide profiling of AS transitions between foetal and adult hearts and proposed that AS transitions and differential gene expression may play determinative roles in human heart development. PMID:27752099

  20. Metabolomic Analysis of Rat Brain by High Resolution Nuclear Magnetic Resonance Spectroscopy of Tissue Extracts

    PubMed Central

    Lutz, Norbert W.; Béraud, Evelyne; Cozzone, Patrick J.

    2014-01-01

    Studies of gene expression on the RNA and protein levels have long been used to explore biological processes underlying disease. More recently, genomics and proteomics have been complemented by comprehensive quantitative analysis of the metabolite pool present in biological systems. This strategy, termed metabolomics, strives to provide a global characterization of the small-molecule complement involved in metabolism. While the genome and the proteome define the tasks cells can perform, the metabolome is part of the actual phenotype. Among the methods currently used in metabolomics, spectroscopic techniques are of special interest because they allow one to simultaneously analyze a large number of metabolites without prior selection for specific biochemical pathways, thus enabling a broad unbiased approach. Here, an optimized experimental protocol for metabolomic analysis by high-resolution NMR spectroscopy is presented, which is the method of choice for efficient quantification of tissue metabolites. Important strengths of this method are (i) the use of crude extracts, without the need to purify the sample and/or separate metabolites; (ii) the intrinsically quantitative nature of NMR, permitting quantitation of all metabolites represented by an NMR spectrum with one reference compound only; and (iii) the nondestructive nature of NMR enabling repeated use of the same sample for multiple measurements. The dynamic range of metabolite concentrations that can be covered is considerable due to the linear response of NMR signals, although metabolites occurring at extremely low concentrations may be difficult to detect. For the least abundant compounds, the highly sensitive mass spectrometry method may be advantageous although this technique requires more intricate sample preparation and quantification procedures than NMR spectroscopy. We present here an NMR protocol adjusted to rat brain analysis; however, the same protocol can be applied to other tissues with minor

  1. Device for high spatial resolution chemical analysis of a sample and method of high spatial resolution chemical analysis

    DOEpatents

    Van Berkel, Gary J.

    2015-10-06

    A system and method for analyzing a chemical composition of a specimen are described. The system can include at least one pin; a sampling device configured to contact a liquid with a specimen on the at least one pin to form a testing solution; and a stepper mechanism configured to move the at least one pin and the sampling device relative to one another. The system can also include an analytical instrument for determining a chemical composition of the specimen from the testing solution. In particular, the systems and methods described herein enable chemical analysis of specimens, such as tissue, to be evaluated in a manner that the spatial-resolution is limited by the size of the pins used to obtain tissue samples, not the size of the sampling device used to solubilize the samples coupled to the pins.

  2. Analysis of genomic diversity among photosynthetic stem-nodulating rhizobial strains from northeast Argentina.

    PubMed

    Montecchia, Marcela S; Kerber, Norma L; Pucheu, Norma L; Perticari, Alejandro; García, Augusto F

    2002-10-01

    The genomic diversity among photosynthetic rhizobia from northeast Argentina was assessed. Forty-six isolates obtained from naturally occurring stem and root nodules of Aeschynomene rudis plants were analyzed by three molecular typing methods with different levels of taxonomic resolution: repetitive sequence-based PCR (rep-PCR) genomic fingerprinting with BOX and REP primers, amplified 16S rDNA restriction analysis (ARDRA), and 16S-23S rDNA intergenic spacer-restriction fragment length polymorphism (IGS-RFLP) analysis. The in vivo absorption spectra of membranes of strains were similar in the near infrared region, with peaks at 870 and 800 nm revealing the presence of light harvesting complex I, bacteriochlorophyll-binding polypeptides (LHI-Bchl complex). After extraction with acetone-methanol the spectra differed in the visible part, displaying peaks belonging to canthaxanthin or spirilloxanthin as the main carotenoid complement. The genotypic characterization by rep-PCR revealed a high level of genomic diversity among the isolates, and almost all the photosynthetic ones had identical ARDRA patterns and fell into one cluster different from Bradyrhizobium japonicum and Bradyrhizobium elkanii. In the combined analysis of ARDRA and rep-PCR fingerprints, 7 clusters were found including most of the isolates. Five of those contained only photosynthetic isolates; all canthaxanthin-containing strains grouped in one cluster, most of the other photosynthetic isolates were grouped in a second large cluster, while the remaining three clusters contained a few strains. The other two clusters comprised reference strains of B. japonicum and B. elkanii, respectively. The IGS-RFLP analysis produced similar clustering for almost all the strains. The 16S rRNA gene sequence of one representative isolate was determined and the DNA sequence analysis confirmed the position of photosynthetic rhizobia in a distinct phylogenetic group within the Bradyrhizobium rDNA cluster.

  3. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea

    PubMed Central

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate Technology, genotyping data for 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies to generate a high resolution linkage map. The map comprised 6698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the most dense linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties. PMID:26303721
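    The average inter-marker distance quoted in this abstract can be checked from its own figures (6698 markers, eight linkage groups, 1083.93 cM). The sketch below is illustrative only: the function name `mean_spacing_cm` is hypothetical, and the abstract does not say whether the average was taken per marker or per interval, so both conventions are shown.

```python
def mean_spacing_cm(map_length_cm, n_markers, n_linkage_groups=0):
    """Average spacing between markers on a linkage map, in cM.

    With n_linkage_groups=0 this is simply map length divided by marker count.
    With the number of linkage groups supplied, it divides by the number of
    intervals instead (each group of k markers has k - 1 intervals).
    """
    intervals = n_markers - n_linkage_groups
    if intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return map_length_cm / intervals


if __name__ == "__main__":
    # Figures taken from the abstract.
    per_marker = mean_spacing_cm(1083.93, 6698)
    per_interval = mean_spacing_cm(1083.93, 6698, n_linkage_groups=8)
    print(f"{per_marker:.4f} cM per marker, {per_interval:.4f} cM per interval")
```

    Either convention rounds to the 0.16 cM reported in the abstract.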

  4. High density linkage mapping of genomic and transcriptomic SNPs for synteny analysis and anchoring the genome sequence of chickpea.

    PubMed

    Gaur, Rashmi; Jeena, Ganga; Shah, Niraj; Gupta, Shefali; Pradhan, Seema; Tyagi, Akhilesh K; Jain, Mukesh; Chattopadhyay, Debasis; Bhatia, Sabhyata

    2015-01-01

    This study presents genome-wide discovery of SNPs through next generation sequencing of the genome of Cicer reticulatum. Mapping of the C. reticulatum sequenced reads onto the draft genome assembly of C. arietinum (desi chickpea) resulted in identification of 842,104 genomic SNPs which were utilized along with an additional 36,446 genic SNPs identified from transcriptome sequences of the aforementioned varieties. Two new chickpea Oligo Pool All (OPAs) each having 3,072 SNPs were designed and utilized for SNP genotyping of 129 Recombinant Inbred Lines (RILs). Using Illumina GoldenGate Technology, genotyping data for 5,041 SNPs were generated and combined with the 1,673 marker data from previously published studies to generate a high resolution linkage map. The map comprised 6698 markers distributed on eight linkage groups spanning 1083.93 cM with an average inter-marker distance of 0.16 cM. Utility of the present map was demonstrated for improving the anchoring of the earlier reported draft genome sequence of desi chickpea by ~30% and that of kabuli chickpea by 18%. The genetic map reported in this study represents the most dense linkage map of chickpea, with the potential to facilitate efficient anchoring of the draft genome sequences of desi as well as kabuli chickpea varieties.

  5. Association between chromosomal aberration of COX8C and tethered spinal cord syndrome: array-based comparative genomic hybridization analysis

    PubMed Central

    Zhao, Qiu-jiong; Bai, Shao-cong; Cheng, Cheng; Tao, Ben-zhang; Wang, Le-kai; Liang, Shuang; Yin, Ling; Hang, Xing-yi; Shang, Ai-jia

    2016-01-01

    Copy number variations have been found in patients with neural tube abnormalities. In this study, we performed genome-wide screening using high-resolution array-based comparative genomic hybridization in three children with tethered spinal cord syndrome and two healthy parents. Of eight copy number variations, four were non-polymorphic. These non-polymorphic copy number variations were associated with Angelman and Prader-Willi syndromes, and microcephaly. Gene function enrichment analysis revealed that COX8C, a gene associated with metabolic disorders of the nervous system, was located in the copy number variation region of Patient 1. Our results indicate that array-based comparative genomic hybridization can be used to diagnose tethered spinal cord syndrome. Our results may help determine the pathogenesis of tethered spinal cord syndrome and prevent occurrence of this disease.

  6. Association between chromosomal aberration of COX8C and tethered spinal cord syndrome: array-based comparative genomic hybridization analysis

    PubMed Central

    Zhao, Qiu-jiong; Bai, Shao-cong; Cheng, Cheng; Tao, Ben-zhang; Wang, Le-kai; Liang, Shuang; Yin, Ling; Hang, Xing-yi; Shang, Ai-jia

    2016-01-01

    Copy number variations have been found in patients with neural tube abnormalities. In this study, we performed genome-wide screening using high-resolution array-based comparative genomic hybridization in three children with tethered spinal cord syndrome and two healthy parents. Of eight copy number variations, four were non-polymorphic. These non-polymorphic copy number variations were associated with Angelman and Prader-Willi syndromes, and microcephaly. Gene function enrichment analysis revealed that COX8C, a gene associated with metabolic disorders of the nervous system, was located in the copy number variation region of Patient 1. Our results indicate that array-based comparative genomic hybridization can be used to diagnose tethered spinal cord syndrome. Our results may help determine the pathogenesis of tethered spinal cord syndrome and prevent occurrence of this disease. PMID:27651783

  7. Association between chromosomal aberration of COX8C and tethered spinal cord syndrome: array-based comparative genomic hybridization analysis.

    PubMed

    Zhao, Qiu-Jiong; Bai, Shao-Cong; Cheng, Cheng; Tao, Ben-Zhang; Wang, Le-Kai; Liang, Shuang; Yin, Ling; Hang, Xing-Yi; Shang, Ai-Jia

    2016-08-01

    Copy number variations have been found in patients with neural tube abnormalities. In this study, we performed genome-wide screening using high-resolution array-based comparative genomic hybridization in three children with tethered spinal cord syndrome and two healthy parents. Of eight copy number variations, four were non-polymorphic. These non-polymorphic copy number variations were associated with Angelman and Prader-Willi syndromes, and microcephaly. Gene function enrichment analysis revealed that COX8C, a gene associated with metabolic disorders of the nervous system, was located in the copy number variation region of Patient 1. Our results indicate that array-based comparative genomic hybridization can be used to diagnose tethered spinal cord syndrome. Our results may help determine the pathogenesis of tethered spinal cord syndrome and prevent occurrence of this disease. PMID:27651783

  8. High-resolution physical mapping of a 250-kb region of human chromosome 11q24 by genomic sequence sampling (GSS)

    SciTech Connect

    Selleri, L.; Smith, M.W.; Holmsen, A.L.

    1995-04-10

    A physical map of the region of human chromosome 11q24 containing the FLI1 gene, disrupted by the t(11;22) translocation in Ewing sarcoma and primitive neuroectodermal tumors, was analyzed by genomic sequence sampling. Using a 4- to 5-fold coverage chromosome 11-specific library, 22 region-specific cosmid clones were identified by phenol emulsion reassociation hybridization, with a 245-kb yeast artificial chromosome clone containing the FLI1 gene, and by directed "walking" techniques. Cosmid contigs were constructed by individual clone fingerprinting using restriction enzyme digestion and assembly with the Genome Reconstruction and AsseMbly (GRAM) computer algorithm. The relative orientation and spacing of cosmid contigs with respect to the chromosome were determined by the structural analysis of cosmid clones and by direct visual in situ hybridization mapping. Each cosmid clone in the contig was subjected to "one-pass" end sequencing, and the resulting ordered sequence fragments represent approximately 5% of the complete DNA sequence, making the entire region accessible by PCR amplification. The sequence samples were analyzed for putative exons, repetitive DNAs, and simple sequence repeats using a variety of computer algorithms. Based upon the computer predictions, Southern and Northern blot experiments led to the independent identification and localization of the FLI1 gene as well as a previously unknown gene located in this region of chromosome 11q24. This approach to high-resolution physical analysis of human chromosomes allows the assembly of detailed sequence-based maps. 62 refs., 7 figs.

  9. Integrated genome-scale analysis of the transcriptional regulatory landscape in a blood stem/progenitor cell model.

    PubMed

    Wilson, Nicola K; Schoenfelder, Stefan; Hannah, Rebecca; Sánchez Castillo, Manuel; Schütte, Judith; Ladopoulos, Vasileios; Mitchelmore, Joanna; Goode, Debbie K; Calero-Nieto, Fernando J; Moignard, Victoria; Wilkinson, Adam C; Jimenez-Madrid, Isabel; Kinston, Sarah; Spivakov, Mikhail; Fraser, Peter; Göttgens, Berthold

    2016-03-31

    Comprehensive study of transcriptional control processes will be required to enhance our understanding of both normal and malignant hematopoiesis. Modern sequencing technologies have revolutionized our ability to generate genome-scale expression and histone modification profiles, transcription factor (TF)-binding maps, and also comprehensive chromatin-looping information. Many of these technologies, however, require large numbers of cells, and therefore cannot be applied to rare hematopoietic stem/progenitor cell (HSPC) populations. The stem cell factor-dependent multipotent progenitor cell line HPC-7 represents a well-recognized cell line model for HSPCs. Here we report genome-wide maps for 17 TFs, 3 histone modifications, DNase I hypersensitive sites, and high-resolution promoter-enhancer interactomes in HPC-7 cells. Integrated analysis of these complementary data sets revealed TF occupancy patterns of genomic regions involved in promoter-anchored loops. Moreover, preferential associations between pairs of TFs bound at either end of chromatin loops led to the identification of 4 previously unrecognized protein-protein interactions between key blood stem cell regulators. All HPC-7 data sets are freely available both through standard repositories and a user-friendly Web interface. Together with previously generated genome-wide data sets, this study integrates HPC-7 data into a genomic resource on par with ENCODE tier 1 cell lines and, importantly, is the only current model with comprehensive genome-scale data that is relevant to HSPC biology.

  10. A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination.

    PubMed

    Li, Gang; Hillier, LaDeana W; Grahn, Robert A; Zimin, Aleksey V; David, Victor A; Menotti-Raymond, Marilyn; Middleton, Rondo; Hannah, Steven; Hendrickson, Sher; Makunin, Alex; O'Brien, Stephen J; Minx, Pat; Wilson, Richard K; Lyons, Leslie A; Warren, Wesley C; Murphy, William J

    2016-01-01

    High-resolution genetic and physical maps are invaluable tools for building accurate genome assemblies, and interpreting results of genome-wide association studies (GWAS). Previous genetic and physical maps anchored good quality draft assemblies of the domestic cat genome, enabling the discovery of numerous genes underlying hereditary disease and phenotypes of interest to the biomedical science and breeding communities. However, these maps lacked sufficient marker density to order thousands of shorter scaffolds in earlier assemblies, which instead relied heavily on comparative mapping with related species. A high-resolution map would aid in validating and ordering chromosome scaffolds from existing and new genome assemblies. Here, we describe a high-resolution genetic linkage map of the domestic cat genome based on genotyping 453 domestic cats from several multi-generational pedigrees on the Illumina 63K SNP array. The final maps include 58,055 SNP markers placed relative to 6637 markers with unique positions, distributed across all autosomes and the X chromosome. Our final sex-averaged maps span a total autosomal length of 4464 cM, the longest described linkage map for any mammal, confirming length estimates from a previous microsatellite-based map. The linkage map was used to order and orient the scaffolds from a substantially more contiguous domestic cat genome assembly (Felis catus v8.0), which incorporated ∼20 × coverage of Illumina fragment reads. The new genome assembly shows substantial improvements in contiguity, with a nearly fourfold increase in N50 scaffold size to 18 Mb. We use this map to report probable structural errors in previous maps and assemblies, and to describe features of the recombination landscape, including a massive (∼50 Mb) recombination desert (of virtually zero recombination) on the X chromosome that parallels a similar desert on the porcine X chromosome in both size and physical location. PMID:27172201
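    The N50 scaffold size cited in this abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the total assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the toy scaffold lengths are hypothetical and are not taken from the Felis catus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; N50 is the length of the
    scaffold at which the running total first reaches half of the assembly.
    """
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy lengths for illustration only (arbitrary units).
    toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_scaffolds))
```

    For the toy list the total is 32, and the two longest scaffolds (8 + 8) already reach half of it, so the function returns 8.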

  11. A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination.

    PubMed

    Li, Gang; Hillier, LaDeana W; Grahn, Robert A; Zimin, Aleksey V; David, Victor A; Menotti-Raymond, Marilyn; Middleton, Rondo; Hannah, Steven; Hendrickson, Sher; Makunin, Alex; O'Brien, Stephen J; Minx, Pat; Wilson, Richard K; Lyons, Leslie A; Warren, Wesley C; Murphy, William J

    2016-06-01

    High-resolution genetic and physical maps are invaluable tools for building accurate genome assemblies, and interpreting results of genome-wide association studies (GWAS). Previous genetic and physical maps anchored good quality draft assemblies of the domestic cat genome, enabling the discovery of numerous genes underlying hereditary disease and phenotypes of interest to the biomedical science and breeding communities. However, these maps lacked sufficient marker density to order thousands of shorter scaffolds in earlier assemblies, which instead relied heavily on comparative mapping with related species. A high-resolution map would aid in validating and ordering chromosome scaffolds from existing and new genome assemblies. Here, we describe a high-resolution genetic linkage map of the domestic cat genome based on genotyping 453 domestic cats from several multi-generational pedigrees on the Illumina 63K SNP array. The final maps include 58,055 SNP markers placed relative to 6637 markers with unique positions, distributed across all autosomes and the X chromosome. Our final sex-averaged maps span a total autosomal length of 4464 cM, the longest described linkage map for any mammal, confirming length estimates from a previous microsatellite-based map. The linkage map was used to order and orient the scaffolds from a substantially more contiguous domestic cat genome assembly (Felis catus v8.0), which incorporated ∼20 × coverage of Illumina fragment reads. The new genome assembly shows substantial improvements in contiguity, with a nearly fourfold increase in N50 scaffold size to 18 Mb. We use this map to report probable structural errors in previous maps and assemblies, and to describe features of the recombination landscape, including a massive (∼50 Mb) recombination desert (of virtually zero recombination) on the X chromosome that parallels a similar desert on the porcine X chromosome in both size and physical location.

  12. A High-Resolution SNP Array-Based Linkage Map Anchors a New Domestic Cat Draft Genome Assembly and Provides Detailed Patterns of Recombination

    PubMed Central

    Li, Gang; Hillier, LaDeana W.; Grahn, Robert A.; Zimin, Aleksey V.; David, Victor A.; Menotti-Raymond, Marilyn; Middleton, Rondo; Hannah, Steven; Hendrickson, Sher; Makunin, Alex; O’Brien, Stephen J.; Minx, Pat; Wilson, Richard K.; Lyons, Leslie A.; Warren, Wesley C.; Murphy, William J.

    2016-01-01

    High-resolution genetic and physical maps are invaluable tools for building accurate genome assemblies, and interpreting results of genome-wide association studies (GWAS). Previous genetic and physical maps anchored good quality draft assemblies of the domestic cat genome, enabling the discovery of numerous genes underlying hereditary disease and phenotypes of interest to the biomedical science and breeding communities. However, these maps lacked sufficient marker density to order thousands of shorter scaffolds in earlier assemblies, which instead relied heavily on comparative mapping with related species. A high-resolution map would aid in validating and ordering chromosome scaffolds from existing and new genome assemblies. Here, we describe a high-resolution genetic linkage map of the domestic cat genome based on genotyping 453 domestic cats from several multi-generational pedigrees on the Illumina 63K SNP array. The final maps include 58,055 SNP markers placed relative to 6637 markers with unique positions, distributed across all autosomes and the X chromosome. Our final sex-averaged maps span a total autosomal length of 4464 cM, the longest described linkage map for any mammal, confirming length estimates from a previous microsatellite-based map. The linkage map was used to order and orient the scaffolds from a substantially more contiguous domestic cat genome assembly (Felis catus v8.0), which incorporated ∼20 × coverage of Illumina fragment reads. The new genome assembly shows substantial improvements in contiguity, with a nearly fourfold increase in N50 scaffold size to 18 Mb. We use this map to report probable structural errors in previous maps and assemblies, and to describe features of the recombination landscape, including a massive (∼50 Mb) recombination desert (of virtually zero recombination) on the X chromosome that parallels a similar desert on the porcine X chromosome in both size and physical location. PMID:27172201
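    The abstract above reports contiguity as N50 scaffold size. As a minimal sketch of how that statistic is commonly computed (the function name `n50` and the example lengths are illustrative assumptions, not taken from the paper): sort scaffold lengths from longest to shortest and return the length at which the running total first reaches half of the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

    With these made-up lengths the total is 32, and the two longest scaffolds (8 + 8) already reach half of it, so the sketch returns 8.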

  13. Genome-wide Comparative Analysis of Annexin Superfamily in Plants

    PubMed Central

    Jami, Sravan Kumar; Clark, Greg B.; Ayele, Belay T.; Ashe, Paula; Kirti, Pulugurtha Bharadwaja

    2012-01-01

    Most annexins are calcium-dependent, phospholipid-binding proteins with suggested functions in response to environmental stresses and signaling during plant growth and development. They have previously been identified and characterized in Arabidopsis and rice, and constitute a multigene family in plants. In this study, we performed a comparative analysis of annexin gene families in the sequenced genomes of Viridiplantae ranging from unicellular green algae to multicellular plants, and identified 149 genes. Phylogenetic studies of these deduced annexins classified them into nine different arbitrary groups. The occurrence and distribution of bona fide type II calcium binding sites within the four annexin domains were found to be different in each of these groups. Analysis of chromosomal distribution of annexin genes in rice, Arabidopsis and poplar revealed their localization on various chromosomes with some members also found on duplicated chromosomal segments leading to gene family expansion. Analysis of gene structure suggests sequential or differential loss of introns during the evolution of land plant annexin genes. Intron positions and phases are well conserved in annexin genes from representative genomes ranging from Physcomitrella to higher plants. The occurrence of alternative motifs such as K/R/HGD was found to be overlapping or at the mutated regions of the type II calcium binding sites indicating potential functional divergence in certain plant annexins. This study provides a basis for further functional analysis and characterization of annexin multigene families in the plant lineage. PMID:23133603

  14. Bulked sample analysis in genetics, genomics and crop improvement.

    PubMed

    Zou, Cheng; Wang, Pingxi; Xu, Yunbi

    2016-10-01

    Biological assay has been based on analysis of all individuals collected from sample populations. Bulked sample analysis (BSA), which works with selected and pooled individuals, has been extensively used in gene mapping through bulked segregant analysis with biparental populations, mapping by sequencing with major gene mutants and pooled genomewide association study using extreme variants. Compared to conventional entire population analysis, BSA significantly reduces the scale and cost by simplifying the procedure. The bulks can be built by selection of extremes or representative samples from any populations and all types of segregants and variants that represent wide ranges of phenotypic variation for the target trait. Methods and procedures for sampling, bulking and multiplexing are described. The samples can be analysed using individual markers, microarrays and high-throughput sequencing at all levels of DNA, RNA and protein. The power of BSA is affected by population size, selection of extreme individuals, sequencing strategies, genetic architecture of the trait and marker density. BSA will facilitate plant breeding through development of diagnostic and constitutive markers, agronomic genomics, marker-assisted selection and selective phenotyping. Applications of BSA in genetics, genomics and crop improvement are discussed with their future perspectives.

  15. Microsporidian genome analysis reveals evolutionary strategies for obligate intracellular growth.

    PubMed

    Cuomo, Christina A; Desjardins, Christopher A; Bakowski, Malina A; Goldberg, Jonathan; Ma, Amy T; Becnel, James J; Didier, Elizabeth S; Fan, Lin; Heiman, David I; Levin, Joshua Z; Young, Sarah; Zeng, Qiandong; Troemel, Emily R

    2012-12-01

    Microsporidia comprise a large phylum of obligate intracellular eukaryotes that are fungal-related parasites responsible for widespread disease, and here we address questions about microsporidia biology and evolution. We sequenced three microsporidian genomes from two species, Nematocida parisii and Nematocida sp1, which are natural pathogens of Caenorhabditis nematodes and provide model systems for studying microsporidian pathogenesis. We performed deep sequencing of transcripts from a time course of N. parisii infection. Examination of pathogen gene expression revealed compact transcripts and a dramatic takeover of host cells by Nematocida. We also performed phylogenomic analyses of Nematocida and other microsporidian genomes to refine microsporidian phylogeny and identify evolutionary events of gene loss, acquisition, and modification. In particular, we found that all microsporidia lost the tumor-suppressor gene retinoblastoma, which we speculate could accelerate the parasite cell cycle and increase the mutation rate. We also found that microsporidia acquired transporters that could import nucleosides to fuel rapid growth. In addition, microsporidian hexokinases gained secretion signal sequences, and in a functional assay these were sufficient to export proteins out of the cell; thus hexokinase may be targeted into the host cell to reprogram it toward biosynthesis. Similar molecular changes appear during formation of cancer cells and may be evolutionary strategies adopted independently by microsporidia to proliferate rapidly within host cells. Finally, analysis of genome polymorphisms revealed evidence for a sexual cycle that may provide genetic diversity to alleviate problems caused by clonal growth. Together these events may explain the emergence and success of these diverse intracellular parasites.

  16. Structural analysis of hepatitis C RNA genome using DNA microarrays

    PubMed Central

    Martell, María; Briones, Carlos; de Vicente, Aránzazu; Piron, María; Esteban, Juan I.; Esteban, Rafael; Guardia, Jaime; Gómez, Jordi

    2004-01-01

    Many studies have tried to identify specific nucleotide sequences in the quasispecies of hepatitis C virus (HCV) that determine resistance or sensitivity to interferon (IFN) therapy, unfortunately without conclusive results. Although viral proteins represent the most evident phenotype of the virus, genomic RNA sequences determine secondary and tertiary structures which are also part of the viral phenotype and can be involved in important biological roles. In this work, a method of RNA structure analysis has been developed based on the hybridization of labelled HCV transcripts to microarrays of complementary DNA oligonucleotides. Hybridizations were carried out at non-denaturing conditions, using appropriate temperature and buffer composition to allow binding to the immobilized probes of the RNA transcript without disturbing its secondary/tertiary structural motifs. Oligonucleotides printed onto the microarray covered the entire 5′ non-coding region (5′NCR), the first three-quarters of the core region, the E2–NS2 junction and the first 400 nt of the NS3 region. We document the use of this methodology to analyse the structural degree of a large region of HCV genomic RNA in two genotypes associated with different responses to IFN treatment. The results reported here show different structural degree along the genome regions analysed, and differential hybridization patterns for distinct genotypes in NS2 and NS3 HCV regions. PMID:15247323

  17. [Ethical and social issues on the human genome analysis].

    PubMed

    Archer, L

    1992-03-01

    The modern technologies for human genome analysis raise a variety of ethical and social questions. The pre-symptomatic diagnosis of late-onset diseases is becoming possible for a rapidly increasing number of situations. The use of that knowledge by employers, insurance companies, schools, and society in general could lead to discrimination and stigmatization, in addition to adverse psychological reactions. DNA fingerprinting raises questions of privacy and personal autonomy in its applications to paternity proof, criminal proceedings, and establishment of data banks. The project of the immediate and complete sequencing of the human genome will lead to questions of economic ethics, as well as of access, commercialization and property rights of scientific information and materials obtained. It also favours a reductionist mentality and international imbalances. The molecular biology of humans, which will follow the complete sequencing of the genome, may foster a rethinking of the concepts of freedom of self-determination (basic for moral responsibility) and of equality. Gene therapy and its possible extension to the betterment of the human species pose questions of ethical limits to this technology. All these problems will have to be answered in terms of the application of the principle of ethical freedom for self-fulfillment, as a right of the human person, as well as of science and society. Scientific, economic and social interests have to be subordinated to the dignity of the human person.

  18. Comparative genomic analysis of ten Streptococcus pneumoniae temperate bacteriophages.

    PubMed

    Romero, Patricia; Croucher, Nicholas J; Hiller, N Luisa; Hu, Fen Z; Ehrlich, Garth D; Bentley, Stephen D; García, Ernesto; Mitchell, Tim J

    2009-08-01

    Streptococcus pneumoniae is an important human pathogen that often carries temperate bacteriophages. As part of a program to characterize the genetic makeup of prophages associated with clinical strains and to assess the potential roles that they play in the biology and pathogenesis in their host, we performed comparative genomic analysis of 10 temperate pneumococcal phages. All of the genomes are organized into five major gene clusters: lysogeny, replication, packaging, morphogenesis, and lysis clusters. All of the phage particles observed showed a Siphoviridae morphology. The only genes that are well conserved in all the genomes studied are those involved in the integration and the lysis of the host in addition to two genes, of unknown function, within the replication module. We observed that a high percentage of the open reading frames contained no similarities to any sequences catalogued in public databases; however, genes that were homologous to known phage virulence genes, including the pblB gene of Streptococcus mitis and the vapE gene of Dichelobacter nodosus, were also identified. Interestingly, bioinformatic tools showed the presence of a toxin-antitoxin system in the phage phiSpn_6, and this represents the first time that an addiction system in a pneumophage has been identified. Collectively, the temperate pneumophages contain a diverse set of genes with various levels of similarity among them. PMID:19502408

  19. The Korea brassica genome project: a glimpse of the brassica genome based on comparative genome analysis with Arabidopsis.

    PubMed

    Yang, Tae-Jin; Kim, Jung-Sun; Lim, Ki-Byung; Kwon, Soo-Jin; Kim, Jin-A; Jin, Mina; Park, Jee Young; Lim, Myung-Ho; Kim, Ho-Il; Kim, Seog Hyung; Lim, Yong Pyo; Park, Beom-Seok

    2005-01-01

    A complete genome sequence provides unlimited information in the sequenced organism as well as in related taxa. According to the guidance of the Multinational Brassica Genome Project (MBGP), the Korea Brassica Genome Project (KBGP) is sequencing chromosome 1 (cytogenetically oriented chromosome #1) of Brassica rapa. We have selected 48 seed BACs on chromosome 1 using EST genetic markers and FISH analyses. Among them, 30 BAC clones have been sequenced and 18 are on the way. Comparative genome analyses of the EST sequences and sequenced BAC clones from Brassica chromosome 1 revealed their homeologous partner regions on the Arabidopsis genome and a syntenic comparative map between Brassica chromosome 1 and Arabidopsis chromosomes. In silico chromosome walking and clone validation have been successfully applied to extending sequence contigs based on the comparative map and BAC end sequences. In addition, we have defined the (peri)centromeric heterochromatin blocks with centromeric tandem repeats, rDNA and centromeric retrotransposons. In-depth sequence analyses of five homeologous BAC clones and an Arabidopsis chromosomal region reveal overall co-linearity, with 82% sequence similarity. The data indicate that the Brassica genome has undergone triplication and subsequent gene losses after the divergence of Arabidopsis and Brassica. Based on in-depth comparative genome analyses, we propose a comparative genomics approach for conquering the Brassica genome. In 2005 we intend to construct an integrated physical map, including sequence information from 500 BAC clones and integration of fingerprinting data and end sequence data of more than 100,000 BAC clones.

  20. Whole-genome analysis of multienvironment or multitrait QTL in MAGIC.

    PubMed

    Verbyla, Arūnas P; Cavanagh, Colin R; Verbyla, Klara L

    2014-09-18

    Multiparent Advanced Generation Inter-Cross (MAGIC) populations are now being utilized to more accurately identify the underlying genetic basis of quantitative traits through quantitative trait loci (QTL) analyses and subsequent gene discovery. The expanded genetic diversity present in such populations and the amplified number of recombination events mean that QTL can be identified at a higher resolution. Most QTL analyses are conducted separately for each trait within a single environment. Separate analysis does not take advantage of the underlying correlation structure found in multienvironment or multitrait data. By using this information in a joint analysis, be it multienvironment or multitrait, it is possible to gain a greater understanding of genotype- or QTL-by-environment interactions or of pleiotropic effects across traits. Furthermore, this can result in improvements in accuracy for a range of traits or in a specific target environment and can influence selection decisions. Data derived from MAGIC populations allow for founder probabilities of all founder alleles to be calculated for each individual within the population. This presents an additional layer of complexity and information that can be utilized to identify QTL. A whole-genome approach is proposed for multienvironment and multitrait QTL analysis in MAGIC. The whole-genome approach simultaneously incorporates all founder probabilities at each marker for all individuals in the analysis, rather than using a genome scan. A dimension reduction technique is implemented, which allows for high-dimensional genetic data. For each QTL identified, sizes of effects for each founder allele, the percentage of genetic variance explained, and a score to reflect the strength of the QTL are found. The approach was demonstrated to perform well in a small simulation study and for two experiments, using a wheat MAGIC population.

  1. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources.

    PubMed

    Klima, Cassidy L; Cook, Shaun R; Zaheer, Rahat; Laing, Chad; Gannon, Vick P; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W; McAllister, Tim A

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2-8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. 
The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  2. Analysis of deletion within the reindeer pseudocowpoxvirus genome.

    PubMed

    Hautaniemi, Maria; Vaccari, Francesca; Scagliarini, Alessandra; Laaksonen, Sauli; Huovilainen, Anita; McInnes, Colin J

    2011-09-01

    Cases of contagious pustular stomatitis have been reported in Finnish reindeer for many years. Two species of the genus Parapoxvirus of the family Poxviridae have been identified as the causative agent of the disease; orf virus (ORFV) was found during the 1992-1993 epidemic and pseudocowpoxvirus (PCPV) was connected to the 1999-2000 epidemic. The genome of reindeer parapoxvirus from the latter outbreak, isolate F00.120R, was recently sequenced and confirmed as PCPV. The six gene deletion of the right terminus of the F00.120R genome, in comparison to ORFV, was investigated in an attempt to use it in differentiating viruses causing pustular stomatitis in reindeer. The present study describes discovery and analysis of genes 116-121 in reindeer PCPV and in an Italian field isolate of bovine PCPV. The results show that a 5431 bp sequence containing genes 116-121 was likely to have been deleted from the F00.120R genome between the 6th and 7th passage in cell culture, and that these genes are present in other isolates of reindeer and bovine PCPV isolated in Finland during the years 2005-2010. The data presented here extends our knowledge of the PCPV genome, confirming that it contains homologues of all known ORFV genes and further reinforces their close genetic relationship. The similarity between the EEV envelope and GM-CSF inhibitory factor genes from reindeer PCPV and ORFV isolates, Finnish sheep ORFV and cattle PCPV isolates indicate that these viruses have been circulating among Finnish reindeer, cattle and sheep over a long period of time.

  3. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources

    PubMed Central

    Klima, Cassidy L.; Cook, Shaun R.; Zaheer, Rahat; Laing, Chad; Gannon, Vick P.; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W.; McAllister, Tim A.

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2–8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. 
The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  4. Statistical analysis of simple repeats in the human genome

    NASA Astrophysics Data System (ADS)

    Piazza, F.; Liò, P.

    2005-03-01

    The human genome contains repetitive DNA at different levels of sequence length, number and dispersion. Highly repetitive DNA is particularly rich in homo- and di-nucleotide repeats, while middle repetitive DNA is rich in families of interspersed, mobile elements hundreds of base pairs (bp) long, among them the Alu families. A link between homo- and di-polymeric tracts and mobile elements has been recently highlighted. In particular, the mobility of Alu repeats, which form 10% of the human genome, has been correlated with the length of poly(A) tracts located at one end of the Alu. These tracts have a rigid and non-bendable structure and have an inhibitory effect on nucleosomes, which normally compact the DNA. We performed a statistical analysis of the genome-wide distribution of lengths and inter-tract separations of poly(X) and poly(XY) tracts in the human genome. Our study shows that in humans the length distributions of these sequences reflect the dynamics of their expansion and DNA replication. By means of general tools from linguistics, we show that the latter play the role of highly-significant content-bearing terms in the DNA text. Furthermore, we find that such tracts are positioned in a non-random fashion, with an apparent periodicity of 150 bases. This allows us to extend the link between repetitive, highly mobile elements such as Alus and low-complexity words in human DNA. More precisely, we show that Alus are sources of poly(X) tracts, which in turn affect in a subtle way the combination and diversification of gene expression and the fixation of multigene families.
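    The analysis described above rests on two measurements: the lengths of poly(X) tracts and the separations between consecutive tracts. The following is only a rough sketch of how such measurements could be taken from a sequence string; the function names, the minimum tract length of 4, and the toy sequence are assumptions made for illustration and do not reproduce the authors' method.

```python
import re


def homopolymer_tracts(seq, min_len=4):
    """Return (start, base, length) for each run of a single repeated base
    that is at least min_len long (min_len is an illustrative threshold)."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


def separations(tracts):
    """Return the number of bases between the end of each tract
    and the start of the next one."""
    return [nxt[0] - (cur[0] + cur[2]) for cur, nxt in zip(tracts, tracts[1:])]


# Toy sequence built from labelled pieces, for illustration only.
toy = "ACG" + "AAAAA" + "TTG" + "CCCCCC" + "GT"
toy_tracts = homopolymer_tracts(toy)
toy_gaps = separations(toy_tracts)
```

    On the toy sequence this finds a poly(A) tract of length 5 starting at position 3 and a poly(C) tract of length 6 starting at position 11, separated by 3 bases. Histograms of such lengths and gaps over a whole genome would be the raw material for the kind of distributional analysis the abstract describes.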

  5. Paired-end genomic signature tags: a method for the functional analysis of genomes and epigenomes.

    PubMed

    Dunn, John J; McCorkle, Sean R; Everett, Logan; Anderson, Carl W

    2007-01-01

    Because paired-end genomic signature tags are sequence-based, they have the potential to become an alternate tool to tiled microarray hybridization as a method for genome-wide localization of transcription factors and other sequence-specific DNA binding proteins. As outlined here the method also can be used for global analysis of DNA methylation. One advantage of this approach is the ability to easily switch between different genome types without having to fabricate a new microarray for each and every DNA type. However, the method does have some disadvantages. Among the most rate-limiting steps of our PE-GST protocol are the need to concatemerize the diTAGs, size fractionate them and then clone them prior to sequencing. This is usually followed by additional steps to amplify and size select for long (≥ 500) concatemer inserts prior to sequencing. These time-consuming steps are important for standard DNA sequencing as they increase efficiency approximately 20-30-fold since each amplified concatemer can now provide information on multiple tags; the limitation on data acquisition is read length during sequencing. However, the development of new sequencing methods such as Life Sciences' 454 new nanotechnology-based sequencing instrument (41) could increase tag sequencing efficiency by several orders of magnitude (≥ 100,000 diTAG reads/run), which is sufficient to provide in-depth global analysis of all ChIP PE-GSTs in a single run. This is because the lengths of our paired-end diTAGs (approximately 60 bp) fall well within the region of high accuracy for read lengths on this instrument. In principle, sequence analysis of diTAGs could begin as soon as they are generated, thereby completely bypassing the need for the concatemerization, sizing, downstream cloning steps and sequencing template purification. In addition, our protocol places any one of several unique four-base long nucleotide sequences, such as GATC, between each and every diTAG pair, which could

  6. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis.

    PubMed

    Bengelsdorf, Frank R; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood-Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) of C. ljungdahlii and C. autoethanogenum (99.3%) and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of thlA promoter (PthlA) from C. acetobutylicum or native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2

  7. Radiation induced genome instability: multiscale modelling and data analysis

    NASA Astrophysics Data System (ADS)

    Andreev, Sergey; Eidelman, Yuri

    2012-07-01

    Genome instability (GI) is thought to be an important step in cancer induction and progression. Radiation induced GI is usually defined as genome alterations in the progeny of irradiated cells. The aim of this report is to demonstrate an opportunity for integrative analysis of radiation induced GI on the basis of multiscale modelling. Integrative, systems level modelling is necessary to assess different pathways resulting in GI in which a variety of genetic and epigenetic processes are involved. The multilevel modelling includes the Monte Carlo based simulation of several key processes involved in GI: DNA double strand breaks (DSBs) generation in cells initially irradiated as well as in descendants of irradiated cells, damage transmission through mitosis. Taking the cell-cycle-dependent generation of DNA/chromosome breakage into account ensures an advantage in estimating the contribution of different DNA damage response pathways to GI, as to nonhomologous vs homologous recombination repair mechanisms, the role of DSBs at telomeres or interstitial chromosomal sites, etc. The preliminary estimates show that both telomeric and non-telomeric DSB interactions are involved in delayed effects of radiation although differentially for different cell types. The computational experiments provide the data on the wide spectrum of GI endpoints (dicentrics, micronuclei, nonclonal translocations, chromatid exchanges, chromosome fragments) similar to those obtained experimentally for various cell lines under various experimental conditions. The modelling based analysis of experimental data demonstrates that radiation induced GI may be viewed as processes of delayed DSB induction/interaction/transmission being a key for quantification of GI. On the other hand, this conclusion is not sufficient to understand GI as a whole because factors of DNA non-damaging origin can also induce GI. 
Additionally, new data on induced pluripotent stem cells reveal that GI is acquired in normal mature

  8. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis

    PubMed Central

    Bengelsdorf, Frank R.; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood–Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) of C. ljungdahlii and C. autoethanogenum (99.3%) and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of thlA promoter (PthlA) from C. acetobutylicum or native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2

  9. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis.

    PubMed

    Bengelsdorf, Frank R; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood-Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) of C. ljungdahlii and C. autoethanogenum (99.3%) and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of thlA promoter (PthlA) from C. acetobutylicum or native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2

  10. High-resolution analysis of DNA synthesis start sites and nucleosome architecture at efficient mammalian replication origins.

    PubMed

    Lombraña, Rodrigo; Almeida, Ricardo; Revuelta, Isabel; Madeira, Sofia; Herranz, Gonzalo; Saiz, Néstor; Bastolla, Ugo; Gómez, María

    2013-10-01

    DNA replication origins are poorly characterized genomic regions that are essential to recruit and position the initiation complex to start DNA synthesis. Despite the lack of specific replicator sequences, initiation of replication does not occur at random sites in the mammalian genome. This has led to the view that DNA accessibility could be a major determinant of mammalian origins. Here, we performed a high-resolution analysis of nucleosome architecture and initiation sites along several origins of different genomic location and firing efficiencies. We found that mammalian origins are highly variable in nucleosome conformation and initiation patterns. Strikingly, initiation sites at efficient CpG island-associated origins always occur at positions of high-nucleosome occupancy. Origin recognition complex (ORC) binding sites, however, occur at adjacent but distinct positions marked by labile nucleosomes. We also found that initiation profiles mirror nucleosome architecture, both at endogenous origins and at a transgene in a heterologous system. Our studies provide a unique insight into the relationship between chromatin structure and initiation sites in the mammalian genome that has direct implications for how the replication programme can be accommodated to diverse epigenetic scenarios.

  11. Orchestrating high-throughput genomic analysis with Bioconductor.

    PubMed

    Huber, Wolfgang; Carey, Vincent J; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D; Irizarry, Rafael A; Lawrence, Michael; Love, Michael I; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

    2015-02-01

    Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors. PMID:25633503

  12. Construction of an integrated database to support genomic sequence analysis

    SciTech Connect

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  13. Orchestrating high-throughput genomic analysis with Bioconductor

    PubMed Central

    Huber, Wolfgang; Carey, Vincent J.; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S.; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D.; Irizarry, Rafael A.; Lawrence, Michael; Love, Michael I.; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K.; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K.; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

    2015-01-01

    Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors. PMID:25633503

  14. Wavelet analysis in current cancer genome research: a survey.

    PubMed

    Meng, Tao; Soliman, Ahmed T; Shyu, Mei-Ling; Yang, Yimin; Chen, Shu-Ching; Iyengar, S S; Yordy, John S; Iyengar, Puneeth

    2013-01-01

    With the rapid development of next generation sequencing technology, the amount of biological sequence data of the cancer genome increases exponentially, which calls for efficient and effective algorithms that may identify patterns hidden underneath the raw data that may distinguish cancer Achilles' heels. From a signal processing point of view, biological units of information, including DNA and protein sequences, have been viewed as one-dimensional signals. Therefore, researchers have been applying signal processing techniques to mine the potentially significant patterns within these sequences. More specifically, in recent years, wavelet transforms have become an important mathematical analysis tool, with a wide and ever increasing range of applications. The versatility of wavelet analytic techniques has forged new interdisciplinary bounds by offering common solutions to apparently diverse problems and providing a new unifying perspective on problems of cancer genome research. In this paper, we provide a survey of how wavelet analysis has been applied to cancer bioinformatics questions. Specifically, we discuss several approaches of representing the biological sequence data numerically and methods of using wavelet analysis on the numerical sequences. PMID:24407303
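
    The two steps named in the abstract above (numerical representation of a sequence, then wavelet analysis of the numbers) can be sketched as follows. This is a minimal illustration, not a method taken from the survey: the binary indicator mapping, the single-level Haar transform, and the names voss_indicator and haar_step are all assumptions chosen for brevity.

```python
import math

def voss_indicator(seq, base):
    """Return a 0/1 track marking the positions where `base` occurs in `seq`."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]

def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal.

    Returns (approximation, detail) coefficient lists, each half as long
    as the input.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

# Toy sequence, made up for illustration only.
dna = "ATGGCGTA"
g_track = voss_indicator(dna, "G")      # numeric representation of guanine positions
approx, detail = haar_step(g_track)     # coarse trend and local change coefficients
```

    Because this Haar step is orthonormal, the summed squares of the approximation and detail coefficients equal the summed squares of the input track, which gives a quick sanity check on any implementation.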

  15. Comparative analysis of genomic signal processing for microarray data clustering.

    PubMed

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  16. Complete genome sequencing and comparative analysis of the linezolid-resistant Enterococcus faecalis strain DENG1.

    PubMed

    Yu, Zhijian; Chen, Zhong; Cheng, Hang; Zheng, Jinxin; Li, Duoyun; Deng, Xiangbin; Pan, Weiguang; Yang, Weizhi; Deng, Qiwen

    2014-07-01

    Genome level analysis of bacterial strains provides information on genetic composition and resistance mechanisms to clinically relevant antibiotics. To date, whole genome characterization of linezolid (LZD)-resistant Enterococcus faecalis isolated in the clinic is lacking. In this study, we report the entire genome sequence, genomic characteristics and virulence factors of a pathogenic E. faecalis strain, DENG1. Our results showed considerable differences in genomic characteristics and virulence factors compared with other E. faecalis strains (V583 and OG1RF). The genome of this LZD-resistant E. faecalis strain can be used as a reference to study the mechanism of LZD resistance and the phylogenetic relationship of E. faecalis strains worldwide.

  17. The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions

    SciTech Connect

    Markowitz, Victor M.; Szeto, Ernest; Palaniappan, Krishna; Grechkin, Yuri; Chu, Ken; Chen, I-Min A.; Dubchak, Inna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2007-08-01

    The Integrated Microbial Genomes (IMG) system is a data management, analysis and annotation platform for all publicly available genomes. IMG contains both draft and complete JGI microbial genomes integrated with all other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and annotating genomes, genes and functions, individually or in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been constantly expanded through quarterly releases. IMG is provided by the DOE-Joint Genome Institute (JGI) and is available from http://img.jgi.doe.gov.

  18. A parallel solution for high resolution histological image analysis.

    PubMed

    Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

    2012-10-01

    This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massive parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The proposed parallel framework is a flexible, high-performance solution, and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories.

  19. High-Resolution Analysis and Modeling of GRACE Accelerometer Observations

    NASA Astrophysics Data System (ADS)

    Flury, J.; Bettadpur, S.; Tapley, B. D.

    2007-12-01

    A better understanding and modeling of high-resolution GRACE accelerometer data serves three purposes: (1) to ensure that the best possible data are used in the GRACE gravity field processing, (2) to obtain precise and clean non-gravitational accelerations for aeronomy research, and (3) to understand and quantify disturbances which may play a role for future space-borne accelerometry. The external non-gravitational forces acting on the twin GRACE satellites are superimposed by a complex signal pattern of satellite-induced effects, originating from switching events in electrical circuits of on-board heaters and magnetic torquers, from vibrations and thruster accelerations. For each of these processes, we compared and averaged 10 Hz acceleration signals from a large number of events from long accelerometer time series. The analysis results provide constraints, e.g., on thrust accuracy, misalignments, and vibration frequencies. These constraints may help to understand the underlying physics. We modeled and reduced acceleration signals due to thrusters and heater switching and obtained considerably smoother and cleaner signals of external non-gravitational accelerations which may be useful for applications in aeronomy research.

  20. Accuracy Enhancement of Inertial Sensors Utilizing High Resolution Spectral Analysis

    PubMed Central

    Noureldin, Aboelmagd; Armstrong, Justin; El-Shafie, Ahmed; Karamat, Tashfeen; McGaughey, Don; Korenberg, Michael; Hussain, Aini

    2012-01-01

    In both military and civilian applications, the inertial navigation system (INS) and the global positioning system (GPS) are two complementary technologies that can be integrated to provide reliable positioning and navigation information for land vehicles. The accuracy enhancement of INS sensors and the integration of INS with GPS are the subjects of widespread research. Wavelet de-noising of INS sensors has had limited success in removing the long-term (low-frequency) inertial sensor errors. The primary objective of this research is to develop a novel inertial sensor accuracy enhancement technique that can remove both short-term and long-term error components from inertial sensor measurements prior to INS mechanization and INS/GPS integration. A high resolution spectral analysis technique called the fast orthogonal search (FOS) algorithm is used to accurately model the low frequency range of the spectrum, which includes the vehicle motion dynamics and inertial sensor errors. FOS models the spectral components with the most energy first and uses an adaptive threshold to stop adding frequency terms when fitting a term does not reduce the mean squared error more than fitting white noise. The proposed method was developed, tested and validated through road test experiments involving both low-end tactical grade and low cost MEMS-based inertial systems. The results demonstrate that in most cases the position accuracy during GPS outages using FOS de-noised data is superior to the position accuracy using wavelet de-noising.
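
    The selection rule described in the abstract above (add the frequency term with the most energy first, stop when a new term no longer pays for itself) can be sketched as a greedy loop. This is a simplified stand-in and not the FOS algorithm itself: it refits each candidate against the current residual instead of orthogonalizing candidates, and it uses a fixed min_gain cutoff where the abstract describes an adaptive threshold tied to white noise. All function and parameter names are hypothetical.

```python
import math

def fit_pair(residual, freq):
    """Least-squares fit of a*cos + b*sin at `freq` cycles per record.

    Assumes 0 < freq < len(residual) / 2 so the 2x2 system is non-singular.
    """
    n = len(residual)
    c = [math.cos(2 * math.pi * freq * t / n) for t in range(n)]
    s = [math.sin(2 * math.pi * freq * t / n) for t in range(n)]
    cc = sum(x * x for x in c)
    ss = sum(x * x for x in s)
    cs = sum(x * y for x, y in zip(c, s))
    rc = sum(r * x for r, x in zip(residual, c))
    rs = sum(r * x for r, x in zip(residual, s))
    det = cc * ss - cs * cs
    a = (rc * ss - rs * cs) / det
    b = (rs * cc - rc * cs) / det
    return a, b, [a * x + b * y for x, y in zip(c, s)]

def mse(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

def greedy_spectral_model(signal, candidate_freqs, min_gain):
    """Pick frequency terms in order of MSE reduction until the gain is too small."""
    residual = list(signal)
    remaining = list(candidate_freqs)
    chosen = []
    while remaining:
        best = None
        for f in remaining:
            a, b, fitted = fit_pair(residual, f)
            new_res = [r - y for r, y in zip(residual, fitted)]
            gain = mse(residual) - mse(new_res)
            if best is None or gain > best[0]:
                best = (gain, f, a, b, new_res)
        gain, f, a, b, new_res = best
        if gain <= min_gain:      # fixed cutoff; the abstract describes an adaptive one
            break
        chosen.append((f, a, b))
        remaining.remove(f)
        residual = new_res
    return chosen, residual

# Synthetic test signal, made up for illustration only.
n = 64
signal = [2.0 * math.sin(2 * math.pi * 3 * t / n)
          + 0.5 * math.cos(2 * math.pi * 7 * t / n) for t in range(n)]
chosen, residual = greedy_spectral_model(signal, range(1, 11), min_gain=1e-6)
```

    On this synthetic input the stronger 3-cycle term is selected before the weaker 7-cycle term, and the loop stops once no remaining candidate reduces the error by more than min_gain.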

  1. Super-resolution analysis for passive microwave images using FIPOCS

    NASA Astrophysics Data System (ADS)

    Wang, Xue; Wu, Jin; Wang, Jin; Adjouadi, Malek

    2013-03-01

    improve application of passive microwave imaging for object detection. In this study, we propose the FIPOCS (Fractal interpolation with Improved Projection onto Convex Sets) technique to enhance resolution. The experimental results show that the resolution of a passive microwave image improves when fractal interpolation is applied to the low-resolution (LR) image before the IPOCS technique.

  2. IMG 4 version of the integrated microbial genomes comparative analysis system.

    PubMed

    Markowitz, Victor M; Chen, I-Min A; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2014-01-01

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu). PMID:24165883

  3. IMG 4 version of the integrated microbial genomes comparative analysis system

    SciTech Connect

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2013-10-27

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

  4. High resolution copy number variation data in the NCI-60 cancer cell lines from whole genome microarrays accessible through CellMiner.

    PubMed

    Varma, Sudhir; Pommier, Yves; Sunshine, Margot; Weinstein, John N; Reinhold, William C

    2014-01-01

    Array-based comparative genomic hybridization (aCGH) is a powerful technique for detecting gene copy number variation. It is generally considered to be robust and convenient since it measures DNA rather than RNA. In the current study, we combine copy number estimates from four different platforms (Agilent 44 K, NimbleGen 385 K, Affymetrix 500 K and Illumina Human1Mv1_C) to compute a reliable, high-resolution, easy to understand output for the measure of copy number changes in the 60 cancer cells of the NCI-DTP (the NCI-60). We then relate the results to gene expression. We explain how to access that database using our CellMiner web-tool and provide an example of the ease of comparison with transcript expression, whole exome sequencing, microRNA expression and response to 20,000 drugs and other chemical compounds. We then demonstrate how the data can be analyzed integratively with transcript expression data for the whole genome (26,065 genes). Comparison of copy number and expression levels shows an overall medium high correlation (median r = 0.247), with significantly higher correlations (median r = 0.408) for the known tumor suppressor genes. That observation is consistent with the hypothesis that gene loss is an important mechanism for tumor suppressor inactivation. An integrated analysis of concurrent DNA copy number and gene expression change is presented. Limiting attention to focal DNA gains or losses, we identify and reveal novel candidate tumor suppressors with matching alterations in transcript level.
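
    The per-gene comparison summarized above as a median r can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the correlation statistic, so Pearson's r is assumed here, and the gene names and values are made up and are not NCI-60 data.

```python
import math
from statistics import median

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical copy number and expression values across four cell lines.
copy_number = {
    "GENE_A": [2.0, 1.0, 3.0, 2.0],
    "GENE_B": [2.0, 2.0, 1.0, 3.0],
}
expression = {
    "GENE_A": [5.0, 3.0, 7.0, 5.0],
    "GENE_B": [4.0, 6.0, 5.0, 5.0],
}

# One correlation per gene, then the median across genes.
per_gene_r = {g: pearson_r(copy_number[g], expression[g]) for g in copy_number}
median_r = median(per_gene_r.values())
```

    Reporting the median of the per-gene coefficients, rather than one pooled coefficient, keeps a few strongly correlated genes from dominating the summary.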

  5. High-resolution melt analysis of DNA methylation to discriminate semen in biological stains.

    PubMed

    Antunes, Joana; Silva, Deborah S B S; Balamurugan, Kuppareddi; Duncan, George; Alho, Clarice S; McCord, Bruce

    2016-02-01

    The goal of this study was to develop a method for the detection of semen in biological stains using high-resolution melt (HRM) analysis and DNA methylation. To perform this task, we used an epigenetic locus that targets a tissue-specific differentially methylated region for semen. This specific locus, ZC3H12D, contains methylated CpG sites that are hypomethylated in semen and hypermethylated in blood and saliva. Using this procedure, DNA from forensic stains can be isolated, processed using bisulfite-modified polymerase chain reaction (PCR), and detected by real-time PCR with HRM capability. The method described in this article is robust; we were able to obtain results from samples with as little as 1 ng of genomic DNA. Samples inhibited by humic acid still produced reliable results. Furthermore, the procedure is specific and will not amplify non-bisulfite-modified DNA. Because this process can be performed using real-time PCR and is quantitative, it fits nicely within the workflow of current forensic DNA laboratories. As a result, it should prove to be a useful technique for processing trace evidence samples for serological analysis.

  6. Flux Coupling Analysis of Genome-Scale Metabolic Network Reconstructions

    PubMed Central

    Burgard, Anthony P.; Nikolaev, Evgeni V.; Schilling, Christophe H.; Maranas, Costas D.

    2004-01-01

    In this paper, we introduce the Flux Coupling Finder (FCF) framework for elucidating the topological and flux connectivity features of genome-scale metabolic networks. The framework is demonstrated on genome-scale metabolic reconstructions of Helicobacter pylori, Escherichia coli, and Saccharomyces cerevisiae. The analysis allows one to determine whether any two metabolic fluxes, v1 and v2, are (1) directionally coupled, if a non-zero flux for v1 implies a non-zero flux for v2 but not necessarily the reverse; (2) partially coupled, if a non-zero flux for v1 implies a non-zero, though variable, flux for v2 and vice versa; or (3) fully coupled, if a non-zero flux for v1 implies not only a non-zero but also a fixed flux for v2 and vice versa. Flux coupling analysis also enables the global identification of blocked reactions, which are all reactions incapable of carrying flux under a certain condition; equivalent knockouts, defined as the set of all possible reactions whose deletion forces the flux through a particular reaction to zero; and sets of affected reactions denoting all reactions whose fluxes are forced to zero if a particular reaction is deleted. The FCF approach thus provides a novel and versatile tool for aiding metabolic reconstructions and guiding genetic manipulations. PMID:14718379
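
    The three coupling types defined above can be read off from the range of the flux ratio v1/v2 over all feasible flux distributions. The sketch below assumes the lower and upper bounds of that ratio (r_min, r_max) have already been obtained, for example by optimization over the network, which is not shown; the function name, the tolerance, and the "uncoupled" label are assumptions added for illustration.

```python
import math

def classify_coupling(r_min, r_max, tol=1e-9):
    """Classify fluxes v1 and v2 from bounds on the ratio v1/v2.

    r_min: smallest feasible value of v1/v2 (>= 0)
    r_max: largest feasible value of v1/v2 (math.inf if unbounded)
    """
    lower_zero = r_min <= tol          # v1 can be zero while v2 is not
    upper_inf = math.isinf(r_max)      # v2 can be zero while v1 is not
    if lower_zero and upper_inf:
        return "uncoupled"
    if lower_zero:
        # Finite upper bound: a non-zero v1 forces a non-zero v2.
        return "directionally coupled (v1 -> v2)"
    if upper_inf:
        # Positive lower bound: a non-zero v2 forces a non-zero v1.
        return "directionally coupled (v2 -> v1)"
    if abs(r_max - r_min) <= tol:
        return "fully coupled"         # the ratio is fixed
    return "partially coupled"         # non-zero together, ratio varies

examples = {
    "fixed ratio": classify_coupling(2.0, 2.0),
    "variable ratio": classify_coupling(0.5, 3.0),
    "one-way": classify_coupling(0.0, 4.0),
    "independent": classify_coupling(0.0, math.inf),
}
```

    A bounded, strictly positive ratio means each flux is non-zero whenever the other is, which is why the partially and fully coupled cases differ only in whether the two bounds coincide.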

  7. Genome-wide analysis of DNA methylation in hepatoblastoma tissues

    PubMed Central

    Cui, Ximao; Liu, Baihui; Zheng, Shan; Dong, Kuiran; Dong, Rui

    2016-01-01

    DNA methylation has a crucial role in cancer biology. In the present study, a genome-wide analysis of DNA methylation in hepatoblastoma (HB) tissues was performed to verify differential methylation levels between HB and normal tissues. As alpha-fetoprotein (AFP) has a critical role in HB, AFP methylation levels were also detected using pyrosequencing. Normal and HB liver tissue samples (frozen tissue) were obtained from patients with HB. Genome-wide analysis of DNA methylation in these tissues was performed using an Infinium HumanMethylation450 BeadChip, and the results were confirmed with reverse transcription-quantitative polymerase chain reaction. The Infinium HumanMethylation450 BeadChip demonstrated distinctively less methylation in HB tissues than in non-tumor tissues. In addition, methylation enrichment was observed in positions near the transcription start site of AFP, which exhibited lower methylation levels in HB tissues than in non-tumor liver tissues. Lastly, a significant negative correlation was observed between AFP messenger RNA expression and DNA methylation percentage, using linear Pearson's R correlation coefficients. The present results demonstrate differential methylation levels between HB and normal tissues, and imply that aberrant methylation of AFP in HB could reflect HB development. Expansion of these findings could provide useful insight into HB biology. PMID:27446465

  8. Monochromosomal hybrids for the analysis of the human genome

    SciTech Connect

    Athwal, R.S.

    1992-01-01

    We have already produced monochromosomal hybrids for 2/3 of the human genome and we have generated sufficient biological materials to complete the proposed panels of hybrid cell lines. We have developed experimental procedures to identify marked chromosomes in human cell lines prior to their transfer to rodent cells. This would eliminate redundancy in the production of monochromosomal hybrids and therefore help expedite completion of the hybrid cell panels. We have also developed a highly sensitive method to identify human chromosomes in hybrid cells. Monochromosomal hybrids produced in our lab are used in a number of laboratories for experiments on gene mapping, gene isolation, chromosome fractionation and genetic analysis for complementation of cellular phenotypes such as DNA repair and regulation of cell growth. Monochromosomal hybrid cell lines are freely available to the scientific community for experiments on gene mapping and analysis of the human genome. We are preparing large quantities of DNA from each hybrid cell line, which will be available to the research community for various experiments.

  9. Comparative genomic analysis of novel Acinetobacter symbionts: A combined systems biology and genomics approach

    PubMed Central

    Gupta, Vipin; Haider, Shazia; Sood, Utkarsh; Gilbert, Jack A.; Ramjee, Meenakshi; Forbes, Ken; Singh, Yogendra; Lopes, Bruno S.; Lal, Rup

    2016-01-01

    The increasing trend of antibiotic resistance in Acinetobacter drastically limits the range of therapeutic agents required to treat multidrug resistant (MDR) infections. This study focused on analysis of novel Acinetobacter strains using a genomics and systems biology approach. Here we used a network theory method for pathogenic and non-pathogenic Acinetobacter spp. to identify the key regulatory proteins (hubs) in each strain. We identified nine key regulatory proteins, guaA, guaB, rpsB, rpsI, rpsL, rpsE, rpsC, rplM and trmD, which have functional roles as hubs in a hierarchical scale-free fractal protein-protein interaction network. Two key hubs (guaA and guaB) were important for insect-associated strains, and comparative analysis identified guaA as more important than guaB due to its role in effective module regulation. rpsI played a significant role in all the novel strains, while rplM was unique to sheep-associated strains. rpsM, rpsB and rpsI were involved in the regulation of overall network topology across all Acinetobacter strains analyzed in this study. Future analysis will investigate whether these hubs are useful as drug targets for treating Acinetobacter infections. PMID:27378055

  10. Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity

    PubMed Central

    Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

    2016-01-01

    Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. The A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than to Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira species, some of which have previously been shown to be laterally transferable among different species, such as genes coding for restriction-modification systems. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of order Oscillatoriales, suggesting that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provide in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as a valuable resource for functional genomics studies. PMID:27330141

  11. Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity.

    PubMed

    Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

    2016-08-01

    Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. The A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than to Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira species, some of which, such as genes coding for restriction-modification systems, have previously been shown to be laterally transferable among different species. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of order Oscillatoriales, suggesting that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provide in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as a valuable resource for functional genomics studies. PMID:27330141

  13. Comprehensive high-resolution genomic profiling and cytogenetics of two pediatric and one adult medulloblastoma.

    PubMed

    Holland, Heidrun; Xu, Li-Xin; Ahnert, Peter; Kirsten, Holger; Koschny, Ronald; Bauer, Manfred; Schober, Ralf; Meixensberger, Jürgen; Krupp, Wolfgang

    2013-09-01

    Medulloblastoma (WHO grade IV) is a rare, malignant, invasive, embryonal tumor which mainly occurs in children and represents less than 1% of all adult brain tumors. Systematic comprehensive genetic analyses on medulloblastomas are rare but necessary to provide more detailed information. Therefore, we performed comprehensive cytogenetic analyses (blood and tissue) of two pediatric and one adult medulloblastoma, using trypsin-Giemsa staining, spectral karyotyping (tissues only), SNP-arrays, and gene expression analyses. We confirmed frequently detected chromosomal aberrations in medulloblastoma, such as +7q, -8p/q, -9q, -11q, -12q, and +17q and identified novel genetic events. Applying SNP-array, we identified constitutional de novo losses 5q21.1, 15q11.2, 17q21.31, 19p12 (pediatric medulloblastoma), 9p21.1, 19p12, 19q13.3, 21q11.2 (adult medulloblastoma) and gains 16p11.1-16p11.2, 18p11.32, Yq11.223-Yq11.23 (pediatric medulloblastoma), Xp22.31 (adult medulloblastoma) possibly representing inherited causal events for medulloblastoma formation. We show evidence for somatic segmental uniparental disomy in regions 1p36, 6q16.3, 6q24.1, 14q21.2, 17p13.3, and 17q22 not previously described for primary medulloblastoma. Gene expression analysis supported classification of the adult medulloblastoma to the WNT-subgroup and classification of pediatric medulloblastomas to group 3 tumors. Analyses of tumors and matched normal tissues (blood) with a combination of complementary techniques will help to further elucidate potentially causal genetic events for medulloblastomas.

  14. Whole-Genome Sequence Analysis and Genome-Wide Virulence Gene Identification of Riemerella anatipestifer Strain Yb2.

    PubMed

    Wang, Xiaolan; Ding, Chan; Wang, Shaohui; Han, Xiangan; Yu, Shengqing

    2015-08-01

    Riemerella anatipestifer is a well-described pathogen of waterfowl and other avian species that can cause septicemic and exudative diseases. In this study, we sequenced the complete genome of R. anatipestifer strain Yb2 and analyzed it against the published genomic sequences of R. anatipestifer strains DSM15868, RA-GD, RA-CH-1, and RA-CH-2. The Yb2 genome contains one circular chromosome of 2,184,066 bp with a 35.73% GC content and no plasmid. The genome has 2,021 open reading frames that occupy 90.88% of the genome. A comparative genomic analysis revealed that genome organization is highly conserved among R. anatipestifer strains, except for four inversions of a sequence segment in Yb2. A phylogenetic analysis found that the closest neighbor of Yb2 is RA-GD. Furthermore, we constructed a library of 3,175 mutants by random transposon mutagenesis, and 100 mutants exhibiting more than 100-fold-attenuated virulence were obtained by animal screening experiments. Southern blot analysis and genetic characterization of the mutants led to the identification of 49 virulence genes. Of these, 25 encode cytoplasmic proteins, 6 encode cytoplasmic membrane proteins, 4 encode outer membrane proteins, and the subcellular localization of the remaining 14 gene products is unknown. The functional classification of orthologous-group clusters revealed that 16 genes are associated with metabolism, 6 are associated with cellular processing and signaling, and 4 are associated with information storage and processing. The functions of the other 23 genes are poorly characterized or unknown. This genome-wide study identified genes important to the virulence of R. anatipestifer. PMID:26002892
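    The counts reported in the Yb2 abstract above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are ours, the figures are copied from the abstract, and the mean ORF length is a rough derived estimate that assumes the 90.88% coding fraction refers to the total ORF length.

```python
# Figures quoted from the abstract; names are illustrative, not from the paper.
genome_bp = 2_184_066        # circular chromosome length
orf_count = 2_021            # open reading frames
coding_fraction = 0.9088     # share of the genome occupied by ORFs
virulence_genes = 49         # genes identified from the attenuated mutants

# Breakdown of the 49 virulence genes by subcellular localization.
localization = {
    "cytoplasmic": 25,
    "cytoplasmic membrane": 6,
    "outer membrane": 4,
    "unknown": 14,
}

# Breakdown of the same genes by orthologous-group functional class.
function = {
    "metabolism": 16,
    "cellular processing and signaling": 6,
    "information storage and processing": 4,
    "poorly characterized or unknown": 23,
}

localization_total = sum(localization.values())
function_total = sum(function.values())

# Rough estimate: total coding length divided by the number of ORFs.
mean_orf_bp = coding_fraction * genome_bp / orf_count

print(f"localization total: {localization_total} (reported {virulence_genes})")
print(f"function total:     {function_total} (reported {virulence_genes})")
print(f"approximate mean ORF length: {mean_orf_bp:.0f} bp")
```

    Both breakdowns sum to the 49 genes reported, and the estimated mean ORF length comes out a little under 1 kb.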

  15. The genome clinic: a multidisciplinary approach to assessing the opportunities and challenges of integrating genomic analysis into clinical care.

    PubMed

    Bowdin, Sarah; Ray, Peter N; Cohn, Ronald D; Meyn, M Stephen

    2014-05-01

    Our increasing knowledge of how genomic variants affect human health and the falling costs of whole-genome sequencing are driving the development of individualized genetic medicine. This new clinical paradigm uses knowledge of an individual's genomic variants to guide health care decisions throughout life, to anticipate, diagnose, and manage disease. While individualized genetic medicine offers the promise of transformative change in health care, it forces us to reconsider existing ethical, scientific, and clinical paradigms. The potential benefits of presymptomatic identification of at-risk individuals, improved diagnostics, individualized therapy, accurate prognosis, and avoidance of adverse drug reactions coexist with the potential risks of uninterpretable results, psychological harm, outmoded counseling models, and increased health care costs. Here, we review the challenges of integrating genomic analysis into clinical practice and describe a prototype for implementing genetic medicine. Our multidisciplinary team of bioinformaticians, health economists, ethicists, geneticists, genetic counselors, and clinicians has designed a "Genome Clinic" research project that addresses multiple challenges in genomic medicine, ranging from the development of bioinformatics tools for the clinical assessment of genomic variants and the discovery of disease genes to health policy inquiries, assessment of clinical care models, patient preference, and the ethics of consent.

  16. Genomic Analysis of Natural Variation for Seed and Plant Size in Maize (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    SciTech Connect

    Kaeppler, Shawn

    2012-03-21

    Shawn Kaeppler from the University of Wisconsin-Madison on "Genomic Analysis of Biofuel Traits in Maize and Switchgrass" at the 7th Annual Genomics of Energy & Environment Meeting on March 21, 2012 in Walnut Creek, Calif

  17. Genomic Analysis of Natural Variation for Seed and Plant Size in Maize (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Kaeppler, Shawn [University of Wisconsin, Madison]

    2016-07-12

    Shawn Kaeppler from the University of Wisconsin-Madison on "Genomic Analysis of Biofuel Traits in Maize and Switchgrass" at the 7th Annual Genomics of Energy & Environment Meeting on March 21, 2012 in Walnut Creek, Calif

  18. Imaging genomics

    PubMed Central

    Thompson, Paul M.; Martin, Nicholas G.; Wright, Margaret J.

    2010-01-01

    Purpose of review: Imaging genomics is an emerging field that is rapidly identifying genes that influence the brain, cognition, and risk for disease. Worldwide, thousands of individuals are being scanned with high-throughput genotyping (genome-wide scans), and new imaging techniques [high angular resolution diffusion imaging and resting state functional magnetic resonance imaging (MRI)] that provide fine-grained measures of the brain’s structural and functional connectivity. Along with clinical diagnosis and cognitive testing, brain imaging offers highly reproducible measures that can be subjected to genetic analysis. Recent findings: Recent studies of twin, pedigree, and population-based datasets have discovered several candidate genes that consistently show small to moderate effects on brain measures. Many studies measure single phenotypes from the images, such as hippocampal volume, but voxel-wise genomic methods can plot the profile of genetic association at each 3D point in the brain. This exploits the full arsenal of imaging statistics to discover and replicate gene effects. Summary: Imaging genomics efforts worldwide are now working together to discover and replicate many promising leads. By studying brain phenotypes closer to causative gene action, larger gene effects are detectable with realistic sample sizes obtainable from meta-analysis of smaller studies. Imaging genomics has broad applications to dementia, mental illness, and public health. PMID:20581684

  19. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and a phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation-related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  20. Complete mitochondrial genome of Cervus elaphus songaricus (Cetartiodactyla: Cervinae) and a phylogenetic analysis with related species.

    PubMed

    Li, Yiqing; Ba, Hengxing; Yang, Fuhe

    2016-01-01

    The complete mitochondrial genome of Tianshan wapiti, Cervus elaphus songaricus, is 16,419 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and 1 control region. The phylogenetic trees were reconstructed with the concatenated nucleotide sequences of the 13 protein-coding genes using maximum parsimony (MP) and Bayesian inference (BI) methods. The MP and BI phylogenetic trees showed an identical tree topology. The monophyly of red deer, wapiti and sika deer was well supported, and wapiti was found to share a closer relationship with sika deer. Tianshan wapiti shared a closer relationship with xanthopygus than with yarkandensis. Rusa unicolor and Rucervus eldi were given a basal phylogenetic position. Our phylogenetic analysis provided a robust phylogenetic resolution spanning the entire evolutionary relationship of the subfamily Cervinae. PMID:24725059

  1. High resolution mapping of Twist to DNA in Drosophila embryos: Efficient functional analysis and evolutionary conservation.

    PubMed

    Ozdemir, Anil; Fisher-Aylor, Katherine I; Pepke, Shirley; Samanta, Manoj; Dunipace, Leslie; McCue, Kenneth; Zeng, Lucy; Ogawa, Nobuo; Wold, Barbara J; Stathopoulos, Angelike

    2011-04-01

    Cis-regulatory modules (CRMs) function by binding sequence specific transcription factors, but the relationship between in vivo physical binding and the regulatory capacity of factor-bound DNA elements remains uncertain. We investigate this relationship for the well-studied Twist factor in Drosophila melanogaster embryos by analyzing genome-wide factor occupancy and testing the functional significance of Twist occupied regions and motifs within regions. Twist ChIP-seq data efficiently identified previously studied Twist-dependent CRMs and robustly predicted new CRM activity in transgenesis, with newly identified Twist-occupied regions supporting diverse spatiotemporal patterns (>74% positive, n = 31). Some, but not all, candidate CRMs require Twist for proper expression in the embryo. The Twist motifs most favored in genome ChIP data (in vivo) differed from those most favored by Systematic Evolution of Ligands by EXponential enrichment (SELEX) (in vitro). Furthermore, the majority of ChIP-seq signals could be parsimoniously explained by a CABVTG motif located within 50 bp of the ChIP summit and, of these, CACATG was most prevalent. Mutagenesis experiments demonstrated that different Twist E-box motif types are not fully interchangeable, suggesting that the ChIP-derived consensus (CABVTG) includes sites having distinct regulatory outputs. Further analysis of position, frequency of occurrence, and sequence conservation revealed significant enrichment and conservation of CABVTG E-box motifs near Twist ChIP-seq signal summits, preferential conservation of ±150 bp surrounding Twist occupied summits, and enrichment of GA- and CA-repeat sequences near Twist occupied summits. Our results show that high resolution in vivo occupancy data can be used to drive efficient discovery and dissection of global and local cis-regulatory logic.
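    The CABVTG consensus mentioned above is written in IUPAC nucleotide codes, where B stands for C, G or T and V stands for A, C or G. A minimal sketch of how such a motif could be expanded and searched for within 50 bp of a peak summit is given below. The function names, the toy sequence and the windowing convention are assumptions made for illustration; this is not the pipeline used in the study.

```python
import re

# IUPAC nucleotide codes needed for this motif (subset of the full table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "[CGT]",  # any base except A
    "V": "[ACG]",  # any base except T
}

def iupac_to_regex(motif):
    """Translate an IUPAC motif into a compiled regex.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    body = "".join(IUPAC[base] for base in motif)
    return re.compile(f"(?=({body}))")

def motif_hits_near_summit(seq, summit, motif="CABVTG", window=50):
    """Return (position, matched sequence) for motif occurrences that lie
    entirely inside the slice seq[summit - window : summit + window + 1].

    Positions are 0-based offsets into seq. The window convention here is
    a hypothetical choice made for this example.
    """
    start = max(0, summit - window)
    end = min(len(seq), summit + window + 1)
    pattern = iupac_to_regex(motif)
    return [(m.start() + start, m.group(1))
            for m in pattern.finditer(seq[start:end])]

# Toy sequence (invented for illustration) containing one CACATG E-box.
toy_seq = "GGGCACATGTTT"
print(motif_hits_near_summit(toy_seq, summit=5))
```

    Because B and V are complementary codes, the reverse complement of CABVTG is again CABVTG, so scanning one strand is enough for this particular motif.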

  2. High-resolution array comparative genomic hybridization of chromosome 8q: evaluation of putative progression markers for gastroesophageal junction adenocarcinomas.

    PubMed

    van Duin, M; van Marion, R; Vissers, K J; Hop, W C J; Dinjens, W N M; Tilanus, H W; Siersema, P D; van Dekken, H

    2007-01-01

    Amplification of 8q is frequently found in gastroesophageal junction (GEJ) cancer. It is usually detected in high-grade, high-stage GEJ adenocarcinomas. Moreover, it has been implicated in tumor progression in other cancer types. In this study, a detailed genomic analysis of 8q was performed on a series of GEJ adenocarcinomas, including 22 primary adenocarcinomas, 13 cell lines and two xenografts, by array comparative genomic hybridization (aCGH) with a whole chromosome 8q contig array. Of the 37 specimens, 21 originated from the esophagus and 16 were derived from the gastric cardia. Commonly overrepresented regions were identified at distal 8q, i.e. 124-125 Mb (8q24.13), at 127-128 Mb (8q24.21), and at 141-142 Mb (8q24.3). From these regions six genes were selected with putative relevance to cancer: ANXA13, MTSS1, FAM84B (alias NSE2), MYC, C8orf17 (alias MOST-1) and PTK2 (alias FAK). In addition, the gene EXT1 was selected since it was found in a specific amplification in cell line SK-GT-5. Quantitative RT-PCR analysis of these seven genes was subsequently performed on a panel of 24 gastroesophageal samples, including 13 cell lines, two xenografts and nine normal stomach controls. Significant overexpression was found for MYC and EXT1 in GEJ adenocarcinoma cell lines and xenografts compared to normal controls. Expression of the genes MTSS1, FAM84B and C8orf17 was found to be significantly decreased in this set of cell lines and xenografts. We conclude that, firstly, genes other than MYC are involved in the 8q amplification in GEJ cancer. Secondly, the differential expression of these genes helps to unravel the biology of GEJ adenocarcinomas.

  3. High-resolution genome-wide scan of genes, gene-networks and cellular systems impacting the yeast ionome

    PubMed Central

    2012-01-01

    Background: To balance the demand for uptake of essential elements with their potential toxicity, living cells have complex regulatory mechanisms. Here, we describe a genome-wide screen to identify genes that impact the elemental composition (‘ionome’) of yeast Saccharomyces cerevisiae. Using inductively coupled plasma – mass spectrometry (ICP-MS) we quantify Ca, Cd, Co, Cu, Fe, K, Mg, Mn, Mo, Na, Ni, P, S and Zn in 11890 mutant strains, including 4940 haploid and 1127 diploid deletion strains, and 5798 over expression strains. Results: We identified 1065 strains with an altered ionome, including 584 haploid and 35 diploid deletion strains, and 446 over expression strains. Disruption of protein metabolism or trafficking has the highest likelihood of causing large ionomic changes, with gene dosage also being important. Gene over expression produced more extreme ionomic changes, but over expression and loss of function phenotypes are generally not related. Ionomic clustering revealed the existence of only a small number of possible ionomic profiles suggesting fitness tradeoffs that constrain the ionome. Clustering also identified important roles for the mitochondria, vacuole and ESCRT pathway in regulation of the ionome. Network analysis identified hub genes such as PMR1 in Mn homeostasis, novel members of ionomic networks such as SMF3 in vacuolar retrieval of Mn, and cross-talk between the mitochondria and the vacuole. All yeast ionomic data can be searched and downloaded at http://www.ionomicshub.org. Conclusions: Here, we demonstrate the power of high-throughput ICP-MS analysis to functionally dissect the ionome on a genome-wide scale. The information this reveals has the potential to benefit both human health and agriculture. PMID:23151179

  4. Population genomic analysis reveals highly conserved mitochondrial genomes in the yeast species Lachancea thermotolerans.

    PubMed

    Freel, Kelle C; Friedrich, Anne; Hou, Jing; Schacherer, Joseph

    2014-10-01

    The increasing availability of mitochondrial (mt) sequence data from various yeasts provides a tool to study genomic evolution within and between different species. While the genomes from a range of lineages are available, there is a lack of information concerning intraspecific mtDNA diversity. Here, we analyzed the mt genomes of 50 strains from Lachancea thermotolerans, a protoploid yeast species that has been isolated from several locations (Europe, Asia, Australia, South Africa, and North / South America) and ecological sources (fruit, tree exudate, plant material, and grape and agave fermentations). Protein-coding genes from the mtDNA were used to construct a phylogeny, which reflected a similar, yet less resolved topology than the phylogenetic tree of 50 nuclear genes. In comparison to its sister species Lachancea kluyveri, L. thermotolerans has a smaller mt genome. This is due to shorter intergenic regions and fewer introns, of which the latter are only found in COX1. We revealed that L. kluyveri and L. thermotolerans share similar levels of intraspecific divergence concerning the nuclear genomes. However, L. thermotolerans has a more highly conserved mt genome with the coding regions characterized by low rates of nonsynonymous substitution. Thus, in the mt genomes of L. thermotolerans, stronger purifying selection and lower mutation rates potentially shape genome diversity in contrast to what was found for L. kluyveri, demonstrating that the factors driving mt genome evolution are different even between closely related species. PMID:25212859

  5. Combined Analysis of the Chloroplast Genome and Transcriptome of the Antarctic Vascular Plant Deschampsia antarctica Desv

    PubMed Central

    Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok

    2014-01-01

    Background: Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. Results: The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5′- or 3′-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. Conclusions: We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the

  6. CoryneBase: Corynebacterium genomic resources and analysis tools at your fingertips.

    PubMed

    Heydari, Hamed; Siow, Cheuk Chuen; Tan, Mui Fern; Jakubovics, Nick S; Wee, Wei Yee; Mutha, Naresh V R; Wong, Guat Jah; Ang, Mia Yang; Yazdi, Amir Hessam; Choo, Siew Woh

    2014-01-01

    Corynebacteria are used for a wide variety of industrial purposes, but some species are associated with human diseases. With an increasing number of corynebacterial genomes having been sequenced, comparative analysis of these strains may provide a better understanding of their biology, phylogeny, virulence and taxonomy that may lead to the discovery of beneficial industrial strains or contribute to better management of diseases. To facilitate the ongoing research on corynebacteria, a specialized central repository and analysis platform for the corynebacterial research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. Here we present CoryneBase, a genomic database for Corynebacterium with diverse functionality for the analysis of genomes, which aims to provide: (1) annotated genome sequences of Corynebacterium where 165,918 coding sequences and 4,180 RNAs can be found in 27 species; (2) access to comprehensive Corynebacterium data through the use of advanced web technologies for interactive web interfaces; and (3) advanced bioinformatic analysis tools consisting of standard BLAST for homology search, VFDB BLAST for sequence homology search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis. CoryneBase offers access to a range of Corynebacterium genomic resources as well as analysis tools for comparative genomics and pathogenomics. It is publicly available at http://corynebacterium.um.edu.my/.

  7. High-resolution abundance analysis of HD 140283

    NASA Astrophysics Data System (ADS)

    Siqueira-Mello, C.; Andrievsky, S. M.; Barbuy, B.; Spite, M.; Spite, F.; Korotin, S. A.

    2015-12-01

    Context. HD 140283 is a reference subgiant that is metal poor and confirmed to be a very old star. The element abundances of this type of old star can constrain the nature and nucleosynthesis processes that occurred in its (even older) progenitors. The present study may shed light on nucleosynthesis processes yielding heavy elements early in the Galaxy. Aims: A detailed analysis of a high-quality spectrum is carried out, with the intent of providing a reference on stellar lines and abundances of a very old, metal-poor subgiant. We aim to derive abundances from most available and measurable spectral lines. Methods: The analysis is carried out using high-resolution (R = 81 000) and high signal-to-noise ratio (800 analysis in non-LTE (NLTE) is based on the MULTI code. We present LTE abundances for 26 elements, and NLTE calculations for the species C i, O i, Na i, Mg i, Al i, K i, Ca i, Sr ii, and Ba ii lines. Results: The abundance analysis provided an extensive line list suitable for metal-poor subgiant stars. The results for Li, CNO, α-, and iron peak elements are in good agreement with literature. The newly NLTE Ba abundance, along with a NLTE Eu correction and a 3D Ba correction from literature, leads to [Eu/Ba] = + 0.59 ± 0.18. This result confirms a dominant r-process contribution, possibly together with a very small contribution from the main s-process, to the neutron-capture elements in HD 140283. Overabundances of the lighter heavy elements and the high abundances derived for Ba, La, and Ce favour the operation of the weak r-process in HD 140283

  8. Breeding nursery tissue collection for possible genomic analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Phenotyping is considered a major bottleneck in breeding programs. With new genomic technologies, high throughput genotype schemes are constantly being developed. However, every genomic technology requires phenotypic data to inform prediction models generated from the technology. Forage breeders con...

  10. Genomic analysis of stress response against arsenic in Caenorhabditis elegans.

    PubMed

    Sahu, Surasri N; Lewis, Jada; Patel, Isha; Bozdag, Serdar; Lee, Jeong H; Sprando, Robert; Cinar, Hediye Nese

    2013-01-01

    Arsenic, a known human carcinogen, is widely distributed around the world and found in particularly high concentrations in certain regions including Southwestern US, Eastern Europe, India, China, Taiwan and Mexico. Chronic arsenic poisoning affects millions of people worldwide and is associated with increased risk of many diseases including atherosclerosis, diabetes and cancer. In this study, we explored genome level global responses to high and low levels of arsenic exposure in Caenorhabditis elegans using Affymetrix expression microarrays. This experimental design allows us to do microarray analysis of dose-response relationships of global gene expression patterns. High dose (0.03%) exposure caused stronger global gene expression changes in comparison with low dose (0.003%) exposure, suggesting a positive dose-response correlation. Biological processes such as oxidative stress and iron metabolism, which were previously reported to be involved in arsenic toxicity studies using cultured cells, experimental animals, and humans, were found to be affected in C. elegans. We performed genome-wide gene expression comparisons between our microarray data and publicly available C. elegans microarray datasets of cadmium, and sediment exposure samples of the German rivers Rhine and Elbe. Bioinformatics analysis of arsenic-responsive regulatory networks was done using the FastMEDUSA program. FastMEDUSA analysis identified cancer-related genes, particularly genes associated with leukemia, such as dnj-11, which encodes a protein orthologous to the mammalian ZRF1/MIDA1/MPP11/DNAJC2 family of ribosome-associated molecular chaperones. We analyzed the protective functions of several of the identified genes using RNAi. Our study indicates that C. elegans could be a substitute model to study the mechanism of metal toxicity using high-throughput expression data and bioinformatics tools such as FastMEDUSA. PMID:23894281

  11. Genomic Analysis of Stress Response against Arsenic in Caenorhabditis elegans

    PubMed Central

    Sahu, Surasri N.; Lewis, Jada; Patel, Isha; Bozdag, Serdar; Lee, Jeong H.; Sprando, Robert; Cinar, Hediye Nese

    2013-01-01

    Arsenic, a known human carcinogen, is widely distributed around the world and found in particularly high concentrations in certain regions including Southwestern US, Eastern Europe, India, China, Taiwan and Mexico. Chronic arsenic poisoning affects millions of people worldwide and is associated with increased risk of many diseases including atherosclerosis, diabetes and cancer. In this study, we explored genome-level global responses to high and low levels of arsenic exposure in Caenorhabditis elegans using Affymetrix expression microarrays. This experimental design allows microarray analysis of dose-response relationships of global gene expression patterns. High dose (0.03%) exposure caused stronger global gene expression changes in comparison with low dose (0.003%) exposure, suggesting a positive dose-response correlation. Biological processes such as oxidative stress and iron metabolism, which were previously reported to be involved in arsenic toxicity studies using cultured cells, experimental animals, and humans, were found to be affected in C. elegans. We performed genome-wide gene expression comparisons between our microarray data and publicly available C. elegans microarray datasets of cadmium exposure and of sediment exposure samples from the German rivers Rhine and Elbe. Bioinformatics analysis of arsenic-responsive regulatory networks was done using the FastMEDUSA program. FastMEDUSA analysis identified cancer-related genes, particularly genes associated with leukemia, such as dnj-11, which encodes a protein orthologous to the mammalian ZRF1/MIDA1/MPP11/DNAJC2 family of ribosome-associated molecular chaperones. We analyzed the protective functions of several of the identified genes using RNAi. Our study indicates that C. elegans could be a substitute model to study the mechanism of metal toxicity using high-throughput expression data and bioinformatics tools such as FastMEDUSA. PMID:23894281

  12. Delineation of Steroid-Degrading Microorganisms through Comparative Genomic Analysis

    PubMed Central

    Bergstrand, Lee H.; Cardenas, Erick; Holert, Johannes; Van Hamme, Jonathan D.

    2016-01-01

    Steroids are ubiquitous in natural environments and are a significant growth substrate for microorganisms. Microbial steroid metabolism is also important for some pathogens and for biotechnical applications. This study delineated the distribution of aerobic steroid catabolism pathways among over 8,000 microorganisms whose genomes are available in the NCBI RefSeq database. Combined analysis of bacterial, archaeal, and fungal genomes with both hidden Markov models and reciprocal BLAST identified 265 putative steroid degraders within only Actinobacteria and Proteobacteria, which mainly originated from soil, eukaryotic host, and aquatic environments. These bacteria include members of 17 genera not previously known to contain steroid degraders. A pathway for cholesterol degradation was conserved in many actinobacterial genera, particularly in members of the Corynebacterineae, and a pathway for cholate degradation was conserved in members of the genus Rhodococcus. A pathway for testosterone and, sometimes, cholate degradation had a patchy distribution among Proteobacteria. The steroid degradation genes tended to occur within large gene clusters. Growth experiments confirmed bioinformatic predictions of steroid metabolism capacity in nine bacterial strains. The results indicate there was a single ancestral 9,10-seco-steroid degradation pathway. Gene duplication, likely in a progenitor of Rhodococcus, later gave rise to a cholate degradation pathway. Proteobacteria and additional Actinobacteria subsequently obtained a cholate degradation pathway via horizontal gene transfer, in some cases facilitated by plasmids. Catabolism of steroids appears to be an important component of the ecological niches of broad groups of Actinobacteria and individual species of Proteobacteria. PMID:26956583
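
    The reciprocal BLAST step named in this abstract can be illustrated with a minimal sketch. It assumes the BLAST hits have already been parsed into (query, subject, bit score) tuples; the function names and the toy identifiers are hypothetical and do not come from the study, whose hidden Markov model step is not shown.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed input shape: (query_id, subject_id, bit_score), e.g. parsed from tabular BLAST output.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is the best hit of a AND a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data (hypothetical identifiers): a1 and b1 pick each other; a2 picks b2, but b2 prefers a1.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 50.0), ("a2", "b2", 80.0)]
b_vs_a = [("b1", "a1", 190.0), ("b2", "a1", 60.0), ("b2", "a2", 40.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

    In this toy case only the pair (a1, b1) survives, because the best hit of b2 is a1 rather than a2.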

  13. Genome-Wide Analysis of Human Metapneumovirus Evolution

    PubMed Central

    Kim, Jin Il; Park, Sehee; Lee, Ilseob; Park, Kwang Sook; Kwak, Eun Jung; Moon, Kwang Mee; Lee, Chang Kyu; Bae, Joon-Yong; Park, Man-Seong; Song, Ki-Joon

    2016-01-01

    Human metapneumovirus (HMPV) has been described as an important etiologic agent of upper and lower respiratory tract infections, especially in young children and the elderly. Most school-aged children might have been exposed to HMPVs, and exacerbation by other viral or bacterial superinfection is common. However, our understanding of the molecular evolution of HMPVs remains limited. To address the comprehensive evolutionary dynamics of HMPVs, we report a genome-wide analysis of the eight genes (N, P, M, F, M2, SH, G, and L) using 103 complete genome sequences. Phylogenetic reconstruction revealed that the eight genes from one HMPV strain grouped into the same genetic group among the five distinct lineages (A1, A2a, A2b, B1, and B2). A few exceptions of phylogenetic incongruence might suggest past recombination events, and we detected possible recombination breakpoints in the F, SH, and G coding regions. The five genetic lineages of HMPVs shared quite remote common ancestors, ranging from more than 220 to 470 years of age, with the most recent origin for the A2b sublineage. Purifying selection was common, but most protein genes except the F and M2-2 coding regions also appeared to experience episodic diversifying selection. Taken together, these results suggest that the five lineages of HMPVs maintain their individual evolutionary dynamics and that recombination and selection forces might work on shaping the genetic diversity of HMPVs. PMID:27046055

  14. 13C metabolic flux analysis at a genome-scale.

    PubMed

    Gopalakrishnan, Saratram; Maranas, Costas D

    2015-11-01

    Metabolic models used in 13C metabolic flux analysis generally include a limited number of reactions primarily from central metabolism. They typically omit degradation pathways, complete cofactor balances, and atom transition contributions for reactions outside central metabolism. This study addresses the impact on prediction fidelity of scaling-up mapping models to a genome-scale. The core mapping model employed in this study accounts for 75 reactions and 65 metabolites, primarily from central metabolism. The genome-scale metabolic mapping model (GSMM) (697 reactions and 595 metabolites) is constructed using as a basis the iAF1260 model upon eliminating reactions guaranteed not to carry flux based on growth and fermentation data for a minimal glucose growth medium. Labeling data for 17 amino acid fragments obtained from cells fed with glucose labeled at the second carbon was used to obtain fluxes and ranges. Metabolic fluxes and confidence intervals are estimated, for both core and genome-scale mapping models, by minimizing the sum of squared differences between predicted and experimentally measured labeling patterns using the EMU decomposition algorithm. Overall, we find that both topology and estimated values of the metabolic fluxes remain largely consistent between the core and GSM models. Stepping up to a genome-scale mapping model leads to wider flux inference ranges for 20 key reactions present in the core model. The glycolysis flux range doubles due to the possibility of active gluconeogenesis, the TCA flux range expands by 80% due to the availability of a bypass through arginine consistent with labeling data, and the transhydrogenase reaction flux is essentially unresolved due to the presence of as many as five routes for the inter-conversion of NADPH to NADH afforded by the genome-scale model. By globally accounting for ATP demands in the GSMM, the unused ATP decreased drastically with the lower bound matching the maintenance ATP requirement. A non

  15. The crux and crust of ebolavirus: Analysis of genome sequences and glycoprotein gene.

    PubMed

    Mahale, Kiran Narasinha; Patole, Milind S

    2015-08-01

    The recent 2013-15 epidemic of Ebola virus disease (EVD) has initiated extensive sequencing and analysis of ebolavirus genomes. All ebolavirus genomes available until December 2014 have been collated and analyzed in this study to obtain their phylogenetic relationships and uncover the variations amongst them. The terminal 'leader' and 'trailer' nucleotide sequences of the genomes were omitted, and analysis of the intermediate region accommodating the sole seven genes (hepta-CDS region) of the virus showed relative stability of the genome, including the ones isolated from the current epidemic. The genome information was scrutinized to detect the variation in the surface glycoprotein gene and annotate its three protein products, resulting from its atypical transcription. This study should make the genomes easier to understand for those who wish to exploit the genome sequences for different investigations in EVD. PMID:26051281

  16. Tomato functional genomics database (TFGD): a comprehensive collection and analysis package for tomato functional genomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...

  17. HIGH RESOLUTION FOURIER ANALYSIS WITH AUTO-REGRESSIVE LINEAR PREDICTION

    SciTech Connect

    Barton, J.; Shirley, D.A.

    1984-04-01

    Auto-regressive linear prediction is adapted to double the resolution of Angle-Resolved Photoemission Extended Fine Structure (ARPEFS) Fourier transforms. Even with the optimal taper (weighting function), the commonly used taper-and-transform Fourier method has limited resolution: it assumes the signal is zero beyond the limits of the measurement. By seeking the Fourier spectrum of an infinite extent oscillation consistent with the measurements but otherwise having maximum entropy, the errors caused by finite data range can be reduced. Our procedure developed to implement this concept adapts auto-regressive linear prediction to extrapolate the signal in an effective and controllable manner. Difficulties encountered when processing actual ARPEFS data are discussed. A key feature of this approach is the ability to convert improved measurements (signal-to-noise or point density) into improved Fourier resolution.
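
    The extrapolation idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it fits the auto-regressive coefficients by ordinary least squares with NumPy (a swapped-in technique; the abstract's maximum-entropy criterion is not implemented), and the function names, model order and test signal are assumptions chosen for the example.

```python
import numpy as np


def fit_ar(signal, order):
    """Least-squares fit of x[n] ~ sum_k coeffs[k] * x[n-1-k] (illustrative, not maximum-entropy)."""
    x = np.asarray(signal, dtype=float)
    # Each row holds the `order` samples preceding x[n], most recent first.
    rows = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    targets = x[order:]
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return coeffs


def extrapolate(signal, coeffs, n_extra):
    """Extend the signal by n_extra samples using the fitted linear predictor."""
    order = len(coeffs)
    out = list(np.asarray(signal, dtype=float))
    for _ in range(n_extra):
        recent = out[-order:][::-1]  # most recent sample first, matching fit_ar
        out.append(float(np.dot(coeffs, recent)))
    return np.array(out)


# A noiseless cosine obeys x[n] = 2*cos(w)*x[n-1] - x[n-2], so an order-2 model recovers it.
omega = 0.3
measured = np.cos(omega * np.arange(32))
coeffs = fit_ar(measured, order=2)
extended = extrapolate(measured, coeffs, n_extra=32)
spectrum = np.abs(np.fft.rfft(extended))  # transform of the longer, extrapolated record
```

    Transforming the extended record instead of the measured one avoids assuming the signal is zero beyond the measurement window, which is the limitation of the taper-and-transform method described above. With noisy data the choice of model order matters and the extrapolation is only as good as the fit.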

  18. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns

    PubMed Central

    Jansen, Robert K.; Cai, Zhengqiu; Raubeson, Linda A.; Daniell, Henry; dePamphilis, Claude W.; Leebens-Mack, James; Müller, Kai F.; Guisinger-Bellian, Mary; Haberle, Rosemarie C.; Hansen, Anne K.; Chumley, Timothy W.; Lee, Seung-Bum; Peery, Rhiannon; McNeal, Joel R.; Kuehl, Jennifer V.; Boore, Jeffrey L.

    2007-01-01

    Angiosperms are the largest and most successful clade of land plants with >250,000 species distributed in nearly every terrestrial habitat. Many phylogenetic studies have been based on DNA sequences of one to several genes, but, despite decades of intensive efforts, relationships among early diverging lineages and several of the major clades remain either incompletely resolved or weakly supported. We performed phylogenetic analyses of 81 plastid genes in 64 sequenced genomes, including 13 new genomes, to estimate relationships among the major angiosperm clades, and the resulting trees are used to examine the evolution of gene and intron content. Phylogenetic trees from multiple methods, including model-based approaches, provide strong support for the position of Amborella as the earliest diverging lineage of flowering plants, followed by Nymphaeales and Austrobaileyales. The plastid genome trees also provide strong support for a sister relationship between eudicots and monocots, and this group is sister to a clade that includes Chloranthales and magnoliids. Resolution of relationships among the major clades of angiosperms provides the necessary framework for addressing numerous evolutionary questions regarding the rapid diversification of angiosperms. Gene and intron content are highly conserved among the early diverging angiosperms and basal eudicots, but 62 independent gene and intron losses are limited to the more derived monocot and eudicot clades. Moreover, a lineage-specific correlation was detected between rates of nucleotide substitutions, indels, and genomic rearrangements. PMID:18048330

  19. Uncertainty Analysis in the Creation of a Fine-Resolution Leaf Area Index (LAI) Reference Map for Validation of Moderate Resolution LAI Products

    EPA Science Inventory

    The validation process for a moderate resolution leaf area index (LAI) product (i.e., MODIS) involves the creation of a high spatial resolution LAI reference map (Lai-RM), which when scaled to the moderate LAI resolution (i.e., >1 km) allows for comparison and analysis with this ...

  20. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    SciTech Connect

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.

  1. Genome Information Broker for Viruses (GIB-V): database for comparative analysis of virus genomes

    PubMed Central

    Hirahata, Masaki; Abe, Takashi; Tanaka, Naoto; Kuwana, Yoshikazu; Shigemoto, Yasumasa; Miyazaki, Satoru; Suzuki, Yoshiyuki; Sugawara, Hideaki

    2007-01-01

    Genome Information Broker for Viruses (GIB-V) is a comprehensive virus genome/segment database. We extracted 18 418 complete virus genomes/segments from the International Nucleotide Sequence Database Collaboration (INSDC) by DNA Data Bank of Japan (DDBJ), EMBL and GenBank and stored them in our system. The list of registered viruses is arranged hierarchically according to taxonomy. Keyword searches can be performed for genome/segment data or biological features of any virus stored in GIB-V. GIB-V is equipped with a BLAST search function, and search results are displayed graphically or in list form. Moreover, the BLAST results can be used online with the ClustalW feature of the DDBJ. All available virus genome/segment data can be collected by the GIB-V download function. GIB-V can be accessed at no charge. PMID:17158166

  2. The tiger genome and comparative analysis with lion and snow leopard genomes.

    PubMed

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-Uk; Luo, Shu-Jin; Johnson, Warren E; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A; Marker, Laurie; Harper, Cindy; Miller, Susan M; Jacobs, Wilhelm; Bertola, Laura D; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O'Brien, Stephen J; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world's most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats' hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species.

  3. The tiger genome and comparative analysis with lion and snow leopard genomes

    PubMed Central

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-uk; Luo, Shu-Jin; Johnson, Warren E.; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A.; Marker, Laurie; Harper, Cindy; Miller, Susan M.; Jacobs, Wilhelm; Bertola, Laura D.; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O’Brien, Stephen J.; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world’s most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats’ hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  5. A practical guide to environmental association analysis in landscape genomics.

    PubMed

    Rellstab, Christian; Gugerli, Felix; Eckert, Andrew J; Hancock, Angela M; Holderegger, Rolf

    2015-09-01

    Landscape genomics is an emerging research field that aims to identify the environmental factors that shape adaptive genetic variation and the gene variants that drive local adaptation. Its development has been facilitated by next-generation sequencing, which allows for screening thousands to millions of single nucleotide polymorphisms in many individuals and populations at reasonable costs. In parallel, data sets describing environmental factors have greatly improved and increasingly become publicly accessible. Accordingly, numerous analytical methods for environmental association studies have been developed. Environmental association analysis identifies genetic variants associated with particular environmental factors and has the potential to uncover adaptive patterns that are not discovered by traditional tests for the detection of outlier loci based on population genetic differentiation. We review methods for conducting environmental association analysis including categorical tests, logistic regressions, matrix correlations, general linear models and mixed effects models. We discuss the advantages and disadvantages of different approaches, provide a list of dedicated software packages and their specific properties, and stress the importance of incorporating neutral genetic structure in the analysis. We also touch on additional important aspects such as sampling design, environmental data preparation, pooled and reduced-representation sequencing, candidate-gene approaches, linearity of allele-environment associations and the combination of environmental association analyses with traditional outlier detection tests. We conclude by summarizing expected future directions in the field, such as the extension of statistical approaches, environmental association analysis for ecological gene annotation, and the need for replication and post hoc validation studies.
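
    One of the method families listed in this review, logistic regression of allele presence on an environmental variable, can be sketched in a few lines. The example below is a toy under stated assumptions: the data are invented, the function name is hypothetical, the fit uses plain gradient ascent rather than any dedicated package from the review, and it deliberately omits the correction for neutral genetic structure that the abstract stresses as important.

```python
import math


def fit_logistic(env, allele, learning_rate=0.1, steps=2000):
    """Fit P(allele present) = sigmoid(b0 + b1 * z), where z is the standardized environment value."""
    n = len(env)
    mean = sum(env) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in env) / n)
    z = [(e - mean) / sd for e in env]
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        grad0, grad1 = 0.0, 0.0
        for zi, yi in zip(z, allele):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            grad0 += yi - p          # gradient of the log-likelihood w.r.t. b0
            grad1 += (yi - p) * zi   # gradient of the log-likelihood w.r.t. b1
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Invented data: one environmental value (e.g. a temperature index) per sampled individual,
# and 1/0 for presence/absence of a candidate allele.
environment = [1, 2, 3, 4, 5, 6, 7, 8]
allele_present = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope = fit_logistic(environment, allele_present)
```

    A positive slope indicates that the allele becomes more frequent as the environmental value increases. In a real analysis the slope would need a significance test and an adjustment for population structure before being read as a signal of local adaptation.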

  6. High resolution melting (HRM) analysis of DNA--its role and potential in food analysis.

    PubMed

    Druml, Barbara; Cichna-Markl, Margit

    2014-09-01

    DNA-based methods play an increasing role in food safety control and food adulteration detection. Recent papers show that high resolution melting (HRM) analysis is an interesting approach. It involves amplification of the target of interest in the presence of a saturation dye by the polymerase chain reaction (PCR) and subsequent melting of the amplicons by gradually increasing the temperature. Since the melting profile depends on the GC content, length, sequence and strand complementarity of the product, HRM analysis is highly suitable for the detection of single-base variants and small insertions or deletions. The review gives an introduction to HRM analysis, covers important aspects in the development of an HRM analysis method and describes how HRM data are analysed and interpreted. Then we discuss the potential of HRM-based methods in food analysis, i.e. for the identification of closely related species and cultivars and the identification of pathogenic microorganisms.
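
    The dependence of melting behaviour on GC content can be illustrated with a rough sketch. It uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is only a crude estimate intended for short oligonucleotides; it is not how HRM instruments analyse melting curves, and the sequences and function names are invented for the example.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(sequence):
    """Wallace rule of thumb: 2 degrees C per A/T and 4 degrees C per G/C (short oligos only)."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Invented sequences differing by a single base (A -> G at the first position).
reference = "ATGC"
variant = "GTGC"
shift = wallace_tm(variant) - wallace_tm(reference)
```

    Even this crude estimate shifts when one A is replaced by G, which is the kind of difference that a high-resolution melting profile is designed to resolve in real amplicons.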

  7. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level

    PubMed Central

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 streptomycete strains downloaded from the NCBI. Based on the phylogenetic tree, the thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and a multiple-source subgroup. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea’s genetic data sources. PMID:27446038

  8. Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

    PubMed

    Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

    2014-12-01

    As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project, the Genome Science Project, this article presents a novel approach to conducting a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: making the draft of the template following an analysis of national and international policies; refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.

  9. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level.

    PubMed

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 streptomycete strains downloaded from the NCBI. Based on the phylogenetic tree, the thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and a multiple-source subgroup. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea's genetic data sources. PMID:27446038

  11. Exploring a Nonmodel Teleost Genome Through RAD Sequencing—Linkage Mapping in Common Pandora, Pagellus erythrinus and Comparative Genomic Analysis

    PubMed Central

    Manousaki, Tereza; Tsakogiannis, Alexandros; Taggart, John B.; Palaiokostas, Christos; Tsaparis, Dimitris; Lagnel, Jacques; Chatziplis, Dimitrios; Magoulas, Antonios; Papandroulakis, Nikos; Mylonas, Constantinos C.; Tsigenopoulos, Costas S.

    2015-01-01

    Common pandora (Pagellus erythrinus) is a benthopelagic marine fish belonging to the teleost family Sparidae, and a newly recruited species in Mediterranean aquaculture. The paucity of genetic information relating to sparids, despite their growing economic value for aquaculture, provides the impetus for exploring the genomics of this fish group. The development of genomic tools, such as genetic linkage maps, lays the groundwork for linking genotype to phenotype, allowing fine-mapping of loci responsible for beneficial traits. In this study, we applied ddRAD methodology to identify polymorphic markers in a full-sib family of common pandora. Employing the Illumina MiSeq platform, we sampled and sequenced a size-selected genomic fraction of 99 individuals, which led to the identification of 920 polymorphic loci. Downstream mapping analysis resulted in the construction of 24 robust linkage groups, corresponding to the karyotype of the species. The common pandora linkage map showed varying degrees of conserved synteny with four other teleost genomes, namely the European seabass (Dicentrarchus labrax), Nile tilapia (Oreochromis niloticus), stickleback (Gasterosteus aculeatus), and medaka (Oryzias latipes), suggesting a conserved genomic evolution in Sparidae. Our work exploits the possibilities of genotyping by sequencing to gain novel insights into genome structure and evolution. Such information will boost the study of cultured species and will set the foundation for a deeper understanding of the complex evolutionary history of teleosts. PMID:26715088

  12. Comparative Genomics Analysis of Rice and Pineapple Contributes to Understand the Chromosome Number Reduction and Genomic Changes in Grasses

    PubMed Central

    Wang, Jinpeng; Yu, Jiaxiang; Sun, Pengchuan; Li, Yuxian; Xia, Ruiyan; Liu, Yinzhe; Ma, Xuelian; Yu, Jigao; Yang, Nanshan; Lei, Tianyu; Wang, Zhenyi; Wang, Li; Ge, Weina; Song, Xiaoming; Liu, Xiaojian; Sun, Sangrong; Liu, Tao; Jin, Dianchuan; Pan, Yuxin; Wang, Xiyin

    2016-01-01

    Rice is one of the most extensively researched model plants, and its genome structure most closely resembles that of the grass common ancestor after a grass-common tetraploidization ∼100 million years ago. There has been a long-standing controversy over whether there were five or seven basic chromosomes before the tetraploidization; the question has been tackled but could not be resolved because no sequenced and assembled outgroup plant with a conserved genome structure was available. The recent availability of the pineapple genome, which was not subjected to the grass-common tetraploidization, provides a valuable opportunity to resolve this controversy and to investigate genome changes in rice and other grasses. Here, we performed a comparative genomics analysis of pineapple and rice and found solid evidence that the grass common ancestor had 2n = 2x = 14 basic chromosomes before the tetraploidization, which duplicated to 2n = 4x = 28 after the event. Moreover, we propose that the large number of genes missing from duplicated regions in rice is better explained by an allotetraploid produced by prominently divergent parental lines than by gene losses after their divergence. This means that genome fractionation might have occurred before the formation of the allotetraploid grass ancestor. PMID:27757123

  13. STINGRAY: system for integrated genomic resources and analysis

    PubMed Central

    2014-01-01

    Background The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. Findings STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. Conclusion STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system may run faster with Sanger data, since large NGS datasets could slow down MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/. PMID:24606808

  14. Surface ligation-based resonance light scattering analysis of methylated genomic DNA on a microarray platform.

    PubMed

    Ma, Lan; Lei, Zhen; Liu, Xia; Liu, Dianjun; Wang, Zhenxin

    2016-05-10

    DNA methylation is a crucial epigenetic modification and is closely related to tumorigenesis. Herein, a surface ligation-based high throughput method combined with bisulfite treatment is developed for analysis of methylated genomic DNA. In this method, a DNA microarray is employed as a reaction platform, and resonance light scattering (RLS) of nanoparticles is used as the detection principle. The specificity stems from allele-specific ligation of Taq DNA ligase, which is further enhanced by improving the fidelity of Taq DNA ligase in a heterogeneous reaction. Two amplification techniques, rolling circle amplification (RCA) and silver enhancement, are employed after the ligation reaction and a gold nanoparticle (GNP) labeling procedure is used to amplify the signal. As little as 0.01% methylated DNA (i.e., 2 pmol L⁻¹) can be distinguished from the cocktail of methylated and unmethylated DNA by the proposed method. More importantly, this method shows good accuracy and sensitivity in profiling the methylation level of genomic DNA of three selected colonic cancer cell lines. This strategy provides a high throughput alternative with reasonable sensitivity and resolution for cancer study and diagnosis.

  15. Functional Analysis of Shewanella, a cross genome comparison.

    SciTech Connect

    Serres, Margrethe H.

    2009-05-15

    The bacterial genus Shewanella includes a group of highly versatile organisms that have successfully adapted to life in many environments ranging from aquatic (fresh and marine) to sedimentary (lake and marine sediments, subsurface sediments, sea vent). A unique respiratory capability of the Shewanellas, initially observed for Shewanella oneidensis MR-1, is the ability to use metals and metalloids, including radioactive compounds, as electron acceptors. Members of the Shewanella genus have also been shown to degrade environmental pollutants, e.g., halogenated compounds, making this group highly applicable for the DOE mission. S. oneidensis MR-1 has in addition been found to utilize a diverse set of nutrients and to have a large set of genes dedicated to regulation and to sensing of the environment. The sequencing of the S. oneidensis MR-1 genome facilitated experimental and bioinformatics analyses by a group of collaborating researchers, the Shewanella Federation. Through this joint effort and with support from the Department of Energy, S. oneidensis MR-1 has become a model organism of study. Our work has been a functional analysis of S. oneidensis MR-1, both by itself and as part of a comparative study. We have improved the annotation of gene products, assigned metabolic functions, and analyzed protein families present in S. oneidensis MR-1. The data have been applied to the analysis of experimental data (e.g., gene expression, proteome) generated for S. oneidensis MR-1. Further, this work has formed the basis for a comparative study of over 20 members of the Shewanella genus. The species and strains selected for genome sequencing represented an evolutionary gradient of DNA relatedness, ranging from close to intermediate to distant. The organisms selected have also adapted to a variety of ecological niches. Through our work we have been able to detect and interpret genome similarities and differences between members of the genus. We have in this way contributed to the

  16. Comparative analysis of trichomonad genome sizes and karyotypes.

    PubMed

    Zubácová, Zuzana; Cimbůrek, Zdenek; Tachezy, Jan

    2008-09-01

    In parasitic protists, the genome sizes range from 2.9Mb in Encephalitozoon cuniculi to about 160Mb in Trichomonas vaginalis. The surprisingly large genome size of the latter human parasite resulted from the expansion of various repetitive elements, specific gene families, and possibly from large-scale genome duplication. The reason for this phenomenon, as well as whether other trichomonad species have undergone a similar genome expansion, is not known. In this work we studied the genomes of nine selected species of the Trichomonadea group. We found that each species has a characteristic karyotype with a stable and haploid number of chromosomes. Relatively large genome sizes were found in all the tested species, although over a rather broad range (86-177Mb). The largest genomes were typically observed in the Trichomonas and Tritrichomonas genera (133-177Mb), while Tetratrichomonas gallinarum contains the smallest genome (86Mb). The genome size correlated with the cell volume; however, no relationship between genome size and the site of infection or trichomonad phagocytic ability was observed. The data presented here provide primary information towards selecting a trichomonad species for future large-scale sequencing to elucidate the evolution of unusual parabasalid genomes. PMID:18606195

  18. Genome-wide analysis of wild-type Epstein-Barr virus genomes derived from healthy individuals of the 1,000 Genomes Project.

    PubMed

    Santpere, Gabriel; Darre, Fleur; Blanco, Soledad; Alcami, Antonio; Villoslada, Pablo; Mar Albà, M; Navarro, Arcadi

    2014-04-01

    Most people in the world (∼90%) are infected by the Epstein-Barr virus (EBV), which establishes itself permanently in B cells. Infection by EBV is related to a number of diseases including infectious mononucleosis, multiple sclerosis, and different types of cancer. So far, only seven complete EBV strains have been described, all of them coming from donors presenting EBV-related diseases. To perform a detailed comparative genomic analysis of EBV including, for the first time, EBV strains derived from healthy individuals, we reconstructed EBV sequences infecting lymphoblastoid cell lines (LCLs) from the 1000 Genomes Project. As strain B95-8 was used to transform B cells to obtain LCLs, it is always present, but a specific deletion in its genome sets it apart from natural EBV strains. After studying hundreds of individuals, we determined the presence of natural EBV in at least 10 of them and obtained a set of variants specific to wild-type EBV. By mapping the natural EBV reads to the EBV reference genome (NC_007605), we constructed nearly complete wild-type viral genomes from three individuals. Adding them to the five disease-derived EBV genomic sequences available in the literature, we performed an in-depth comparative genomic analysis. We found that latency genes harbor more nucleotide diversity than lytic genes and that six out of nine latency-related genes, as well as other genes involved in viral attachment and entry into host cells, packaging, and the capsid, present the molecular signature of accelerated protein evolution rates, suggesting rapid host-parasite coevolution.
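
    The comparison of nucleotide diversity between latency and lytic genes can be illustrated with a minimal sketch. It assumes the usual definition of nucleotide diversity (mean pairwise differences per site over aligned sequences); the abstract does not state which estimator was used, and the function name and the toy sequences are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(sequences):
    """Mean number of pairwise differences per site for aligned sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)

# Toy alignment (invented, not EBV data): 3 sequences, 8 sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
pi = nucleotide_diversity(toy_alignment)
print(f"nucleotide diversity = {pi:.4f}")
```

    The sketch ignores gaps and ambiguous bases, which a real analysis of reconstructed viral genomes would need to handle.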

  19. Analysis of genomic diversity in Mexican Mestizo populations to develop genomic medicine in Mexico.

    PubMed

    Silva-Zolezzi, Irma; Hidalgo-Miranda, Alfredo; Estrada-Gil, Jesus; Fernandez-Lopez, Juan Carlos; Uribe-Figueroa, Laura; Contreras, Alejandra; Balam-Ortiz, Eros; del Bosque-Plata, Laura; Velazquez-Fernandez, David; Lara, Cesar; Goya, Rodrigo; Hernandez-Lemus, Enrique; Davila, Carlos; Barrientos, Eduardo; March, Santiago; Jimenez-Sanchez, Gerardo

    2009-05-26

    Mexico is developing the basis for genomic medicine to improve healthcare of its population. The extensive study of genetic diversity and linkage disequilibrium structure of different populations has made it possible to develop tagging and imputation strategies to comprehensively analyze common genetic variation in association studies of complex diseases. We assessed the benefit of a Mexican haplotype map to improve identification of genes related to common diseases in the Mexican population. We evaluated genetic diversity, linkage disequilibrium patterns, and extent of haplotype sharing using genomewide data from Mexican Mestizos from regions with different histories of admixture and particular population dynamics. Ancestry was evaluated by including 1 Mexican Amerindian group and data from the HapMap. Our results provide evidence of genetic differences between Mexican subpopulations that should be considered in the design and analysis of association studies of complex diseases. In addition, these results support the notion that a haplotype map of the Mexican Mestizo population can reduce the number of tag SNPs required to characterize common genetic variation in this population. This is one of the first genomewide genotyping efforts of a recently admixed population in Latin America.

  20. Analysis of genomic diversity in Mexican Mestizo populations to develop genomic medicine in Mexico

    PubMed Central

    Silva-Zolezzi, Irma; Hidalgo-Miranda, Alfredo; Estrada-Gil, Jesus; Fernandez-Lopez, Juan Carlos; Uribe-Figueroa, Laura; Contreras, Alejandra; Balam-Ortiz, Eros; del Bosque-Plata, Laura; Velazquez-Fernandez, David; Lara, Cesar; Goya, Rodrigo; Hernandez-Lemus, Enrique; Davila, Carlos; Barrientos, Eduardo; March, Santiago; Jimenez-Sanchez, Gerardo

    2009-01-01

    Mexico is developing the basis for genomic medicine to improve healthcare of its population. The extensive study of genetic diversity and linkage disequilibrium structure of different populations has made it possible to develop tagging and imputation strategies to comprehensively analyze common genetic variation in association studies of complex diseases. We assessed the benefit of a Mexican haplotype map to improve identification of genes related to common diseases in the Mexican population. We evaluated genetic diversity, linkage disequilibrium patterns, and extent of haplotype sharing using genomewide data from Mexican Mestizos from regions with different histories of admixture and particular population dynamics. Ancestry was evaluated by including 1 Mexican Amerindian group and data from the HapMap. Our results provide evidence of genetic differences between Mexican subpopulations that should be considered in the design and analysis of association studies of complex diseases. In addition, these results support the notion that a haplotype map of the Mexican Mestizo population can reduce the number of tag SNPs required to characterize common genetic variation in this population. This is one of the first genomewide genotyping efforts of a recently admixed population in Latin America. PMID:19433783
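
    The link between linkage disequilibrium and tag SNPs can be made concrete with a small sketch. It assumes phased two-locus haplotypes coded 0/1 and uses the standard r² statistic; the abstract does not specify which LD measure the study reported, and the function name and example haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes [(a, b), ...]."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for h in haplotypes if tuple(h) == (1, 1)) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom

# Invented examples: complete LD versus independent loci.
r2_linked = ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
r2_unlinked = ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
print(r2_linked, r2_unlinked)
```

    When r² between two SNPs is close to 1, genotyping one of them largely predicts the other, which is why population-specific LD patterns can reduce the number of tag SNPs required.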

  1. High resolution ultraviolet imaging spectrometer for latent image analysis.

    PubMed

    Lyu, Hang; Liao, Ningfang; Li, Hongsong; Wu, Wenmin

    2016-03-21

    In this work, we present a close-range ultraviolet imaging spectrometer with high spatial resolution and reasonably high spectral resolution. Because transmissive optical components cause chromatic aberration in the ultraviolet (UV) spectral range, an all-reflective imaging scheme is introduced to improve the image quality. The proposed instrument consists of an oscillating mirror, a Cassegrain objective, a Michelson structure, an Offner relay, and a UV-enhanced CCD. The finished spectrometer has a spatial resolution of 29.30 μm on the target plane; its spectral range covers both the near and middle UV bands; and it can obtain approximately 100 wavelength samples over the range of 240-370 nm. The control computer coordinates all the components of the instrument and enables capturing a series of images, which can be reconstructed into an interferogram datacube. The datacube can be converted into a spectrum datacube, which contains spectral information for each pixel with many wavelength samples. A spectral calibration is carried out using a high-pressure mercury discharge lamp. A test run demonstrated that this interferometric configuration can obtain a high-resolution spectrum datacube. A pattern recognition algorithm is introduced to analyze the datacube and distinguish the latent traces from the base materials. This design is particularly good at identifying latent traces in forensic imaging applications.
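
    The conversion of an interferogram datacube into a spectrum datacube can be sketched for a single pixel. The abstract does not describe the reconstruction algorithm, so this assumes a conventional Fourier-transform approach with uniform sampling in optical path difference and no apodization or phase correction; the function name and the synthetic interferogram are hypothetical.

```python
import cmath
import math

def spectrum_from_interferogram(samples):
    """Magnitude spectrum of one pixel's interferogram via a plain DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]   # remove the constant (DC) term
    return [
        abs(sum(c * cmath.exp(-2j * math.pi * k * m / n)
                for m, c in enumerate(centered)))
        for k in range(n // 2 + 1)
    ]

# Synthetic interferogram: one spectral line giving 5 fringes over 64 samples.
interferogram = [1.0 + math.cos(2 * math.pi * 5 * m / 64) for m in range(64)]
spectrum = spectrum_from_interferogram(interferogram)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print("peak at frequency bin", peak_bin)
```

    Repeating this per pixel over the image stack would yield a spectrum for each spatial position; mapping frequency bins to wavelengths would rely on a calibration such as the mercury lamp measurement mentioned above.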

  2. e-Fungi: a data resource for comparative analysis of fungal genomes

    PubMed Central

    Hedeler, Cornelia; Wong, Han Min; Cornell, Michael J; Alam, Intikhab; Soanes, Darren M; Rattray, Magnus; Hubbard, Simon J; Talbot, Nicholas J; Oliver, Stephen G; Paton, Norman W

    2007-01-01

    Background The number of sequenced fungal genomes is ever increasing, with about 200 genomes already fully sequenced or in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tend to be distributed across various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in a way that eases the expression of requests that clarify points of similarity or difference between species. Description To support systematic comparative analyses of fungal genomes we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations, and pathway information have been integrated into a single data repository and complemented with results of comparative analyses, such as MCL and OrthoMCL cluster analysis, and predictions of signaling proteins and the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies, and aim to support the study of evolutionary biology as well as community efforts for improving the annotation of genomes. Web services for each query are also available, enabling the tasks to be incorporated into workflows. Conclusion The e-Fungi database provides fungal biologists with a resource for comparative studies of a large range of fungal genomes. Its analysis library supports the comparative study of genome data, functional annotation, and results of large scale analyses over all the genomes stored in the database

  3. High-resolution genomic profiling of adenomas and carcinomas of the salivary glands reveals amplification, rearrangement, and fusion of HMGA2.

    PubMed

    Persson, Fredrik; Andrén, Ywonne; Winnes, Marta; Wedell, Barbro; Nordkvist, Anders; Gudnadottir, Gunhildur; Dahlenfors, Rigmor; Sjögren, Helene; Mark, Joachim; Stenman, Göran

    2009-01-01

    Carcinoma ex pleomorphic adenoma (Ca-ex-PA) is an epithelial malignancy developing within a benign salivary gland pleomorphic adenoma (PA). Here we have used genome-wide, high-resolution array-CGH, and fluorescence in situ hybridization to identify genes amplified in double minute chromosomes and homogeneously staining regions in PA and Ca-ex-PA and to identify additional genomic imbalances characteristic of these tumor types. Ten of the 16 tumors analyzed showed amplification/gain of a 30-kb minimal common region, consisting of the 5'-part of HMGA2 (encoding the three DNA-binding domains). Coamplification of MDM2 was found in nine tumors. Five tumors had cryptic HMGA2-WIF1 gene fusions with amplification of the fusion oncogene in four tumors. Expression analysis of eight amplified candidate genes in 12q revealed that tumors with amplification/rearrangement of HMGA2 and MDM2 had significantly higher expression levels when compared with tumors without amplification. Analysis of individual HMGA2 exons showed that the expression of exons 3-5 was substantially reduced when compared with exons 1-2 in 9 of 10 tumors with HMGA2 activation, indicating that gene fusions and rearrangements of HMGA2 are common in tumors with amplification. In addition, recurrent amplifications/gains of 1q11-q32.1, 2p16.1-p12, 8q12.1, 8q22-24.1, and 20, and losses of 1p21.3-p21.1, 5q23.2-q31.2, 8p, 10q21.3, and 15q11.2 were identified. Collectively, our results identify HMGA2 and MDM2 as amplification targets in PA and Ca-ex-PA and suggest that amplification of 12q genes (in particular MDM2), deletions of 5q23.2-q31.2, gains of 8q12.1 (PLAG1) and 8q22.1-q24.1 (MYC), and amplification of ERBB2 may be of importance for malignant transformation of benign PA.

  4. Sequencing and Comparative Genome Analysis of Two Pathogenic Streptococcus gallolyticus Subspecies: Genome Plasticity, Adaptation and Virulence

    PubMed Central

    Teng, Yu-Ting; Wu, Hui-Lun; Liu, Yen-Ming; Wu, Keh-Ming; Chang, Chuan-Hsiung; Hsu, Ming-Ta

    2011-01-01

    Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations differ depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosomes of ATCC 43143 and ATCC 43144 are 2.36 and 2.10 Mb in length and encode 2246 and 1869 CDS, respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, in which 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved, respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating that the Streptococcus genus has a small core genome (constituting around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144, respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study provides evidence that S. gallolyticus still possesses genes that suit it to a rumen environment, whereas the ability of S. pasteurianus to live in the rumen is reduced. The genome heterogeneity and genetic diversity between the two biotypes, especially in membrane proteins and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops. PMID:21633709
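
    The conservation percentages quoted in this record can be checked with a few lines of arithmetic. The counts below are taken from the abstract (with "around 600" core CDS treated as exactly 600 for illustration); the variable and function names are hypothetical.

```python
# CDS counts quoted in the abstract.
cds_total = {"ATCC 43143": 2246, "ATCC 43144": 1869}
cds_conserved_in_ucn34 = {"ATCC 43143": 2073, "ATCC 43144": 1607}
core_cds = 600  # "around 600" CDS conserved in all Streptococcus genomes

def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole

conserved_pct = {
    strain: round(percent(cds_conserved_in_ucn34[strain], total))
    for strain, total in cds_total.items()
}
core_pct = {
    strain: round(percent(core_cds, total), 1)
    for strain, total in cds_total.items()
}
print(conserved_pct)  # share of each genome's CDS conserved in UCN34
print(core_pct)       # share of each genome's CDS belonging to the core
```

    The rounded values reproduce the 92% and 86% figures, and the core share of roughly 27% to 32% is consistent with the "around 30%" stated for the genus.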

  5. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information

    PubMed Central

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization, and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have attracted much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies, etc. Early identification of proteins in the whole human genome that retain a tendency to domain swap would support many aspects of disease control and management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from the human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to gain a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids. PMID:27467780
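
    The performance measures quoted for the predictive models (accuracy, sensitivity, specificity and MCC) all derive from a confusion matrix. The following minimal sketch uses the standard definitions; the counts are invented for illustration and are not the study's data, and the function name is hypothetical.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classifier metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                     # true positive rate
    specificity = tn / (tn + fp)                     # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    MCC is often reported alongside accuracy because it remains informative when positive and negative classes are imbalanced.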

  6. Rapid real-time PCR and high resolution melt analysis in a self-filling thermoplastic chip.

    PubMed

    Sposito, A; Hoang, V; DeVoe, D L

    2016-09-21

    A microfluidic platform designed for point-of-care PCR-based nucleic acid diagnostics is described. Compared to established microfluidic PCR technologies, the system is unique in its ability to achieve exceptionally rapid PCR amplification in a low cost thermoplastic format, together with high temperature accuracy enabling effective validation of reaction product by high resolution melt analysis performed in the same chamber as PCR. In addition, the system employs capillary pumping for automated loading of sample into the reaction chamber, combined with an integrated hydrophilic valve for precise self-metering of sample volumes into the device. Using the microfluidic system to target a mutation in the G6PC gene, efficient PCR from human genomic DNA template is achieved with cycle times as low as 14 s, full amplification in 8.5 min, and final melt analysis accurately identifying the desired amplicon. PMID:27460504
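
    The melt analysis step can be illustrated with a small sketch that estimates a melting temperature as the peak of the negative derivative of fluorescence with respect to temperature. This is the common way melt curves are summarized and is assumed here rather than taken from the abstract; the function name and the synthetic sigmoid curve are hypothetical.

```python
import math

def melting_temperature(temps, fluorescence):
    """Temperature at which -dF/dT is largest (finite differences)."""
    best_tm, best_slope = None, float("-inf")
    for i in range(1, len(temps)):
        slope = -(fluorescence[i] - fluorescence[i - 1]) / (temps[i] - temps[i - 1])
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i - 1]) / 2  # midpoint of the interval
    return best_tm

# Synthetic melt curve: sigmoid drop in fluorescence centered at 84.25 C.
true_tm = 84.25
temps = [75.0 + 0.5 * i for i in range(41)]          # 75.0 ... 95.0 C
fluorescence = [1.0 / (1.0 + math.exp(t - true_tm)) for t in temps]
tm = melting_temperature(temps, fluorescence)
print(f"estimated Tm = {tm:.2f} C")
```

    The resolution of the estimate is limited by the temperature step, which is one reason accurate on-chip temperature control matters for validating the amplicon.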

  7. Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information

    PubMed Central

    2011-01-01

    Background The incorporation of genomic coefficients into the numerator relationship matrix allows estimation of breeding values using all phenotypic, pedigree and genomic information simultaneously. In such a single-step procedure, genomic and pedigree-based relationships have to be compatible. As there are many options to create genomic relationships, there is a question of which is optimal and what the effects of deviations from optimality are. Methods Data of litter size (total number born per litter) for 338,346 sows were analyzed. Illumina PorcineSNP60 BeadChip genotypes were available for 1,989. Analyses were carried out with the complete data set and with a subset of genotyped animals and a three-generation pedigree (5,090 animals). A single-trait animal model was used to estimate variance components and breeding values. Genomic relationship matrices were constructed using allele frequencies equal to 0.5 (G05), equal to the average minor allele frequency (GMF), or equal to observed frequencies (GOF). A genomic matrix considering random ascertainment of allele frequencies was also used (GOF*). A normalized matrix (GN) was obtained to have average diagonal coefficients equal to 1. The genomic matrices were combined with the numerator relationship matrix creating H matrices. Results In G05 and GMF, both diagonal and off-diagonal elements were on average greater than the pedigree-based coefficients. In GOF and GOF*, the average diagonal elements were smaller than pedigree-based coefficients. The mean of off-diagonal coefficients was zero in GOF and GOF*. Choices of G with average diagonal coefficients different from 1 led to greater estimates of additive variance in the smaller data set. The correlations between EBV and genomic EBV (n = 1,989) were: 0.79 using G05, 0.79 using GMF, 0.78 using GOF, 0.79 using GOF*, and 0.78 using GN. Accuracies calculated by inversion increased with all genomic matrices. The accuracies of genomic-assisted EBV were inflated in all
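
    The way the choice of allele frequencies changes a genomic relationship matrix can be sketched as follows. The abstract does not give formulas, so this assumes the widely used construction G = ZZ' / (2 Σ p(1 − p)), with genotypes coded 0/1/2 and Z obtained by subtracting 2p; the normalization step is applied here to the observed-frequency matrix as a further assumption. All function names and the toy genotypes are hypothetical.

```python
def genomic_relationship(genotypes, freqs):
    """G = Z Z' / (2 * sum p(1-p)), where Z = genotype (0/1/2) minus 2p."""
    z = [[g - 2.0 * p for g, p in zip(row, freqs)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    n = len(z)
    return [
        [sum(a * b for a, b in zip(z[i], z[j])) / scale for j in range(n)]
        for i in range(n)
    ]

def observed_freqs(genotypes):
    """Allele frequency per marker estimated from the genotyped animals."""
    n = len(genotypes)
    return [sum(col) / (2.0 * n) for col in zip(*genotypes)]

def normalize_diagonal(g):
    """Rescale a matrix so that its average diagonal element equals 1."""
    mean_diag = sum(g[i][i] for i in range(len(g))) / len(g)
    return [[v / mean_diag for v in row] for row in g]

# Toy data: 3 animals x 4 markers (invented, not from the study).
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
g05 = genomic_relationship(genotypes, [0.5] * 4)                 # p = 0.5 everywhere
gof = genomic_relationship(genotypes, observed_freqs(genotypes))  # observed p
gn = normalize_diagonal(gof)                                      # average diagonal = 1
```

    Centering with observed frequencies forces each column of Z to sum to zero, so the elements of the resulting matrix sum to zero overall, whereas fixing p = 0.5 does not impose that constraint.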

  8. Comparative genomic analysis reveals bilateral breast cancers are genetically independent.

    PubMed

    Song, Fangfang; Li, Xiangchun; Song, Fengju; Zhao, Yanrui; Li, Haixin; Zheng, Hong; Gao, Zhibo; Wang, Jun; Zhang, Wei; Chen, Kexin

    2015-10-13

    Bilateral breast cancer (BBC) poses a major challenge for oncologists because of the cryptic relationship between the two lesions. The purpose of this study was to determine the origin of the contralateral breast cancer (either dependent on or independent of the index tumor). Here, we used ultra-deep whole-exome sequencing and array comparative genomic hybridization (aCGH) to study four paired samples of BBCs with different tumor subtypes and time intervals between the development of each tumor. We used two paired primary breast tumors and corresponding metastatic liver lesions as the control. We tested the origin-independent nature of BBC in three ways: mutational concordance, mutational signature clustering, and clonality analysis using copy number profiles. We found that the paired BBC samples had near-zero concordant mutation rates, which were much lower than those of the paired primary/metastasis samples. The results of a mutational signature analysis also suggested that BBCs are independent of one another. A clonality analysis using aCGH data further revealed that paired BBC samples were clonally independent, in contrast to the clonally related origin found for paired primary/metastasis samples. Our preliminary findings show that BBCs in Han Chinese women are origin-independent and thus should be treated separately. PMID:26378809
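
    The idea of a concordant mutation rate between paired samples can be sketched as the share of mutations found in both tumors. The abstract does not define the rate precisely, so the shared-over-union definition below is an assumption, and the function name and mutation tuples are invented for illustration.

```python
def concordance_rate(mutations_a, mutations_b):
    """Fraction of all observed mutations that are shared by both samples."""
    a, b = set(mutations_a), set(mutations_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical mutation calls as (chromosome, position, ref, alt).
left_tumor = [("chr1", 1000, "A", "T"), ("chr3", 2500, "G", "C")]
right_tumor = [("chr2", 4000, "C", "T"), ("chr7", 900, "T", "G")]
primary = [("chr1", 1000, "A", "T"), ("chr3", 2500, "G", "C"), ("chr5", 10, "G", "A")]
metastasis = [("chr1", 1000, "A", "T"), ("chr3", 2500, "G", "C"), ("chr9", 77, "C", "G")]

bilateral_rate = concordance_rate(left_tumor, right_tumor)
metastatic_rate = concordance_rate(primary, metastasis)
print(bilateral_rate, metastatic_rate)
```

    In this toy case the bilateral pair shares no mutations while the primary/metastasis pair shares half of them, mirroring the qualitative contrast the record describes.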

  9. Genomic Analysis and Comparison of Two Gonorrhea Outbreaks

    PubMed Central

    Dordel, Janina; Whittles, Lilith K.; Collins, Caitlin; Bilek, Nicole; Bishop, Cynthia J.; White, Peter J.; Aanensen, David M.; Bentley, Stephen D.; Spratt, Brian G.

    2016-01-01

    ABSTRACT Gonorrhea is a sexually transmitted disease causing growing concern, with a substantial increase in reported incidence over the past few years in the United Kingdom and rising levels of resistance to a wide range of antibiotics. Understanding its epidemiology is therefore of major biomedical importance, not only on a population scale but also at the level of direct transmission. However, the molecular typing techniques traditionally used for gonorrhea infections do not provide sufficient resolution to investigate such fine-scale patterns. Here we sequenced the genomes of 237 isolates from two local collections of isolates from Sheffield and London, each of which was resolved into a single type using traditional methods. The two data sets were selected to have different epidemiological properties: the Sheffield data were collected over 6 years from a predominantly heterosexual population, whereas the London data were gathered within half a year and strongly associated with men who have sex with men. Based on contact tracing information between individuals in Sheffield, we found that transmission is associated with a median time to most recent common ancestor of 3.4 months, with an upper bound of 8 months, which we used as a criterion to identify likely transmission links in both data sets. In London, we found that transmission happened predominantly between individuals of similar age, sexual orientation, and location and also with the same HIV serostatus, which may reflect serosorting and associated risk behaviors. Comparison of the two data sets suggests that the London epidemic involved about ten times more cases than the Sheffield outbreak. PMID:27353752

  10. Comparative genomic analysis of hyperthermophilic archaeal fuselloviridae viruses

    SciTech Connect

    B. Wiedenheft; K. Stedman; F. Roberto; D. Willits; A. K. Gleske; L. Zoeller; J. Snyder; T. Douglas; M. Young

    2004-02-01

    The complete genome sequences of two Sulfolobus spindle-shaped viruses (SSVs) from acidic hot springs in Kamchatka (Russia) and Yellowstone National Park (United States) have been determined. These nonlytic temperate viruses were isolated from hyperthermophilic Sulfolobus hosts, and both viruses share the spindle-shaped morphology characteristic of the Fuselloviridae family. These two genomes, in combination with the previously determined SSV1 genome from Japan and the SSV2 genome from Iceland, have allowed us to carry out a phylogenetic comparison of these geographically distributed hyperthermal viruses. Each virus contains a circular double-stranded DNA genome of ~15 kbp with approximately 34 open reading frames (ORFs). These Fusellovirus ORFs show little or no similarity to genes in the public databases. In contrast, 18 ORFs are common to all four isolates and may represent the minimal gene set defining this viral group. In general, ORFs on one half of the genome are colinear and highly conserved, while ORFs on the other half are not. One shared ORF among all four genomes is an integrase of the tyrosine recombinase family. All four viral genomes integrate into their host tRNA genes. The specific tRNA gene used for integration varies, and one genome integrates into multiple loci. Several unique ORFs are found in the genome of each isolate.

  11. Genome-wide analysis of alternative splicing in Chlamydomonas reinhardtii

    PubMed Central

    2010-01-01

    Background Genome-wide computational analysis of alternative splicing (AS) in several flowering plants has revealed that pre-mRNAs from about 30% of genes undergo AS. Chlamydomonas, a simple unicellular green alga, is part of the lineage that includes land plants. However, it diverged from land plants about one billion years ago. Hence, it serves as a good model system to study alternative splicing in early photosynthetic eukaryotes, to obtain insights into the evolution of this process in plants, and to compare splicing in simple unicellular photosynthetic and non-photosynthetic eukaryotes. We performed a global analysis of alternative splicing in Chlamydomonas reinhardtii using its recently completed genome sequence and all available ESTs and cDNAs. Results Our analysis of AS using BLAT and a modified version of the Sircah tool revealed AS of 498 transcriptional units with 611 events, representing about 3% of the total number of genes. As in land plants, intron retention is the most prevalent form of AS. Retained introns and skipped exons tend to be shorter than their counterparts in constitutively spliced genes. The splice site signals in all types of AS events are weaker than those in constitutively spliced genes. Furthermore, in alternatively spliced genes, the prevalent splice form has a stronger splice site signal than the non-prevalent form. Analysis of constitutively spliced introns revealed an over-abundance of motifs with simple repetitive elements in comparison to introns involved in intron retention. In almost all cases, AS results in a truncated ORF, leading to a coding sequence that is around 50% shorter than the prevalent splice form. Using RT-PCR we verified AS of two genes and show that they produce more isoforms than indicated by EST data. All cDNA/EST alignments and splice graphs are provided in a website at http://combi.cs.colostate.edu/as/chlamy. Conclusions The extent of AS in Chlamydomonas that we observed is much smaller than observed in

  12. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis.

    PubMed

    Hong, Yanbin; Pandey, Manish K; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. The presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than for EST-SNPs, with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. In order to estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using the high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. A total of 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of the total 33 SNPs, nine SNPs showed polymorphism in the mapping population Y13Zh, and seven SNPs were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut.

  13. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis

    PubMed Central

    Hong, Yanbin; Pandey, Manish K.; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K.; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. The presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than for EST-SNPs, with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. In order to estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using the high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. A total of 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of the total 33 SNPs, nine SNPs showed polymorphism in the mapping population Y13Zh, and seven SNPs were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut. PMID:26697032
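
    For readers unfamiliar with polymorphism information content (PIC), the sketch below computes two definitions that are commonly used for marker data: the full form, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, and the simplified form 1 - sum(p_i^2), which is also called gene diversity. The abstract does not say which definition the authors applied, so both are shown; the function names and the allele frequencies are assumptions for illustration.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Simplified PIC: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic_full(freqs):
    """Full PIC: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross_term


# Hypothetical allele frequencies for a biallelic SNP.
balanced = [0.5, 0.5]
skewed = [0.9, 0.1]

for name, freqs in (("balanced", balanced), ("skewed", skewed)):
    print(name, round(gene_diversity(freqs), 4), round(pic_full(freqs), 4))
```

    For a biallelic marker the full form cannot exceed 0.375 and the simplified form cannot exceed 0.5, both reached at equal allele frequencies, so the choice of definition matters when PIC values are compared across studies.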

  14. A reference genome for common bean and genome-wide analysis of dual domestications.

    PubMed

    Schmutz, Jeremy; McClean, Phillip E; Mamidi, Sujan; Wu, G Albert; Cannon, Steven B; Grimwood, Jane; Jenkins, Jerry; Shu, Shengqiang; Song, Qijian; Chavarro, Carolina; Torres-Torres, Mirayda; Geffroy, Valerie; Moghaddam, Samira Mafi; Gao, Dongying; Abernathy, Brian; Barry, Kerrie; Blair, Matthew; Brick, Mark A; Chovatia, Mansi; Gepts, Paul; Goodstein, David M; Gonzales, Michael; Hellsten, Uffe; Hyten, David L; Jia, Gaofeng; Kelly, James D; Kudrna, Dave; Lee, Rian; Richard, Manon M S; Miklas, Phillip N; Osorno, Juan M; Rodrigues, Josiane; Thareau, Vincent; Urrea, Carlos A; Wang, Mei; Yu, Yeisoo; Zhang, Ming; Wing, Rod A; Cregan, Perry B; Rokhsar, Daniel S; Jackson, Scott A

    2014-07-01

    Common bean (Phaseolus vulgaris L.) is the most important grain legume for human consumption and has a role in sustainable agriculture owing to its ability to fix atmospheric nitrogen. We assembled 473 Mb of the 587-Mb genome and genetically anchored 98% of this sequence in 11 chromosome-scale pseudomolecules. We compared the genome for the common bean against the soybean genome to find changes in soybean resulting from polyploidy. Using resequencing of 60 wild individuals and 100 landraces from the genetically differentiated Mesoamerican and Andean gene pools, we confirmed 2 independent domestications from genetic pools that diverged before human colonization. Less than 10% of the 74 Mb of sequence putatively involved in domestication was shared by the two domestication events. We identified a set of genes linked with increased leaf and seed size and combined these results with quantitative trait locus data from Mesoamerican cultivars. Genes affected by domestication may be useful for genomics-enabled crop improvement.

  15. Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes.

    PubMed

    Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

    2014-12-19

    Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, the fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution.

  16. Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes.

    PubMed

    Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

    2014-01-01

    Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, the fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution. PMID:25523484

  17. Automated analysis for microcalcifications in high resolution digital mammograms

    DOEpatents

    Mascio, Laura N.

    1996-01-01

    A method for automatically locating microcalcifications indicating breast cancer. The invention assists mammographers in finding very subtle microcalcifications and in recognizing the pattern formed by all the microcalcifications. It also draws attention to microcalcifications that might be overlooked because a more prominent feature draws attention away from an important object. A new filter has been designed to weed out false positives in one of the steps of the method. Previously, iterative selection threshold was used to separate microcalcifications from the spurious signals resulting from texture or other background. A Selective Erosion or Enhancement (SEE) Filter has been invented to improve this step. Since the algorithm detects areas containing potential calcifications on the mammogram, it can be used to determine which areas need be stored at the highest resolution available, while, in addition, the full mammogram can be reduced to an appropriate resolution for the remaining cancer signs.

  18. Automated analysis for microcalcifications in high resolution digital mammograms

    DOEpatents

    Mascio, L.N.

    1996-12-17

    A method is disclosed for automatically locating microcalcifications indicating breast cancer. The invention assists mammographers in finding very subtle microcalcifications and in recognizing the pattern formed by all the microcalcifications. It also draws attention to microcalcifications that might be overlooked because a more prominent feature draws attention away from an important object. A new filter has been designed to weed out false positives in one of the steps of the method. Previously, iterative selection threshold was used to separate microcalcifications from the spurious signals resulting from texture or other background. A Selective Erosion or Enhancement (SEE) Filter has been invented to improve this step. Since the algorithm detects areas containing potential calcifications on the mammogram, it can be used to determine which areas need be stored at the highest resolution available, while, in addition, the full mammogram can be reduced to an appropriate resolution for the remaining cancer signs. 8 figs.
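
    The abstract mentions that an iterative selection threshold was previously used to separate microcalcifications from background. A minimal sketch of one well-known iterative selection scheme follows: start from the mean intensity, split the pixels into two classes, move the threshold to the midpoint of the two class means, and repeat until it stops changing. Whether the patent uses exactly this variant is an assumption, the pixel values are hypothetical, and the SEE filter itself is not reproduced here.

```python
def iterative_selection_threshold(values, tol=1e-6, max_iter=100):
    """Iteratively place the threshold midway between the two class means."""
    values = [float(v) for v in values]
    threshold = sum(values) / len(values)  # initial guess: global mean
    for _ in range(max_iter):
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        if not low or not high:
            break  # all pixels fell into one class; keep current threshold
        new_threshold = 0.5 * (sum(low) / len(low) + sum(high) / len(high))
        converged = abs(new_threshold - threshold) < tol
        threshold = new_threshold
        if converged:
            break
    return threshold


# Hypothetical intensities: dim background plus two bright candidate spots.
pixels = [10, 12, 11, 9, 200, 210]
threshold = iterative_selection_threshold(pixels)
candidates = [v for v in pixels if v > threshold]
print(threshold, candidates)
```

    On real mammograms such a global threshold also passes bright texture, which is the false-positive problem the abstract says the SEE filter was designed to reduce.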

  19. High resolution frequency analysis techniques with application to the redshift experiment

    NASA Technical Reports Server (NTRS)

    Decher, R.; Teuber, D.

    1975-01-01

    High resolution frequency analysis methods, with application to the gravitational probe redshift experiment, are discussed. For this experiment a resolution of 0.00001 Hz is required to measure a slowly varying, low frequency signal of approximately 1 Hz. Major building blocks include fast Fourier transform, discrete Fourier transform, Lagrange interpolation, golden section search, and adaptive matched filter technique. Accuracy, resolution, and computer effort of these methods are investigated, including test runs on an IBM 360/65 computer.
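
    Two of the building blocks named in this abstract, a discrete Fourier transform and a golden section search, can be combined to estimate a frequency far more finely than the transform's bin width. The sketch below is not the report's implementation: the sampling rate, record length, test frequency, Hann window, and all function names are assumptions, and the coarse stage evaluates the DFT directly over a band around 1 Hz where an FFT would normally be used.

```python
import cmath
import math


def make_spectrum(samples, sample_rate):
    """Return a function giving the Hann-windowed DFT magnitude at any frequency."""
    n = len(samples)
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n - 1)))
                for k, x in enumerate(samples)]

    def magnitude(freq):
        w = -2.0 * math.pi * freq / sample_rate
        return abs(sum(x * cmath.exp(1j * w * k) for k, x in enumerate(windowed)))

    return magnitude


def golden_section_max(func, lo, hi, tol=1e-7):
    """Locate the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = func(c), func(d)
    while hi - lo > tol:
        if fc > fd:  # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = func(c)
        else:        # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = func(d)
    return 0.5 * (lo + hi)


# Hypothetical test signal: a sinusoid near 1 Hz sampled at 20 Hz for 100 s.
sample_rate, duration, true_freq = 20.0, 100.0, 1.00037
n = int(sample_rate * duration)
samples = [math.sin(2.0 * math.pi * true_freq * k / sample_rate + 0.3) for k in range(n)]

magnitude = make_spectrum(samples, sample_rate)
bin_width = sample_rate / n  # 0.01 Hz, the coarse resolution
first_bin, last_bin = round(0.5 / bin_width), round(1.5 / bin_width)
coarse = max((i * bin_width for i in range(first_bin, last_bin + 1)), key=magnitude)

# Refine within one bin on either side of the coarse peak.
estimate = golden_section_max(magnitude, coarse - bin_width, coarse + bin_width)
print(coarse, estimate)
```

    The Hann window keeps the spectral magnitude single-peaked over the whole search bracket, which is the condition golden section search relies on; with a rectangular window the bracket would need to be narrower.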

  20. Analysis of the ABCA4 genomic locus in Stargardt disease.

    PubMed

    Zernant, Jana; Xie, Yajing Angela; Ayuso, Carmen; Riveiro-Alvarez, Rosa; Lopez-Martinez, Miguel-Angel; Simonelli, Francesca; Testa, Francesco; Gorin, Michael B; Strom, Samuel P; Bertelsen, Mette; Rosenberg, Thomas; Boone, Philip M; Yuan, Bo; Ayyagari, Radha; Nagy, Peter L; Tsang, Stephen H; Gouras, Peter; Collison, Frederick T; Lupski, James R; Fishman, Gerald A; Allikmets, Rando

    2014-12-20

    Autosomal recessive Stargardt disease (STGD1, MIM 248200) is caused by mutations in the ABCA4 gene. Complete sequencing of ABCA4 in STGD patients identifies compound heterozygous or homozygous disease-associated alleles in 65-70% of patients and only one mutation in 15-20% of patients. This study was designed to find the missing disease-causing ABCA4 variation by a combination of next-generation sequencing (NGS), array-Comparative Genome Hybridization (aCGH) screening, familial segregation and in silico analyses. The entire 140 kb ABCA4 genomic locus was sequenced in 114 STGD patients with one known ABCA4 exonic mutation, revealing, on average, 200 intronic variants per sample. Filtering of these data resulted in 141 candidates for new mutations. Two variants were detected in four samples, two in three samples, and 20 variants in two samples; the remaining 117 new variants were detected only once. Multimodal analysis suggested 12 new likely pathogenic intronic ABCA4 variants, some of which were specific to (isolated) ethnic groups. No copy number variation (large deletions and insertions) was detected in any patient, suggesting that it is a very rare event in the ABCA4 locus. Many variants were excluded since they were not conserved in non-human primates, were frequent in African populations and, therefore, represented ancestral, and not disease-associated, variants. The sequence variability in the ABCA4 locus is extensive and the non-coding sequences do not harbor frequent mutations in STGD patients of European-American descent. Defining disease-associated alleles in the ABCA4 locus requires exceptionally well characterized large cohorts and extensive analyses by a combination of various approaches. PMID:25082829

  1. Epigenomics and the structure of the living genome.

    PubMed

    Friedman, Nir; Rando, Oliver J

    2015-10-01

    Eukaryotic genomes are packaged into an extensively folded state known as chromatin. Analysis of the structure of eukaryotic chromosomes has been revolutionized by development of a suite of genome-wide measurement technologies, collectively termed "epigenomics." We review major advances in epigenomic analysis of eukaryotic genomes, covering aspects of genome folding at scales ranging from whole chromosome folding down to nucleotide-resolution assays that provide structural insights into protein-DNA interactions. We then briefly outline several challenges remaining and highlight new developments such as single-cell epigenomic assays that will help provide us with a high-resolution structural understanding of eukaryotic genomes.

  2. Genome-wide survey and analysis of microsatellites in the Pacific oyster genome: abundance, distribution, and potential for marker development

    NASA Astrophysics Data System (ADS)

    Wang, Jiafeng; Qi, Haigang; Li, Li; Zhang, Guofan

    2014-01-01

    Microsatellites are a ubiquitous component of the eukaryote genome and constitute one of the most popular sources of molecular markers for genetic studies. However, no data are currently available regarding microsatellites across the entire genome in oysters, despite their importance to the aquaculture industry. We present the first genome-wide investigation of microsatellites in the Pacific oyster Crassostrea gigas by analysis of the complete genome, resequencing, and expression data. The Pacific oyster genome is rich in microsatellites. A total of 604 653 repeats were identified, an average of one locus per 815 base pairs (bp). A total of 12 836 genes had coding repeats, and 7 332 were expressed normally, including genes with a wide range of molecular functions. Compared with 20 different species of animals, microsatellites in the oyster genome typically exhibited 1) an intermediate overall frequency; 2) relatively uniform contents of (A)n and (C)n repeats and abundant long (C)n repeats (≥24 bp); 3) large average length of (AG)n repeats; and 4) scarcity of trinucleotide repeats. The microsatellite-flanking regions exhibited a high degree of polymorphism with a heterozygosity rate of around 2.0%, but there was no correlation between heterozygosity and microsatellite abundance. A total of 19 462 polymorphic microsatellites were discovered, and dinucleotide repeats were the most active, with over 26% of loci found to harbor allelic variations. In all, 7 451 loci with high potential for marker development were identified. Better knowledge of the microsatellites in the oyster genome will provide information for the future design of a wide range of molecular markers and contribute to further advancements in the field of oyster genetics, particularly for molecular-based selection and breeding.
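
    A genome-wide microsatellite survey of the kind summarized in this abstract rests on scanning sequence for short motifs repeated in tandem. The sketch below finds perfect repeats of 1 to 6 bp motifs with a regular expression. The abstract does not name the software or thresholds used, so the 12 bp minimum length, the motif size range, the function name, and the toy sequence are all assumptions.

```python
import re


def find_microsatellites(sequence, max_motif=6, min_length=12):
    """Return (start, motif, copies) for perfect tandem repeats.

    Motifs that are themselves repeats of a shorter unit (for example
    "AGAG") are skipped so each repeat is reported once, under its
    shortest motif.
    """
    sequence = sequence.upper()
    hits = []
    for k in range(1, max_motif + 1):
        min_copies = max(2, -(-min_length // k))  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if motif in (motif + motif)[1:-1]:
                continue  # periodic motif, already covered by a shorter k
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence: an (A)12 run, an (AG)7 run, and a (CAG)3 run that is too short.
toy = "GGC" + "A" * 12 + "CGT" + "AG" * 7 + "TTC" + "CAG" * 3 + "G"
repeats = find_microsatellites(toy)
print(repeats)
```

    Dividing the scanned sequence length by the number of hits gives a density figure comparable in kind to the one locus per 815 bp reported in the abstract, although the value depends strongly on the chosen thresholds.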

  3. Current Status of Echinoderm Genome Analysis - What do we Know?

    PubMed Central

    Kondo, Mariko; Akasaka, Koji

    2012-01-01

    Echinoderms have long served as model organisms for a variety of biological research, especially in the field of developmental biology. Although the genome of the purple sea urchin Strongylocentrotus purpuratus has been sequenced, it is the only echinoderm whose whole genome sequence has been reported. Nevertheless, data are rapidly accumulating on the chromosomes and genomic sequences of all five classes of echinoderms, including the mitochondrial genomes and Hox genes. These new data will be essential for estimating the phylogenetic relationships among echinoderms and for examining the underlying mechanisms by which the diverse morphologies of echinoderms have arisen. PMID:23024605

  4. Genome-wide transcriptome analysis of 150 cell samples.

    PubMed

    Irimia, Daniel; Mindrinos, Michael; Russom, Aman; Xiao, Wenzhong; Wilhelmy, Julie; Wang, Shenglong; Heath, Joe Don; Kurn, Nurith; Tompkins, Ronald G; Davis, Ronald W; Toner, Mehmet

    2009-01-01

    A major challenge in molecular biology is interrogating the human transcriptome on a genome wide scale when only a limited amount of biological sample is available for analysis. Current methodologies using microarray technologies for simultaneously monitoring mRNA transcription levels require nanogram amounts of total RNA. To overcome the sample size limitation of current technologies, we have developed a method to probe the global gene expression in biological samples as small as 150 cells, or the equivalent of approximately 300 pg total RNA. The new method employs microfluidic devices for the purification of total RNA from mammalian cells and ultra-sensitive whole transcriptome amplification techniques. We verified that the RNA integrity is preserved through the isolation process, accomplished highly reproducible whole transcriptome analysis, and established high correlation between repeated isolations of 150 cells and the same cell culture sample. We validated the technology by demonstrating that the combined microfluidic and amplification protocol is capable of identifying biological pathways perturbed by stimulation, which are consistent with the information recognized in bulk-isolated samples.

  5. Genome-wide transcriptome analysis of 150 cell samples

    PubMed Central

    Russom, Aman; Xiao, Wenzhong; Wilhelmy, Julie; Wang, Shenglong; Heath, Joe Don; Kurn, Nurith; Tompkins, Ronald G.; Davis, Ronald W.; Toner, Mehmet

    2013-01-01

    A major challenge in molecular biology is interrogating the human transcriptome on a genome wide scale when only a limited amount of biological sample is available for analysis. Current methodologies using microarray technologies for simultaneously monitoring mRNA transcription levels require nanogram amounts of total RNA. To overcome the sample size limitation of current technologies, we have developed a method to probe the global gene expression in biological samples as small as 150 cells, or the equivalent of approximately 300 pg total RNA. The new method employs microfluidic devices for the purification of total RNA from mammalian cells and ultra-sensitive whole transcriptome amplification techniques. We verified that the RNA integrity is preserved through the isolation process, accomplished highly reproducible whole transcriptome analysis, and established high correlation between repeated isolations of 150 cells and the same cell culture sample. We validated the technology by demonstrating that the combined microfluidic and amplification protocol is capable of identifying biological pathways perturbed by stimulation, which are consistent with the information recognized in bulk-isolated samples. PMID:20023796

  6. Genomic Analysis of ATP Efflux in Saccharomyces cerevisiae.

    PubMed

    Peters, Theodore W; Miller, Aaron W; Tourette, Cendrine; Agren, Hannah; Hubbard, Alan; Hughes, Robert E

    2015-11-19

    Adenosine triphosphate (ATP) plays an important role as a primary molecule for the transfer of chemical energy to drive biological processes. ATP also functions as an extracellular signaling molecule in a diverse array of eukaryotic taxa in a conserved process known as purinergic signaling. Given the important roles of extracellular ATP in cell signaling, we sought to comprehensively elucidate the pathways and mechanisms governing ATP efflux from eukaryotic cells. Here, we present results of a genomic analysis of ATP efflux from Saccharomyces cerevisiae by measuring extracellular ATP levels in cultures of 4609 deletion mutants. This screen revealed key cellular processes that regulate extracellular ATP levels, including mitochondrial translation and vesicle sorting in the late endosome, indicating that ATP production and transport through vesicles are required for efflux. We also observed evidence for altered ATP efflux in strains deleted for genes involved in amino acid signaling, and mitochondrial retrograde signaling. Based on these results, we propose a model in which the retrograde signaling pathway potentiates amino acid signaling to promote mitochondrial respiration. This study advances our understanding of the mechanism of ATP secretion in eukaryotes and implicates TOR complex 1 (TORC1) and nutrient signaling pathways in the regulation of ATP efflux. These results will facilitate analysis of ATP efflux mechanisms in higher eukaryotes.

  7. Rice-arsenate interactions in hydroponics: whole genome transcriptional analysis.

    PubMed

    Norton, Gareth J; Lou-Hing, Daniel E; Meharg, Andrew A; Price, Adam H

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthetases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala × Azucena mapping population.

  8. Sequence and comparative genomic analysis of actin-related proteins.

    PubMed

    Muller, Jean; Oma, Yukako; Vallar, Laurent; Friederich, Evelyne; Poch, Olivier; Winsor, Barbara

    2005-12-01

    Actin-related proteins (ARPs) are key players in cytoskeleton activities and nuclear functions. Two complexes, ARP2/3 and ARP1/11, also known as dynactin, are implicated in actin dynamics and in microtubule-based trafficking, respectively. ARP4 to ARP9 are components of many chromatin-modulating complexes. Conventional actins and ARPs codefine a large family of homologous proteins, the actin superfamily, with a tertiary structure known as the actin fold. Because ARPs and actin share high sequence conservation, clear family definition requires distinct features to easily and systematically identify each subfamily. In this study we performed an in-depth sequence and comparative genomic analysis of ARP subfamilies. A high-quality multiple alignment of approximately 700 complete protein sequences homologous to actin, including 148 ARP sequences, allowed us to extend the ARP classification to new organisms. Sequence alignments revealed conserved residues, motifs, and inserted sequence signatures to define each ARP subfamily. These discriminative characteristics allowed us to develop ARPAnno (http://bips.u-strasbg.fr/ARPAnno), a new web server dedicated to the annotation of ARP sequences. Analyses of sequence conservation among actins and ARPs highlight part of the actin fold and suggest interactions between ARPs and actin-binding proteins. Finally, analysis of ARP distribution across eukaryotic phyla emphasizes the central importance of nuclear ARPs, particularly the multifunctional ARP4.

  9. Improved Statistics for Genome-Wide Interaction Analysis

    PubMed Central

    Ueki, Masao; Cordell, Heather J.

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al

  10. Improved statistics for genome-wide interaction analysis.

    PubMed

    Ueki, Masao; Cordell, Heather J

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new "joint effects" statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al
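
    The kind of comparison these interaction tests rest on can be illustrated with a generic log odds ratio contrast between cases and controls. The following is a minimal sketch, assuming each group is summarised as a 2x2 table of allele counts at the two loci; the function names and any counts passed in are hypothetical, and this is not the exact formula of Wu et al., of PLINK, or of the adjusted statistics proposed in the paper.

```python
import math

def log_odds_ratio(table):
    """Return (log odds ratio, variance) for a 2x2 count table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf's large-sample variance
    return log_or, var

def case_control_z(cases, controls):
    """Z statistic contrasting the inter-locus association in cases vs controls."""
    lor_case, var_case = log_odds_ratio(cases)
    lor_ctrl, var_ctrl = log_odds_ratio(controls)
    return (lor_case - lor_ctrl) / math.sqrt(var_case + var_ctrl)

def case_only_z(cases):
    """Z statistic for inter-locus association in cases alone.

    Assumption: the two loci are uncorrelated in the general population,
    otherwise linkage disequilibrium is mistaken for interaction.
    """
    lor, var = log_odds_ratio(cases)
    return lor / math.sqrt(var)

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

    Under these assumptions, identical case and control tables give a Z of zero, and any cell count of zero would need a continuity correction that this sketch omits.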

  11. Genome-Wide Computational Analysis of Dioxin Response Element Location and Distribution in the Human, Mouse and Rat Genomes

    PubMed Central

    Dere, Edward; Forgacs, Agnes L; Zacharewski, Timothy R; Burgoon, Lyle D

    2014-01-01

    The aryl hydrocarbon receptor (AhR) mediates responses elicited by 2,3,7,8-tetrachlorodibenzo-p-dioxin by binding to dioxin response elements (DRE) containing the core consensus sequence 5′-GCGTG-3′. The human, mouse and rat genomes were computationally searched for all DRE cores. Each core was then extended by 7 bp upstream and downstream, and matrix similarity (MS) scores for the resulting 19 bp DRE sequences were calculated using a revised position weight matrix constructed from bona fide functional DREs. In total, 72,318 human, 70,720 mouse and 88,651 rat high-scoring (MS ≥ 0.8437) putative DREs were identified. Gene-encoding intragenic DNA regions had ~1.6 times more putative DREs than the non-coding intergenic DNA regions. Furthermore, the promoter region spanning ±1.5 kb of a TSS had the highest density of putative DREs within the genome. Chromosomal analysis found that the putative DRE densities of chromosomes X and Y were significantly lower than the mean chromosomal density. Interestingly, the 10 kb upstream promoter region on chromosome X of the genomes was significantly less dense than the chromosomal mean, while the same region in chromosome Y was the most dense. In addition to providing a detailed genomic map of all DRE cores in the human, mouse and rat genomes, these data will further aid the elucidation of AhR-mediated signal transduction. PMID:21370876
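
    The core search and 7 bp extension described above can be sketched as follows. This is a minimal illustration, assuming both strands are scanned and that cores too close to a sequence end are skipped; the function names are hypothetical, and the matrix similarity scoring is left out because the revised position weight matrix is not given in the abstract.

```python
CORE = "GCGTG"  # DRE core consensus, 5'->3'
FLANK = 7       # bp added on each side: 7 + 5 + 7 = 19 bp

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_dre_candidates(seq):
    """Yield (start, strand, 19-bp sequence) for every core with full flanks.

    start is the 0-based position of the 19-bp window on the forward strand;
    minus-strand windows are reported 5'->3' on their own strand.
    """
    seq = seq.upper()
    for strand, motif in (("+", CORE), ("-", reverse_complement(CORE))):
        pos = seq.find(motif)
        while pos != -1:
            start, end = pos - FLANK, pos + len(CORE) + FLANK
            if start >= 0 and end <= len(seq):  # skip cores lacking full flanks
                window = seq[start:end]
                if strand == "-":
                    window = reverse_complement(window)
                yield start, strand, window
            pos = seq.find(motif, pos + 1)  # step by 1 to allow overlapping cores
```

    Each 19 bp window returned here would then be scored against the position weight matrix and kept only if it met the reported threshold (MS ≥ 0.8437).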

  12. CoCoNUT: an efficient system for the comparison and analysis of genomes

    PubMed Central

    2008-01-01

    Background: Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and relies heavily on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results: Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system, CoCoNUT (Computational Comparative geNomics Utility Toolkit), that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion: CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state-of-the-art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large-scale studies in comparative genomics. PMID:19014477

  13. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    PubMed

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, the chloroplast (cp) genome of D. superbus was compared with those of closely related Caryophyllaceae family members such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genomes of Dianthus and Spinacia have two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.

  14. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective

    PubMed Central

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, the chloroplast (cp) genome of D. superbus was compared with those of closely related Caryophyllaceae family members such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genomes of Dianthus and Spinacia have two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus. PMID:26513163

  15. Development and characterization of genomic and expressed SSRs in citrus by genome-wide analysis.

    PubMed

    Liu, Sheng-Rui; Li, Wen-Yang; Long, Dang; Hu, Chun-Gen; Zhang, Jin-Zhi

    2013-01-01

    Microsatellites or simple sequence repeats (SSRs) are one of the most popular sources of genetic markers and play a significant role in plant genetics and breeding. In this study, we identified citrus SSRs in the genome of Clementine mandarin and analyzed their frequency and distribution in different genomic regions. A total of 80,708 SSRs were detected in the genome with an overall density of 268 SSRs/Mb. While di-nucleotide repeats were the most frequent microsatellites in genomic DNA sequence, tetra-nucleotides, which had more repeat units than any other SSR types, had the highest cumulative sequence length. We identified 6,834 transcripts as containing 8,989 SSRs in 33,929 Clementine mandarin transcripts, among which tri-nucleotide motifs (36.0%) were the most common, followed by di-nucleotide (26.9%) and hexa-nucleotide motifs (15.1%). The motif AG (16.7%) was most abundant among these SSRs, while motifs AAG (6.6%), AAT (5.0%), and TAG (2.2%) were most common among tri-nucleotides. Functional categorization of transcripts containing SSRs revealed that 5,879 (86.0%) of such transcripts had homology with known proteins. GO and KEGG annotation revealed that transcripts containing SSRs were those implicated in diverse biological processes in plants, including binding, development, transcription, and protein degradation. When 27 genomic and 78 randomly selected SSRs were tested on Clementine mandarin, 95 SSRs revealed polymorphism. These 95 SSRs were further deployed on 18 genotypes of the three genera of Rutaceae for the genetic diversity assessment; genomic SSRs generally showed low transferability in comparison to SSRs developed from expressed sequences. These transcript-markers identified in our study may provide a valuable genetic and genomic tool for further genetic research and varietal development in citrus, such as diversity study, QTL mapping, molecular breeding, comparative mapping and other genetic analyses.
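
    A genome-wide SSR scan of the kind summarised above can be approximated with regular expressions. The sketch below is illustrative only: the minimum repeat counts in MIN_REPEATS are assumed values, not the thresholds used in the study, and the function names are hypothetical.

```python
import re

# Assumed minimum number of repeat units per motif length (not from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for di- to hexa-nucleotide SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # avoid counting (AG)n again as (AGAG)n
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)

def density_per_mb(n_ssrs, genome_length_bp):
    """SSR density expressed as SSRs per megabase."""
    return n_ssrs / (genome_length_bp / 1_000_000)
```

    As a consistency check on the reported figures, 80,708 SSRs at 268 SSRs/Mb corresponds to roughly 301 Mb of scanned sequence (80,708 / 268).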

  16. Inverted Low-Copy Repeats and Genome Instability—A Genome-Wide Analysis

    PubMed Central

    Dittwald, Piotr; Gambin, Tomasz; Gonzaga-Jauregui, Claudia; Carvalho, Claudia M.B.; Lupski, James R.; Stankiewicz, Paweł; Gambin, Anna

    2013-01-01

    Inverse paralogous low-copy repeats (IP-LCRs) can cause genome instability by nonallelic homologous recombination (NAHR)-mediated balanced inversions. When disrupting one or more dosage-sensitive genes, balanced inversions can lead to abnormal phenotypes. We delineated the genome-wide distribution of IP-LCRs >1 kb in size with >95% sequence identity and mapped the genes, potentially intersected by an inversion, that overlap at least one of the IP-LCRs. Remarkably, our results show that 12.0% of the human