Science.gov

Sample records for gene discovery project

  1. Alternative Gene Form Discovery and Candidate Gene Selection from Gene Indexing Projects

    PubMed Central

    Burke, John; Wang, Hui; Hide, Winston; Davison, Daniel B.

    1998-01-01

    Several efforts are under way to partition single-read expressed sequence tag (EST), as well as full-length transcript data, into large-scale gene indices, where transcripts are in common index classes if and only if they share a common progenitor gene. Accurate gene indexing facilitates gene expression studies, as well as inexpensive and early gene sequence discovery through assembly of ESTs that are derived from genes that have not been sequenced by classical methods. We extend, correct, and enhance the information obtained from index groups by splitting index classes into subclasses based on sequence dissimilarity (diversity). Two applications of this are highlighted in this report. First it is shown that our method can ameliorate the damage that artifacts, such as chimerism, inflict on index integrity. Additionally, we demonstrate how the organization imposed by an effective subpartition can greatly increase the sensitivity of gene expression studies by accounting for the existence and tissue- or pathology-specific regulation of novel gene isoforms and polymorphisms. We apply our subpartitioning treatment to the UniGene gene indexing project to measure a marked increase in information quality and abundance (in terms of assembly length and insertion/deletion error) after treatment and demonstrate cases where new levels of information concerning differential expression of alternate gene forms, such as regulated alternative splicing, are discovered. [Tables 2 and 3 can be viewed in their entirety as Online Supplements at http://www.genome.org.] PMID:9521931
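
    The index-class idea in this abstract can be illustrated with a small sketch. It assumes transcripts are grouped by single-linkage clustering over pairwise similarity links, and that subclasses are obtained by repeating the grouping with a stricter set of links. This is an illustration only, not the UniGene procedure or the authors' subpartitioning method; `build_classes`, the link lists and the EST identifiers are hypothetical.

```python
from collections import defaultdict


def build_classes(items, links):
    """Group items into classes: two items share a class if a chain of links joins them."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent pointers to the class representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    classes = defaultdict(set)
    for item in items:
        classes[find(item)].add(item)
    # Sort members and classes so the result is deterministic.
    return sorted(sorted(members) for members in classes.values())


# Hypothetical ESTs; est4 stands in for a chimera-like read bridging two genes.
ests = ["est1", "est2", "est3", "est4", "est5"]
loose_links = [("est1", "est2"), ("est2", "est4"), ("est4", "est5"), ("est3", "est5")]
strict_links = [("est1", "est2"), ("est3", "est5")]  # assumed high-identity links only

index_classes = build_classes(ests, loose_links)
subclasses = build_classes(ests, strict_links)
print(index_classes)
print(subclasses)
```

    With the loose links every EST lands in one class because est4 bridges the two groups; with the stricter links the class separates into two subclasses plus the isolated bridging read.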

  2. Pine Gene Discovery Project - Final Report - 08/31/1997 - 02/28/2001

    SciTech Connect

    Whetten, R. W.; Sederoff, R. R.; Kinlaw, C.; Retzel, E.

    2001-04-30

    Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the wider scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community.

  3. FORGE Canada Consortium: outcomes of a 2-year national rare-disease gene-discovery project.

    PubMed

    Beaulieu, Chandree L; Majewski, Jacek; Schwartzentruber, Jeremy; Samuels, Mark E; Fernandez, Bridget A; Bernier, Francois P; Brudno, Michael; Knoppers, Bartha; Marcadier, Janet; Dyment, David; Adam, Shelin; Bulman, Dennis E; Jones, Steve J M; Avard, Denise; Nguyen, Minh Thu; Rousseau, Francois; Marshall, Christian; Wintle, Richard F; Shen, Yaoqing; Scherer, Stephen W; Friedman, Jan M; Michaud, Jacques L; Boycott, Kym M

    2014-06-01

    Inherited monogenic disease has an enormous impact on the well-being of children and their families. Over half of the children living with one of these conditions are without a molecular diagnosis because of the rarity of the disease, the marked clinical heterogeneity, and the reality that there are thousands of rare diseases for which causative mutations have yet to be identified. It is in this context that in 2010 a Canadian consortium was formed to rapidly identify mutations causing a wide spectrum of pediatric-onset rare diseases by using whole-exome sequencing. The FORGE (Finding of Rare Disease Genes) Canada Consortium brought together clinicians and scientists from 21 genetics centers and three science and technology innovation centers from across Canada. From nation-wide requests for proposals, 264 disorders were selected for study from the 371 submitted; disease-causing variants (including in 67 genes not previously associated with human disease; 41 of these have been genetically or functionally validated, and 26 are currently under study) were identified for 146 disorders over a 2-year period. Here, we present our experience with four strategies employed for gene discovery and discuss FORGE's impact in a number of realms, from clinical diagnostics to the broadening of the phenotypic spectrum of many diseases to the biological insight gained into both disease states and normal human development. Lastly, on the basis of this experience, we discuss the way forward for rare-disease genetic discovery both in Canada and internationally. PMID:24906018

  4. Independent Gene Discovery and Testing

    ERIC Educational Resources Information Center

    Palsule, Vrushalee; Coric, Dijana; Delancy, Russell; Dunham, Heather; Melancon, Caleb; Thompson, Dennis; Toms, Jamie; White, Ashley; Shultz, Jeffry

    2010-01-01

    A clear understanding of basic gene structure is critical when teaching molecular genetics, the central dogma and the biological sciences. We sought to create a gene-based teaching project to improve students' understanding of gene structure and to integrate this into a research project that can be implemented by instructors at the secondary level…

  5. Metagenomics and novel gene discovery

    PubMed Central

    Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin

    2014-01-01

    Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337

  6. Cancer gene discovery using digital differential display.

    PubMed

    Scheurle, D; DeYoung, M P; Binninger, D M; Page, H; Jahanzeb, M; Narayanan, R

    2000-08-01

    The Cancer Gene Anatomy Project database of the National Cancer Institute has thousands of expressed sequences, both known and novel, in the form of expressed sequence tags (ESTs). These ESTs, derived from diverse normal and tumor cDNA libraries, offer an attractive starting point for cancer gene discovery. Using a data-mining tool called Digital Differential Display (DDD) from the Cancer Gene Anatomy Project database, ESTs from six different solid tumor types (breast, colon, lung, ovary, pancreas, and prostate) were analyzed for differential expression. An electronic expression profile and chromosomal map position of these hits were generated from the Unigene database. The hits were categorized into major classes of genes including ribosomal proteins, enzymes, cell surface molecules, secretory proteins, adhesion molecules, and immunoglobulins and were found to be differentially expressed in these tumor-derived libraries. Genes known to be up-regulated in prostate, breast, and pancreatic carcinomas were discovered by DDD, demonstrating the utility of this technique. Two hundred known genes and 500 novel sequences were discovered to be differentially expressed in these select tumor-derived libraries. Test genes were validated for expression specificity by reverse transcription-PCR, providing a proof of concept for gene discovery by DDD. A comprehensive database of hits can be accessed at http://www.fau.edu/cmbb/publications/cancergenes.htm. This solid tumor DDD database should facilitate target identification for cancer diagnostics and therapeutics. PMID:10945605
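
    A minimal sketch of the kind of comparison a digital differential display performs: EST counts for one gene in two library pools are placed in a 2x2 table and tested for a difference in proportion. The use of Fisher's exact test here is an assumption, since the abstract does not name the statistic; the function name `fisher_two_sided` and the counts are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Number of ways to obtain x in the top-left cell with the margins fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / total


# Hypothetical small counts: rows are the two library pools,
# columns are ESTs matching the gene and ESTs not matching it.
p_value = fisher_two_sided(8, 2, 1, 5)
print(round(p_value, 5))
```

    Table weights are compared as exact integers, so ties with the observed table are handled without floating-point tolerance.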

  7. Cancer gene discovery in mouse and man

    PubMed Central

    Mattison, Jenny; van der Weyden, Louise; Hubbard, Tim; Adams, David J.

    2009-01-01

    The elucidation of the human and mouse genome sequence and developments in high-throughput genome analysis, and in computational tools, have made it possible to profile entire cancer genomes. In parallel with these advances mouse models of cancer have evolved into a powerful tool for cancer gene discovery. Here we discuss the approaches that may be used for cancer gene identification in both human and mouse and discuss how a cross-species ‘oncogenomics’ approach to cancer gene discovery represents a powerful strategy for finding genes that drive tumourigenesis. PMID:19285540

  8. Phenotypic mutant library: potential for gene discovery

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The rapid development of high throughput and affordable Next-Generation Sequencing (NGS) techniques has renewed interest in gene discovery using forward genetics. The conventional forward genetic approach starts with isolation of mutants with a phenotype of interest, mapping the mutation within a s...

  9. Pathway-driven discovery of epilepsy genes

    PubMed Central

    Noebels, Jeffrey

    2016-01-01

    Epilepsy genes deliver critical insights into the molecular control of brain synchronization and are revolutionizing our understanding and treatment of the disease. The epilepsy-associated genome is rapidly expanding, and two powerful complementary approaches, isolation of de novo exome variants in patients and targeted mutagenesis in model systems, account for the steep increase. In sheer number, the tally of genes linked to seizures will likely match that of cancer and exceed it in biological diversity. The proteins act within most intracellular compartments and span the molecular determinants of firing and wiring in the developing brain. Every facet of neurotransmission, from dendritic spine to exocytotic machinery, is in play, and defects of synaptic inhibition are over-represented. The contributions of somatic mutations and noncoding microRNAs are also being explored. The functional spectrum of established epilepsy genes and the arrival of rapid, precise technologies for genome editing now provide a robust scaffold to prioritize hypothesis-driven discovery and further populate this genetic proto-map. Although each gene identified offers translational potential to stratify patient care, the complexity of individual variation and covert actions of genetic modifiers may confound single-gene solutions for the clinical disorder. In vivo genetic deconstruction of epileptic networks, ex vivo validation of variant profiles in patient-derived induced pluripotent stem cells, in silico variant modeling and modifier gene discovery, now in their earliest stages, will help clarify individual patterns. Because seizures stand at the crossroads of all neuronal synchronization disorders in the developing and aging brain, the neurobiological analysis of epilepsy-associated genes provides an extraordinary gateway to new insights into higher cortical function. PMID:25710836

  10. Biomarker Gene Signature Discovery Integrating Network Knowledge

    PubMed Central

    Cun, Yupeng; Fröhlich, Holger

    2012-01-01

    Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards better personalized medicine. During the last decade various methods, mainly from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in clinical diagnosis is the typically low reproducibility of these signatures combined with the difficulty of achieving a clear biological interpretation. For this reason, in recent years there has been growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview of the approaches proposed so far. PMID:24832044

  11. The Helioviewer Project: Discovery For Everyone Everywhere

    NASA Astrophysics Data System (ADS)

    Ireland, Jack; Hughitt, K.; Müller, D.; Dimitoglou, G.; Schmiedel, P.; Fleck, B.

    2009-05-01

    There is an ever-increasing amount of solar and heliospheric data gathered from multiple sources such as space-based facilities and ground-based observatories. There are also multiple feature and event catalogs arising from human and computer-based detection methods. The Helioviewer Project is developing a suite of technologies to allow users around the world to visualize, browse and access these heterogeneous datasets in an intuitive and highly customizable fashion. Helioviewer technologies are based on the JPEG2000 file format, an extremely flexible format that allows for the efficient transfer of data (and meta-data, such as FITS keywords) between client and server. Rather than having to download an entire image and then examine the small portion (for example, an active region) that you are interested in, the JPEG2000 file format lets you preferentially download only those portions you are interested in. This dramatically reduces the amount of data transferred, making possible responsive and flexible scientific discovery applications that can browse populous archives of large images, such as those from the Solar Dynamics Observatory. In addition, the Helioviewer Project is designed to be flexible and extensible to data sources as they become available. Helioviewer.org (www.helioviewer.org) works seamlessly with the Virtual Solar Observatory (VSO) whilst an application programming interface (API) is being developed for interaction with the Solar Dynamics Observatory Heliophysics Event Knowledgebase. After a short introduction to the underlying technology, a live demonstration of the web application www.helioviewer.org will be given. We will also comment on other client applications (JHelioviewer, a Java-based browse tool), and the application of Helioviewer technology to existing and future solar and heliospheric data and feature/event repositories.
This project is funded by NASA VxO and LWS awards and

  12. Inflammatory bowel disease gene discovery. CRADA final report

    SciTech Connect

    1997-09-09

    The ultimate goal of this project is to identify the human gene(s) responsible for the disorder known as IBD. The work was planned in two phases. The desired products resulting from Phase 1 were BAC clone(s) containing the genetic marker(s) identified by gene/Networks, Inc. as potentially linked to IBD, plasmid subclones of those BAC(s), and new genetic markers developed from these plasmid subclones. The newly developed markers would be genotyped by gene/Networks, Inc. to ascertain evidence for linkage or non-linkage of IBD to this region. If non-linkage was indicated, the project would move to investigation of other candidate chromosomal regions. Where linkage was indicated, the project would move to Phase 2, in which a physical map of the candidate region(s) would be developed. The products of this phase would be contig(s) of BAC clones in the region exhibiting linkage to IBD, as well as plasmid subclones of the BACs and further genetic marker development. There would also be continued genotyping with new polymorphic markers during this phase. It was anticipated that clones identified and developed during these two phases would provide the physical resources for eventual disease gene discovery.

  13. Peroxidase gene discovery from the horseradish transcriptome

    PubMed Central

    2014-01-01

    Background: Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. Results: In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. Conclusions: This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms.
Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group

  14. Genome-enabled Discovery of Carbon Sequestration Genes

    SciTech Connect

    Tuskan, Gerald A; Tschaplinski, Timothy J; Kalluri, Udaya C; Yin, Tongming; Yang, Xiaohan; Zhang, Xinye; Engle, Nancy L; Ranjan, Priya; Basu, Manojit M; Gunter, Lee E; Jawdy, Sara; Martin, Madhavi Z; Campbell, Alina S; DiFazio, Stephen P; Davis, John M; Hinchee, Maud; Pinnacchio, Christa; Meilan, R; Busov, V.; Strauss, S

    2009-01-01

    The fate of carbon below ground is likely to be a major factor determining the success of carbon sequestration strategies involving plants. Despite their importance, molecular processes controlling belowground C allocation and partitioning are poorly understood. This project is leveraging the Populus trichocarpa genome sequence to discover genes important to C sequestration in plants and soils. The focus is on the identification of genes that provide key control points for the flow and chemical transformations of carbon in roots, concentrating on genes that control the synthesis of chemical forms of carbon that result in slower turnover rates of soil organic matter (i.e., increased recalcitrance). We propose to enhance carbon allocation and partitioning to roots by 1) modifying the auxin signaling pathway, and the invertase family, which controls sucrose metabolism, and by 2) increasing root proliferation through transgenesis with genes known to control fine root proliferation (e.g., ANT), 3) increasing the production of recalcitrant C metabolites by identifying genes controlling secondary C metabolism by a major mQTL-based gene discovery effort, and 4) increasing aboveground productivity by enhancing drought tolerance to achieve maximum C sequestration. This broad, integrated approach is aimed at ultimately enhancing root biomass as well as root detritus longevity, providing the best prospects for significant enhancement of belowground C sequestration.

  15. Fusion genes and their discovery using high throughput sequencing.

    PubMed

    Annala, M J; Parker, B C; Zhang, W; Nykter, M

    2013-11-01

    Fusion genes are hybrid genes that combine parts of two or more original genes. They can form as a result of chromosomal rearrangements or abnormal transcription, and have been shown to act as drivers of malignant transformation and progression in many human cancers. The biological significance of fusion genes together with their specificity to cancer cells has made them into excellent targets for molecular therapy. Fusion genes are also used as diagnostic and prognostic markers to confirm cancer diagnosis and monitor response to molecular therapies. High-throughput sequencing has enabled the systematic discovery of fusion genes in a wide variety of cancer types. In this review, we describe the history of fusion genes in cancer and the ways in which fusion genes form and affect cellular function. We also describe computational methodologies for detecting fusion genes from high-throughput sequencing experiments, and the most common sources of error that lead to false discovery of fusion genes. PMID:23376639

  16. Standardized Plant Disease Evaluations will Enhance Resistance Gene Discovery

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Gene discovery and marker development using DNA based tools require plant populations with well-documented phenotypes. Related crops such as apples and pears may share a number of genes, for example resistance to common diseases, and data mining in one crop may reveal genes for the other. However, u...

  17. Antibiotic resistance gene discovery in food-producing animals

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Numerous environmental reservoirs contribute to the widespread antibiotic resistance problem in human pathogens. One environmental reservoir of particular importance is the intestinal bacteria of food-producing animals. In this review I examine recent discoveries of antibiotic resistance genes in ...

  18. Discovery of Tumor Suppressor Gene Function.

    ERIC Educational Resources Information Center

    Oppenheimer, Steven B.

    1995-01-01

    This is an update of a 1991 review on tumor suppressor genes written at a time when understanding of how the genes work was limited. A recent major breakthrough in the understanding of the function of tumor suppressor genes is discussed. (LZ)

  19. SNP marker discovery in koala TLR genes.

    PubMed

    Cui, Jian; Frankham, Greta J; Johnson, Rebecca N; Polkinghorne, Adam; Timms, Peter; O'Meally, Denis; Cheng, Yuanyuan; Belov, Katherine

    2015-01-01

    Toll-like receptors (TLRs) play a crucial role in the early defence against invading pathogens, yet our understanding of TLRs in marsupial immunity is limited. Here, we describe the characterisation of nine TLRs from a koala immune tissue transcriptome and one TLR from a draft sequence of the koala genome and the subsequent development of an assay to study genetic diversity in these genes. We surveyed genetic diversity in 20 koalas from New South Wales, Australia and showed that one gene, TLR10, is monomorphic, while the other nine TLR genes have between two and 12 alleles. 40 SNPs (16 non-synonymous) were identified across the ten TLR genes. These markers provide a springboard to future studies on innate immunity in the koala, a species under threat from two major infectious diseases. PMID:25799012

  20. A Discovery Lab for Studying Gene Regulation.

    ERIC Educational Resources Information Center

    Moss, Robert

    1997-01-01

    Presents a laboratory in which students are provided with cultures of three bacterial strains. Using the results, students will determine which of the strains corresponds to a mutant lacking a particular functional gene. (DDR)

  1. SNP Marker Discovery in Koala TLR Genes

    PubMed Central

    Cui, Jian; Frankham, Greta J.; Johnson, Rebecca N.; Polkinghorne, Adam; Timms, Peter; O’Meally, Denis; Cheng, Yuanyuan; Belov, Katherine

    2015-01-01

    Toll-like receptors (TLRs) play a crucial role in the early defence against invading pathogens, yet our understanding of TLRs in marsupial immunity is limited. Here, we describe the characterisation of nine TLRs from a koala immune tissue transcriptome and one TLR from a draft sequence of the koala genome and the subsequent development of an assay to study genetic diversity in these genes. We surveyed genetic diversity in 20 koalas from New South Wales, Australia and showed that one gene, TLR10, is monomorphic, while the other nine TLR genes have between two and 12 alleles. 40 SNPs (16 non-synonymous) were identified across the ten TLR genes. These markers provide a springboard to future studies on innate immunity in the koala, a species under threat from two major infectious diseases. PMID:25799012

  2. Standardized plant disease evaluations will enhance resistance gene discovery

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Gene discovery and marker development using DNA-based tools require plant populations with well documented phenotypes. If dissimilar phenotype evaluation methods or data scoring techniques are employed with different crops, or at different labs for the same crops, then data mining for genetic marker...

  3. Implementation of Discovery Projects in Statistics

    ERIC Educational Resources Information Center

    Bailey, Brad; Spence, Dianna J.; Sinn, Robb

    2013-01-01

    Researchers and statistics educators consistently suggest that students will learn statistics more effectively by conducting projects through which they actively engage in a broad spectrum of tasks integral to statistical inquiry, in the authentic context of a real-world application. In keeping with these findings, we share an implementation of…

  4. Technology development for gene discovery and full-length sequencing

    SciTech Connect

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  5. Novel venom gene discovery in the platypus

    PubMed Central

    2010-01-01

    Background: To date, few peptides in the complex mixture of platypus venom have been identified and sequenced, in part due to the limited amounts of platypus venom available to study. We have constructed and sequenced a cDNA library from an active platypus venom gland to identify the remaining components. Results: We identified 83 novel putative platypus venom genes from 13 toxin families, which are homologous to known toxins from a wide range of vertebrates (fish, reptiles, insectivores) and invertebrates (spiders, sea anemones, starfish). A number of these are expressed in tissues other than the venom gland, and at least three of these families (those with homology to toxins from distant invertebrates) may play non-toxin roles. Thus, further functional testing is required to confirm venom activity. However, the presence of similar putative toxins in such widely divergent species provides further evidence for the hypothesis that there are certain protein families that are selected preferentially during evolution to become venom peptides. We have also used homology with known proteins to speculate on the contributions of each venom component to the symptoms of platypus envenomation. Conclusions: This study represents a step towards fully characterizing the first mammal venom transcriptome. We have found similarities between putative platypus toxins and those of a number of unrelated species, providing insight into the evolution of mammalian venom. PMID:20920228

  6. Beegle: from literature mining to disease-gene discovery.

    PubMed

    ElShal, Sarah; Tranchevent, Léon-Charles; Sifrim, Alejandro; Ardeshirdavani, Amin; Davis, Jesse; Moreau, Yves

    2016-01-29

    Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/. PMID:26384564
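
    The recall figure quoted in this abstract is a recall-at-k measure. The sketch below shows how such a value can be computed for one query, assuming recall is the fraction of known disease genes found among the first k returned genes; the function name `recall_at_k` and the gene labels are hypothetical and this is not Beegle's evaluation code.

```python
def recall_at_k(ranked_genes, known_genes, k):
    """Fraction of known genes that appear among the first k ranked genes."""
    known = set(known_genes)
    if not known:
        raise ValueError("known_genes must not be empty")
    top_k = set(ranked_genes[:k])
    return len(top_k & known) / len(known)


# Hypothetical ranking returned for a query and a hypothetical reference set.
ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
known = {"GENE_B", "GENE_E", "GENE_Z"}

print(recall_at_k(ranked, known, 3))
print(recall_at_k(ranked, known, 5))
```

    Averaging this value over many queries would give a figure comparable in kind to the one reported.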

  7. INTEGRATE: gene fusion discovery using whole genome and transcriptome data

    PubMed Central

    Zhang, Jin; White, Nicole M.; Schmidt, Heather K.; Fulton, Robert S.; Tomlinson, Chad; Warren, Wesley C.; Wilson, Richard K.; Maher, Christopher A.

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing quantity of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use. PMID:26556708
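
    The counts reported in this abstract imply a sensitivity for INTEGRATE on the validated set, worked through below. Sensitivity is taken here to mean detected validated fusions divided by all validated fusions, which is an assumption about how the tools were scored; the variable names are illustrative.

```python
# Counts stated in the abstract: 131 novel plus 7 previously reported fusions.
validated_fusions = 131 + 7
missed = 6

detected = validated_fusions - missed
sensitivity = detected / validated_fusions

print(f"{detected}/{validated_fusions} = {sensitivity:.1%}")
```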

  8. Mitigating false-positive associations in rare disease gene discovery.

    PubMed

    Akle, Sebastian; Chun, Sung; Jordan, Daniel M; Cassa, Christopher A

    2015-10-01

    Clinical sequencing is expanding, but causal variants are still not identified in the majority of cases. These unsolved cases can aid in gene discovery when individuals with similar phenotypes are identified in systems such as the Matchmaker Exchange. We describe risks for gene discovery in this growing set of unsolved cases. In a set of rare disease cases with the same phenotype, it is not difficult to find two individuals with the same phenotype that carry variants in the same gene. We quantify the risk of false-positive association in a cohort of individuals with the same phenotype, using the prior probability of observing a variant in each gene from over 60,000 individuals (Exome Aggregation Consortium). Based on the number of individuals with a genic variant, cohort size, specific gene, and mode of inheritance, we calculate a P value that the match represents a true association. A match in two of 10 patients in MECP2 is statistically significant (P = 0.0014), whereas a match in TTN would not reach significance, as expected (P > 0.999). Finally, we analyze the probability of matching in clinical exome cases to estimate the number of cases needed to identify genes related to different disorders. We offer Rare Disease Match, an online tool to mitigate the uncertainty of false-positive associations. PMID:26378430
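
The reasoning in this abstract can be illustrated with a simple binomial tail probability: if each individual carries a qualifying variant in a gene with some background probability, how likely is it that at least k of n unrelated patients match by chance? This is a minimal sketch, not the Rare Disease Match method, which also accounts for the specific gene and mode of inheritance. The function name `match_p_value` and the two background probabilities are hypothetical, chosen for illustration only. They are not Exome Aggregation Consortium values.

```python
from math import comb


def match_p_value(n_patients: int, n_carriers: int, p_variant: float) -> float:
    """Probability that at least n_carriers of n_patients carry a
    qualifying variant in a gene purely by chance (binomial upper tail)."""
    return sum(
        comb(n_patients, k) * p_variant**k * (1 - p_variant) ** (n_patients - k)
        for k in range(n_carriers, n_patients + 1)
    )


# Hypothetical background probabilities, NOT taken from the paper:
# a gene that rarely carries qualifying variants vs. a very large,
# highly variable gene in which most people carry one.
p_rarely_variable = match_p_value(n_patients=10, n_carriers=2, p_variant=0.005)
p_highly_variable = match_p_value(n_patients=10, n_carriers=2, p_variant=0.9)

print(f"rarely variable gene: P = {p_rarely_variable:.4f}")
print(f"highly variable gene: P = {p_highly_variable:.4f}")
```

Under these assumed inputs, the same two-of-ten match is unlikely by chance in the rarely variable gene and almost certain in the highly variable one, which is the qualitative contrast the abstract draws between MECP2 and TTN.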

  9. Mouse models for the discovery of colorectal cancer driver genes

    PubMed Central

    Clark, Christopher R; Starr, Timothy K

    2016-01-01

    Colorectal cancer (CRC) constitutes a major public health problem as the third most commonly diagnosed and third most lethal malignancy worldwide. The prevalence and the physical accessibility of colorectal tumors have made CRC an ideal model for the study of tumor genetics. Early research efforts using patient-derived CRC samples led to the discovery of several highly penetrant mutations (e.g., APC, KRAS, MMR genes) in both hereditary and sporadic CRC tumors. This knowledge has enabled researchers to develop genetically engineered and chemically induced tumor models of CRC, both of which have had a substantial impact on our understanding of the molecular basis of CRC. Despite these advances, the morbidity and mortality of CRC remain a cause for concern and highlight the need to uncover novel genetic drivers of CRC. This review focuses on mouse models of CRC with particular emphasis on a newly developed cancer gene discovery tool, the Sleeping Beauty transposon-based mutagenesis model of CRC. PMID:26811627

  10. Alternative Approaches in Gene Discovery and Characterization in Alzheimer's Disease.

    PubMed

    Ertekin-Taner, Nilüfer; De Jager, Phillip L; Yu, Lei; Bennett, David A

    2013-03-01

    Uncovering the genetic risk and protective factors for complex diseases is of fundamental importance for advancing therapeutic and biomarker discoveries. This endeavor is particularly challenging for neuropsychiatric diseases where diagnoses predominantly rely on the clinical presentation, which may be heterogeneous, possibly due to the heterogeneity of the underlying genetic susceptibility factors and environmental exposures. Although genome-wide association studies of various neuropsychiatric diseases have recently identified susceptibility loci, there likely remain additional genetic risk factors that underlie the liability to these conditions. Furthermore, identification and characterization of the causal risk variant(s) in each of these novel susceptibility loci constitute a formidable task, particularly in the absence of any prior knowledge about their function or mechanism of action. Biologically relevant, quantitative phenotypes, i.e., endophenotypes, provide a powerful alternative to the more traditional, binary disease phenotypes in the discovery and characterization of susceptibility genes for neuropsychiatric conditions. In this review, we focus on Alzheimer's disease (AD) as a model neuropsychiatric disease and provide a synopsis of the recent literature on the use of endophenotypes in AD genetics. We highlight gene expression, neuropathology and cognitive endophenotypes in AD, with examples demonstrating the utility of these alternative approaches in the discovery of novel susceptibility genes and pathways. In addition, we discuss how these avenues generate testable hypotheses about the pathophysiology of genetic factors that have far-reaching implications for therapies. PMID:23482655

  11. Nonlinear Dependence in the Discovery of Differentially Expressed Genes

    PubMed Central

    Deller, J. R.; Radha, Hayder; McCormick, J. Justin; Wang, Huiyan

    2012-01-01

    Microarray data are used to determine which genes are active in response to a changing cell environment. Genes are “discovered” when they are significantly differentially expressed in the microarray data collected under the differing conditions. In one prevalent approach, all genes are assumed to satisfy a null hypothesis, ℍ0, of no difference in expression. A false discovery (type 1 error) occurs when ℍ0 is incorrectly rejected. The quality of a detection algorithm is assessed by estimating its number of false discoveries, 𝔉. Work involving the second-moment modeling of the z-value histogram (representing gene expression differentials) has shown significantly deleterious effects of intergene expression correlation on the estimate of 𝔉. This paper suggests that nonlinear dependencies could likewise be important. With an applied emphasis, this paper extends the “moment framework” by including third-moment skewness corrections in an estimator of 𝔉. This estimator combines observed correlation (corrected for sampling fluctuations) with the information from easily identifiable null cases. Nonlinear-dependence modeling reduces the estimation error relative to that of linear estimation. Third-moment calculations involve empirical densities of 3 × 3 covariance matrices estimated using very few samples. The principle of entropy maximization is employed to connect estimated moments to 𝔉 inference. Model results are tested with BRCA and HIV data sets and with carefully constructed simulations. PMID:25937940
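
For orientation, the sketch below shows the simplest baseline that the estimators in this abstract refine: the expected number of false discoveries when every null gene's z-value is an independent draw from a standard normal distribution. It does not implement the second-moment (correlation) or third-moment (skewness) corrections the paper describes. The function name and the example inputs are assumptions made for illustration.

```python
from math import erfc, sqrt


def expected_false_discoveries(n_null: int, z_threshold: float) -> float:
    """Expected count of null genes with |z| > z_threshold, assuming the
    z-values of null genes are independent N(0, 1) draws."""
    # For Z ~ N(0, 1), P(|Z| > z) = erfc(z / sqrt(2)).
    two_sided_tail = erfc(z_threshold / sqrt(2.0))
    return n_null * two_sided_tail


# Hypothetical example: 10,000 null genes, genes called at |z| > 3.
print(expected_false_discoveries(n_null=10_000, z_threshold=3.0))
```

Intergene correlation and nonlinear dependence change how widely the realized count can deviate from this expectation, which is why the abstract argues for corrections beyond the independent baseline.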

  12. Discovery of a widely distributed toxin biosynthetic gene cluster

    PubMed Central

    Lee, Shaun W.; Mitchell, Douglas A.; Markley, Andrew L.; Hensler, Mary E.; Gonzalez, David; Wohlrab, Aaron; Dorrestein, Pieter C.; Nizet, Victor; Dixon, Jack E.

    2008-01-01

    Bacteriocins represent a large family of ribosomally produced peptide antibiotics. Here we describe the discovery of a widely conserved biosynthetic gene cluster for the synthesis of thiazole and oxazole heterocycles on ribosomally produced peptides. These clusters encode a toxin precursor and all necessary proteins for toxin maturation and export. Using the toxin precursor peptide and heterocycle-forming synthetase proteins from the human pathogen Streptococcus pyogenes, we demonstrate the in vitro reconstitution of streptolysin S activity. We provide evidence that the synthetase enzymes, as predicted from our bioinformatics analysis, introduce heterocycles onto precursor peptides, thereby providing molecular insight into the chemical structure of streptolysin S. Furthermore, our studies reveal that the synthetase exhibits relaxed substrate specificity and modifies toxin precursors from both related and distant species. Given our findings, it is likely that the discovery of similar peptidic toxins will rapidly expand to existing and emerging genomes. PMID:18375757

  13. Future Mission Proposal Opportunities: Discovery, New Frontiers, and Project Prometheus

    NASA Technical Reports Server (NTRS)

    Niebur, S. M.; Morgan, T. H.; Niebur, C. S.

    2003-01-01

    The NASA Office of Space Science is expanding opportunities to propose missions to comets, asteroids, and other solar system targets. The Discovery Program continues to be popular, with two sample return missions, Stardust and Genesis, currently in operation. The New Frontiers Program, a new proposal opportunity modeled on the successful Discovery Program, begins this year with the release of its first Announcement of Opportunity. Project Prometheus, a program to develop nuclear electric power and propulsion technology intended to enable a new class of high-power, high-capability investigations, is a third opportunity to propose solar system exploration. All three classes of mission include a commitment to provide data to the Planetary Data System, any samples to the NASA Curatorial Facility at Johnson Space Center, and programs for education and public outreach.

  14. Psychiatric gene discoveries shape evidence on ADHD's biology

    PubMed Central

    Thapar, A; Martin, J; Mick, E; Arias Vásquez, A; Langley, K; Scherer, S W; Schachar, R; Crosbie, J; Williams, N; Franke, B; Elia, J; Glessner, J; Hakonarson, H; Owen, M J; Faraone, S V; O'Donovan, M C; Holmans, P

    2016-01-01

    A strong motivation for undertaking psychiatric gene discovery studies is to provide novel insights into unknown biology. Although attention-deficit hyperactivity disorder (ADHD) is highly heritable, and large, rare copy number variants (CNVs) contribute to risk, little is known about its pathogenesis and it remains commonly misunderstood. We assembled and pooled five ADHD and control CNV data sets from the United Kingdom, Ireland, United States of America, Northern Europe and Canada. Our aim was to test for enrichment of neurodevelopmental gene sets, implicated by recent exome-sequencing studies of (a) schizophrenia and (b) autism as a means of testing the hypothesis that common pathogenic mechanisms underlie ADHD and these other neurodevelopmental disorders. We also undertook hypothesis-free testing of all biological pathways. We observed significant enrichment of individual genes previously found to harbour schizophrenia de novo non-synonymous single-nucleotide variants (SNVs; P=5.4 × 10^-4) and targets of the Fragile X mental retardation protein (P=0.0018). No enrichment was observed for activity-regulated cytoskeleton-associated protein (P=0.23) or N-methyl-D-aspartate receptor (P=0.74) post-synaptic signalling gene sets previously implicated in schizophrenia. Enrichment of ADHD CNV hits for genes impacted by autism de novo SNVs (P=0.019 for non-synonymous SNV genes) did not survive Bonferroni correction. Hypothesis-free testing yielded several highly significantly enriched biological pathways, including ion channel pathways. Enrichment findings were robust to multiple testing corrections and to sensitivity analyses that excluded the most significant sample. The findings reveal that CNVs in ADHD converge on biologically meaningful gene clusters, including ones now established as conferring risk of other neurodevelopmental disorders. PMID:26573769

  15. Psychiatric gene discoveries shape evidence on ADHD's biology.

    PubMed

    Thapar, A; Martin, J; Mick, E; Arias Vásquez, A; Langley, K; Scherer, S W; Schachar, R; Crosbie, J; Williams, N; Franke, B; Elia, J; Glessner, J; Hakonarson, H; Owen, M J; Faraone, S V; O'Donovan, M C; Holmans, P

    2016-09-01

    A strong motivation for undertaking psychiatric gene discovery studies is to provide novel insights into unknown biology. Although attention-deficit hyperactivity disorder (ADHD) is highly heritable, and large, rare copy number variants (CNVs) contribute to risk, little is known about its pathogenesis and it remains commonly misunderstood. We assembled and pooled five ADHD and control CNV data sets from the United Kingdom, Ireland, United States of America, Northern Europe and Canada. Our aim was to test for enrichment of neurodevelopmental gene sets, implicated by recent exome-sequencing studies of (a) schizophrenia and (b) autism as a means of testing the hypothesis that common pathogenic mechanisms underlie ADHD and these other neurodevelopmental disorders. We also undertook hypothesis-free testing of all biological pathways. We observed significant enrichment of individual genes previously found to harbour schizophrenia de novo non-synonymous single-nucleotide variants (SNVs; P=5.4 × 10^-4) and targets of the Fragile X mental retardation protein (P=0.0018). No enrichment was observed for activity-regulated cytoskeleton-associated protein (P=0.23) or N-methyl-D-aspartate receptor (P=0.74) post-synaptic signalling gene sets previously implicated in schizophrenia. Enrichment of ADHD CNV hits for genes impacted by autism de novo SNVs (P=0.019 for non-synonymous SNV genes) did not survive Bonferroni correction. Hypothesis-free testing yielded several highly significantly enriched biological pathways, including ion channel pathways. Enrichment findings were robust to multiple testing corrections and to sensitivity analyses that excluded the most significant sample. The findings reveal that CNVs in ADHD converge on biologically meaningful gene clusters, including ones now established as conferring risk of other neurodevelopmental disorders. PMID:26573769

  16. Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

    SciTech Connect

    Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping; Strauss, Steve

    2007-02-22

    The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oak Ridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for the Populus trichocarpa 'Nisqually-1' clone and an early-flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et al. 2004, Meilan et al. 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root-predominant promoters, a set of three promoters was tested for tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et al. 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from the Arabidopsis root-specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post-transcriptional silencing in the model Arabidopsis system; these include the floral meristem identity gene (APETALA1 or

  17. Second-generation sequencing for gene discovery in the Brassicaceae.

    PubMed

    Hayward, Alice; Vighnesh, Guru; Delay, Christina; Samian, Mohd Rafizan; Manoli, Sahana; Stiller, Jiri; McKenzie, Megan; Edwards, David; Batley, Jacqueline

    2012-08-01

    The Brassicaceae contains the most diverse collection of agriculturally important crop species of all plant families. Yet, this is one of the few families that do not form functional symbiotic associations with mycorrhizal fungi in the soil for improved nutrient acquisition. The genes involved in this symbiosis were more recently recruited by legumes for symbiotic association with nitrogen-fixing rhizobia bacteria. This study applied second-generation sequencing (SGS) and analysis tools to discover that two such genes, NSP1 (Nodulation Signalling Pathway 1) and NSP2, remain conserved in diverse members of the Brassicaceae despite the absence of these symbioses. We demonstrate the utility of SGS data for the discovery of putative gene homologs and their analysis in complex polyploid crop genomes with little prior sequence information. Furthermore, we show how this data can be applied to enhance downstream reverse genetics analyses. We hypothesize that Brassica NSP genes may function in the root in other plant-microbe interaction pathways that were recruited for mycorrhizal and rhizobial symbioses during evolution. PMID:22765874

  18. Sugarcane Functional Genomics: Gene Discovery for Agronomic Trait Development

    PubMed Central

    Menossi, M.; Silva-Filho, M. C.; Vincentz, M.; Van-Sluys, M.-A.; Souza, G. M.

    2008-01-01

    Sugarcane is a highly productive crop used for centuries as the main source of sugar and recently to produce ethanol, a renewable bio-fuel energy source. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST) sequencing has significantly contributed to gene discovery and expression studies used to associate function with sugarcane genes. A significant amount of data exists on regulatory events controlling responses to herbivory, drought, and phosphate deficiency, which cause important constraints on yield and on endophytic bacteria, which are highly beneficial. The means to reduce drought, phosphate deficiency, and herbivory by the sugarcane borer have a negative impact on the environment. Improved tolerance for these constraints is being sought. Sugarcane's ability to accumulate sucrose up to 16% of its culm dry weight is a challenge for genetic manipulation. Genome-based technology such as cDNA microarray data indicates genes associated with sugar content that may be used to develop new varieties improved for sucrose content or for traits that restrict the expansion of the cultivated land. The genes can also be used as molecular markers of agronomic traits in traditional breeding programs. PMID:18273390

  19. Gene expression endophenotypes: a novel approach for gene discovery in Alzheimer's disease.

    PubMed

    Ertekin-Taner, Nilüfer

    2011-01-01

    Uncovering the underlying genetic component of any disease is key to the understanding of its pathophysiology and may open new avenues for development of therapeutic strategies and biomarkers. In the past several years, there has been an explosion of genome-wide association studies (GWAS) resulting in the discovery of novel candidate genes conferring risk for complex diseases, including neurodegenerative diseases. Despite this success, there still remains a substantial genetic component for many complex traits and conditions that is unexplained by the GWAS findings. Additionally, in many cases, the mechanism of action of the newly discovered disease risk variants is not inherently obvious. Furthermore, a genetic region with multiple genes may be identified via GWAS, making it difficult to discern the true disease risk gene. Several alternative approaches are proposed to overcome these potential shortcomings of GWAS, including the use of quantitative, biologically relevant phenotypes. Gene expression levels represent an important class of endophenotypes. Genetic linkage and association studies that utilize gene expression levels as endophenotypes determined that the expression levels of many genes are under genetic influence. This led to the postulate that there may exist many genetic variants that confer disease risk via modifying gene expression levels. Results from the handful of genetic studies which assess gene expression level endophenotypes in conjunction with disease risk suggest that this combined phenotype approach may both increase the power for gene discovery and lead to an enhanced understanding of their mode of action. This review summarizes the evidence in support of gene expression levels as promising endophenotypes in the discovery and characterization of novel candidate genes for complex diseases, which may also represent a novel approach in the genetic studies of Alzheimer's and other neurodegenerative diseases. PMID:21569597

  20. Amyotrophic Lateral Sclerosis: An Emerging Era of Collaborative Gene Discovery

    PubMed Central

    Gwinn, Katrina; Corriveau, Roderick A.; Mitsumoto, Hiroshi; Bednarz, Kate; Brown, Robert H.; Cudkowicz, Merit; Gordon, Paul H.; Hardy, John; Kasarskis, Edward J.; Kaufmann, Petra; Miller, Robert; Sorenson, Eric; Tandan, Rup; Traynor, Bryan J.; Nash, Josefina; Sherman, Alex; Mailman, Matthew D.; Ostell, James; Bruijn, Lucie; Cwik, Valerie; Rich, Stephen S.; Singleton, Andrew; Refolo, Larry; Andrews, Jaime; Zhang, Ran; Conwit, Robin; Keller, Margaret A.

    2007-01-01

    Amyotrophic lateral sclerosis (ALS) is the most common form of motor neuron disease (MND). It is currently incurable and treatment is largely limited to supportive care. Family history is associated with an increased risk of ALS, and many Mendelian causes have been discovered. However, most forms of the disease are not obviously familial. Recent advances in human genetics have enabled genome-wide analyses of single nucleotide polymorphisms (SNPs) that make it possible to study complex genetic contributions to human disease. Genome-wide SNP analyses require a large sample size and thus depend upon collaborative efforts to collect and manage the biological samples and corresponding data. Public availability of biological samples (such as DNA), phenotypic and genotypic data further enhances research endeavors. Here we discuss a large collaboration among academic investigators, government, and non-government organizations which has created a public repository of human DNA, immortalized cell lines, and clinical data to further gene discovery in ALS. This resource currently maintains samples and associated phenotypic data from 2332 MND subjects and 4692 controls. This resource should facilitate genetic discoveries which we anticipate will ultimately provide a better understanding of the biological mechanisms of neurodegeneration in ALS. PMID:18060051

  1. The Matchmaker Exchange: a platform for rare disease gene discovery.

    PubMed

    Philippakis, Anthony A; Azzariti, Danielle R; Beltran, Sergi; Brookes, Anthony J; Brownstein, Catherine A; Brudno, Michael; Brunner, Han G; Buske, Orion J; Carey, Knox; Doll, Cassie; Dumitriu, Sergiu; Dyke, Stephanie O M; den Dunnen, Johan T; Firth, Helen V; Gibbs, Richard A; Girdea, Marta; Gonzalez, Michael; Haendel, Melissa A; Hamosh, Ada; Holm, Ingrid A; Huang, Lijia; Hurles, Matthew E; Hutton, Ben; Krier, Joel B; Misyura, Andriy; Mungall, Christopher J; Paschall, Justin; Paten, Benedict; Robinson, Peter N; Schiettecatte, François; Sobreira, Nara L; Swaminathan, Ganesh J; Taschner, Peter E; Terry, Sharon F; Washington, Nicole L; Züchner, Stephan; Boycott, Kym M; Rehm, Heidi L

    2015-10-01

    There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow. PMID:26295439

  2. Canonical Correlation Analysis for Gene-Based Pleiotropy Discovery

    PubMed Central

    Seoane, Jose A.; Campbell, Colin; Day, Ian N. M.; Casas, Juan P.; Gaunt, Tom R.

    2014-01-01

    Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy) and for testing multiple variants for association with a single phenotype (gene-based association tests). Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or genetic variants respectively. Canonical Correlation Analysis (CCA) measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. In order to do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1) with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels. PMID:25329069
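
To make the core computation concrete, the sketch below estimates canonical correlations between a block of variant columns and a block of phenotype columns using the standard QR-plus-SVD formulation. It is a minimal illustration with NumPy, not the authors' pipeline: the genetic-algorithm attribute selection described above is not shown, the data are simulated, and the function and variable names are hypothetical. It assumes more samples than attributes and full column rank in both blocks, which mirrors the abstract's point that the number of attributes must be restricted relative to the number of samples.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between column sets X (n x p) and Y (n x q).

    Assumes n > max(p, q) and that both centered matrices have full
    column rank.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each column so that correlations, not raw inner products, are used.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the two column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles,
    # i.e. the canonical correlations, sorted in decreasing order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Simulated data (illustrative only): 500 samples, 3 variant columns,
# 2 phenotype columns, the first of which depends on the variants.
rng = np.random.default_rng(0)
genotypes = rng.normal(size=(500, 3))
phenotypes = np.column_stack([
    genotypes @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.5, size=500),
    rng.normal(size=500),
])
print(canonical_correlations(genotypes, phenotypes))
```

The first value reflects the simulated dependence of one phenotype on the variant block, and the second stays near the level expected from sampling noise.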

  3. Canonical correlation analysis for gene-based pleiotropy discovery.

    PubMed

    Seoane, Jose A; Campbell, Colin; Day, Ian N M; Casas, Juan P; Gaunt, Tom R

    2014-10-01

    Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy) and for testing multiple variants for association with a single phenotype (gene-based association tests). Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or genetic variants respectively. Canonical Correlation Analysis (CCA) measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. In order to do this, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1) with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels. PMID:25329069

  4. Turning publicly available gene expression data into discoveries using gene set context analysis

    PubMed Central

    Ji, Zhicheng; Vokes, Steven A.; Dang, Chi V.; Ji, Hongkai

    2016-01-01

    Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. PMID:26350211

  5. Gene Discovery through Expressed Sequence Tag Sequencing in Trypanosoma cruzi

    PubMed Central

    Verdun, Ramiro E.; Di Paolo, Nelson; Urmenyi, Turan P.; Rondinelli, Edson; Frasch, Alberto C. C.; Sanchez, Daniel O.

    1998-01-01

    Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from a normalized CL Brener epimastigote cDNA library. A total of 14.6% of the clones were homologous to previously identified T. cruzi genes, while 18.4% had significant matches to genes from other organisms in the database. A total of 67% of the ESTs had no matches in the database, and thus, some of them might be T. cruzi-specific genes. Functional groups of those sequences with matches in the database were constructed according to their putative biological functions. The two largest categories were protein synthesis (23.3%) and cell surface molecules (10.8%). The information reported in this paper should be useful for researchers in the field to analyze genes and proteins of their own interest. PMID:9784549

  6. Next-generation diagnostics and disease-gene discovery with the Exomiser.

    PubMed

    Smedley, Damian; Jacobsen, Julius O B; Jäger, Marten; Köhler, Sebastian; Holtgrewe, Manuel; Schubach, Max; Siragusa, Enrico; Zemojtel, Tomasz; Buske, Orion J; Washington, Nicole L; Bone, William P; Haendel, Melissa A; Robinson, Peter N

    2015-12-01

    Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ∼3 GB of RAM and roughly 15-90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser. PMID:26562621

  7. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    PubMed Central

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p < 10^−9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi, was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets. [The sequence data from this study have been submitted to the dbEST division of GenBank.] PMID:12618375

  8. The discovery of the microphthalmia locus and its gene, Mitf

    PubMed Central

    Arnheiter, Heinz

    2010-01-01

    Summary: The history of the discovery of the microphthalmia locus and its gene, now called Mitf, is a testament to the triumph of serendipity. Although the first microphthalmia mutation was discovered among the descendants of a mouse that was irradiated for the purpose of mutagenesis, the mutation most likely was not radiation-induced but occurred spontaneously in one of the parents of a later breeding. Although Mitf might eventually have been identified by other molecular genetic techniques, it was first cloned from a chance transgene insertion at the microphthalmia locus. And although Mitf was found to encode a member of a well-known transcription factor family, its analysis might still be in its infancy had Mitf not turned out to be of crucial importance for the physiology and pathology of many distinct organs, including eye, ear, immune system, bone, and skin, and in particular for melanoma. In fact, nearly seven decades of Mitf research have led to many insights about development, function, degeneration, and malignancies of a number of specific cell types, and it is hoped that these insights will one day lead to therapies benefitting those afflicted with diseases originating in these cell types. PMID:20807369

  9. Chromosome substitution strains: gene discovery functional analysis and systems studies

    PubMed Central

    Nadeau, Joseph H.; Forejt, Jiri; Takada, Toyoyuki; Shiroishi, Toshihiko

    2014-01-01

    Laboratory mice are valuable in biomedical research in part because of the extraordinary diversity of genetic resources that are available for studies of complex genetic traits and as models for human biology and disease. Chromosome substitution strains (CSSs) are important in this resource portfolio because of their demonstrated use for gene discovery, genetic and epigenetic studies, functional characterizations, and systems analysis. CSSs are made by replacing a single chromosome in a host strain with the corresponding chromosome from a donor strain. A complete CSS panel involves a total of 22 engineered inbred strains, one for each of the 19 autosomes, one each for the X and Y chromosomes, and one for mitochondria. A genome survey simply involves comparing each phenotype for each of the CSSs with the phenotypes of the host strain. The CSS panels that are available for laboratory mice have been used to dissect a remarkable variety of phenotypes and to characterize an impressive array of disease models. These surveys have revealed considerable phenotypic diversity even among closely related progenitor strains, evidence for strong epistasis and for heritable epigenetic changes. Perhaps most importantly, and presumably because of their unique genetic constitution, the genetic variants underlying quantitative trait loci (QTLs) are readily identified and functionally characterized with CSSs and congenic strains derived from them. Together these studies show that CSSs are an important resource for laboratory mice. PMID:22961226

  10. Discovery and New Frontiers Project Budget Analysis Tool

    NASA Technical Reports Server (NTRS)

    Newhouse, Marilyn E.

    2011-01-01

    The Discovery and New Frontiers (D&NF) programs are multi-project, uncoupled programs that currently comprise 13 missions in phases A through F. The ability to fly frequent science missions to explore the solar system is the primary measure of program success. The program office uses a Budget Analysis Tool to perform "what-if" analyses, compare mission scenarios to the current program budget, and rapidly forecast the programs' ability to meet their launch rate requirements. The tool allows the user to specify the total mission cost (fixed year), mission development and operations profile by phase (percent total mission cost and duration), launch vehicle, and launch date for multiple missions. The tool automatically applies inflation and rolls up the total program costs (in real year dollars) for comparison against available program budget. Thus, the tool allows the user to rapidly and easily explore a variety of launch rates and analyze the effect of changes in future mission or launch vehicle costs, the differing development profiles or operational durations of a future mission, or a replan of a current mission on the overall program budget. Because the tool also reports average monthly costs for the specified mission profile, the development or operations cost profile can easily be validated against program experience for similar missions. While specifically designed for predicting overall program budgets for programs that develop and operate multiple missions concurrently, the basic concept of the tool (rolling up multiple, independently budgeted lines) could easily be adapted to other applications.
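
    The roll-up idea described in this abstract can be illustrated with a minimal sketch. It is not the NASA tool: the class names, the even monthly spread within a phase, and the single constant inflation rate are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    share: float   # fraction of total mission cost spent in this phase
    months: int    # phase duration in months


@dataclass
class Mission:
    name: str
    total_cost_fixed: float  # total cost in fixed-year dollars
    start_year: int          # calendar year in which the first phase starts
    phases: list             # list of Phase objects, in order


def yearly_costs_real(mission, base_year, inflation):
    """Spread each phase evenly over its months, then inflate by calendar year."""
    costs = {}
    month_index = 0
    for phase in mission.phases:
        monthly = mission.total_cost_fixed * phase.share / phase.months
        for _ in range(phase.months):
            year = mission.start_year + month_index // 12
            factor = (1 + inflation) ** (year - base_year)
            costs[year] = costs.get(year, 0.0) + monthly * factor
            month_index += 1
    return costs


def roll_up(missions, base_year, inflation):
    """Sum real-year costs of independently budgeted missions per calendar year."""
    total = {}
    for mission in missions:
        for year, cost in yearly_costs_real(mission, base_year, inflation).items():
            total[year] = total.get(year, 0.0) + cost
    return dict(sorted(total.items()))
```

    The per-year totals returned by roll_up could then be compared against an available budget line for each year; a what-if analysis amounts to changing a mission's start year or phase profile and rerunning the roll-up.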

  11. Gene Prioritization for Imaging Genetics Studies Using Gene Ontology and a Stratified False Discovery Rate Approach.

    PubMed

    Patel, Sejal; Park, Min Tae M; Chakravarty, M Mallar; Knight, Jo

    2016-01-01

    Imaging genetics is an emerging field in which the association between genes and neuroimaging-based quantitative phenotypes is used to explore the functional role of genes in neuroanatomy and neurophysiology in the context of healthy function and neuropsychiatric disorders. The main obstacle for researchers in the field is the high dimensionality of the data in both the imaging phenotypes and the genetic variants commonly typed. In this article, we develop a novel method that utilizes Gene Ontology, an online database, to select and prioritize certain genes, employing a stratified false discovery rate (sFDR) approach to investigate their associations with imaging phenotypes. sFDR has the potential to increase power in genome-wide association studies (GWAS), and is quickly gaining traction as a method for multiple testing correction. Our novel approach addresses the pressing need in genetic research to move beyond candidate gene studies without being overburdened by a loss of power due to multiple testing. As an example of our methodology, we perform a GWAS of hippocampal volume using both the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA2) and the Alzheimer's Disease Neuroimaging Initiative datasets. The analysis of ENIGMA2 data yielded a set of SNPs with sFDR values between 10 and 20%. Our approach demonstrates a potential method to prioritize genes based on biological systems impaired in a disease. PMID:27092072
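
    A stratified FDR correction can be sketched as follows, assuming that it means applying the Benjamini-Hochberg adjustment separately within each stratum (for example, SNPs in prioritized genes versus all others). The abstract does not state the exact procedure, so the function names and the stratum labels are illustrative only.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for one list of tests."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def stratified_qvalues(pvalues, strata):
    """Apply the BH adjustment independently within each stratum label."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    out = [None] * len(pvalues)
    for idxs in groups.values():
        qs = bh_qvalues([pvalues[i] for i in idxs])
        for i, qv in zip(idxs, qs):
            out[i] = qv
    return out
```

    Tests in a small, signal-rich stratum face a lighter multiplicity penalty than they would in a pooled correction, which is the intuition behind the potential gain in power.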

  12. Gene Prioritization for Imaging Genetics Studies Using Gene Ontology and a Stratified False Discovery Rate Approach

    PubMed Central

    Patel, Sejal; Park, Min Tae M.; Chakravarty, M. Mallar; Knight, Jo

    2016-01-01

    Imaging genetics is an emerging field in which the association between genes and neuroimaging-based quantitative phenotypes is used to explore the functional role of genes in neuroanatomy and neurophysiology in the context of healthy function and neuropsychiatric disorders. The main obstacle for researchers in the field is the high dimensionality of the data in both the imaging phenotypes and the genetic variants commonly typed. In this article, we develop a novel method that utilizes Gene Ontology, an online database, to select and prioritize certain genes, employing a stratified false discovery rate (sFDR) approach to investigate their associations with imaging phenotypes. sFDR has the potential to increase power in genome-wide association studies (GWAS), and is quickly gaining traction as a method for multiple testing correction. Our novel approach addresses the pressing need in genetic research to move beyond candidate gene studies without being overburdened by a loss of power due to multiple testing. As an example of our methodology, we perform a GWAS of hippocampal volume using both the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA2) and the Alzheimer's Disease Neuroimaging Initiative datasets. The analysis of ENIGMA2 data yielded a set of SNPs with sFDR values between 10 and 20%. Our approach demonstrates a potential method to prioritize genes based on biological systems impaired in a disease. PMID:27092072

  13. New Discoveries From The Archean Biosphere Drilling Project (ABDP)

    NASA Astrophysics Data System (ADS)

    Nedachi, M.

    2004-12-01

    The Archean Biosphere Drilling Project (ABDP), an international scientific drilling project involving scientists from the USA, Australia and Japan, was initiated in Pilbara Craton, Western Australia. The scientific objectives of the ABDP are the identification of microfossils and biomarkers, the clarification of the geochemical environment of the early Earth, and the understanding of geophysical contribution to the co-evolution of life and environment. Through 2003 and 2004 activities, we have drilled 150-300 m deep holes to recover "fresh" (modern weathering-free) geologic formations that range from 3.5 to 2.7 Ga in age. The drilling targets were: (1) 3.46 Ga Towers Formation, (2) mid-Archean Mosquito Formation, (3) 2.77 Ga Mt Roe Basalt, (4) 2.76 Ga Tumbiana Formation, (5) 2.74 Ga Hardey Formation. The initial investigations on the ABDP drill cores by Japanese members have already produced many exciting and interesting data and observations. 3.46 Ga Marble Bar Jasper could provide clues to the argument about the early photosynthetic cyanobacteria that have produced free oxygen and have evolved the oxygen level on the earth. There have been many ideas about how the hematite in jasper was formed. Our most important discoveries are the confirmations that hematite, magnetite and siderite precipitated separately as primary minerals, and that there is a remaining texture which resembles microfossil using FE-SEM, ESCA, Laser-Raman and cathodoluminescence. Taking into account the carbon isotopic ratios of remains from -25 to -40 per mil, these iron oxides might be biogenic. We need to identify the iron bacteria in detail to deduce the early earth's surface environment. In addition, the black shale of Apex Basalt overlying Marble Bar Jasper contains organic carbon from 0.7 to 5.2 percent, the carbon isotopic ratio of which is from -26 to -30 per mil, suggesting that various microbes inhabited the early Archean ocean. 2.77 Ga Mt Roe Basalt, which is composed of

  14. Discovery

    ERIC Educational Resources Information Center

    de Mestre, Neville

    2010-01-01

    All common fractions can be written in decimal form. In this Discovery article, the author suggests that teachers ask their students to calculate the decimals by actually doing the divisions themselves, and later on they can use a calculator to check their answers. This article presents a lesson based on the research of Bolt (1982).

  15. A Rule-Based Framework for Gene Regulation Pathways Discovery

    SciTech Connect

    Wilczynski, B; Hvidsten, T; Kryshtafovych, A; Stubbs, L; Komorowski, J; Fidelis, K

    2003-07-21

    We present a novel approach to discover the rules that govern gene regulation mechanisms. The method is based on supervised machine learning and is designed to reveal relationships between transcription factors and gene promoters. As the representation of the gene regulatory circuit we have chosen a special form of IF-THEN rules associating certain features (a generalized idea of a Transcription Factor Binding Site) in gene promoters with specific gene expression profiles.
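
    The IF-THEN representation mentioned above can be sketched as a small data structure. This shows only how such rules might be stored and applied, not how the authors learn them; the Rule class, the feature names, and the profile labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # IF all of these promoter features are present ...
    required_features: frozenset
    # ... THEN the gene is predicted to follow this expression profile.
    profile: str


def predict_profiles(promoter_features, rules):
    """Return the profiles of every rule whose IF-part matches the promoter."""
    found = set(promoter_features)
    return [rule.profile for rule in rules if rule.required_features <= found]


# Hypothetical rules; feature and profile names are placeholders.
rules = [
    Rule(frozenset({"siteA", "siteB"}), "early_induced"),
    Rule(frozenset({"siteC"}), "repressed"),
]
```

    Calling predict_profiles({"siteA", "siteB", "siteD"}, rules) would match only the first rule, since the second requires a feature that is absent from that promoter.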

  16. Discovery of signature genes in gastric cancer associated with prognosis.

    PubMed

    Zhao, X; Cai, H; Wang, X; Ma, L

    2016-01-01

    Gene expression profiles of gastric cancer (GC) were analyzed with bioinformatics tools to identify signature genes associated with prognosis. Four gene expression data sets (accession number: GSE2685, GSE30727, GSE38932 and GSE26253) were downloaded from Gene Expression Omnibus. Differentially expressed genes (DEGs) were screened out using significance analysis of microarrays (SAM) algorithm. P-value 1 were set as the threshold. A co-expression network was constructed for the GC-related genes with package WGCNA of R. Modules were disclosed with WGCNA algorithm. Survival-related signature genes were screened out via COX single-variable regression. A total of 3210 GC-related genes were identified from the 3 data sets. Significantly enriched GO biological process terms included cell death, cell proliferation, apoptosis, response to hormone and phosphorylation. Pathways like viral carcinogenesis, metabolism, EBV viral infection, and PI3K-AKT signaling pathway were significantly over-represented in the DEGs. A gene co-expression network including 2414 genes was constructed, from which 7 modules were revealed. A total of 17 genes were identified as signature genes, such as DAB2, ALDH2, CD58, CITED2, BNIP3L, SLC43A2, FAU and COL5A1. Many signature genes associated with prognosis of GC were identified in the present study, some of which have been implicated in the pathogenesis of GC. These findings could not only improve the knowledge about GC, but also provide clues for clinical treatments. PMID:26774142

  17. Using the DFCI Gene Index Databases for Biological Discovery

    PubMed Central

    Antonescu, Corina; Antonescu, Valentin; Sultana, Razvan; Quackenbush, John

    2014-01-01

    The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information. PMID:20205187

  18. Host genes associated with HIV/AIDS: advances in gene discovery

    PubMed Central

    An, Ping; Winkler, Cheryl A.

    2013-01-01

    Twenty-five years after the discovery of HIV as the cause of AIDS, there is still no effective vaccine and no cure for this disease. HIV susceptibility shows a substantial degree of individual heterogeneity, much of which can be conferred by host genetic variation. In an effort to discover host factors required for HIV replication, identify critical pathogenic pathways, and reveal the full armament of host defenses, there has been a shift from candidate gene studies to unbiased genome-wide genetic and functional studies. However, the number of securely identified host factors involved in HIV disease remains small, explaining only ~15–20% of the observed heterogeneity – most of which is attributable to HLA. Multidisciplinary approaches integrating genetic epidemiology to systems biology will be required to fully understand viral-host interactions to effectively combat HIV/AIDS. PMID:20149939

  19. Prioritization of neurodevelopmental disease genes by discovery of new mutations

    PubMed Central

    Hoischen, Alexander; Krumm, Niklas; Eichler, Evan E.

    2014-01-01

    Advances in genome sequencing technologies have begun to revolutionize neurogenetics, allowing the full spectrum of genetic variation to be better understood in relationship to disease. Exome sequencing of hundreds to thousands of samples from patients with autism spectrum disorder, intellectual disability, epilepsy, and schizophrenia provides strong evidence of the importance of de novo and gene-disruptive events. There are now several hundred new candidate genes and targeted resequencing technologies that allow screening of dozens of genes in tens of thousands of individuals with high specificity and sensitivity. The decision of which genes to pursue depends on numerous factors including recurrence, prior evidence of overlap with pathogenic copy number variants, the position of the mutation within the protein, the mutational burden among healthy individuals, and membership of the candidate gene within disease-implicated protein networks. We discuss these emerging criteria for gene prioritization and the potential impact on the field of neuroscience. PMID:24866042

  20. Discovery of Cationic Polymers for Non-viral Gene Delivery using Combinatorial Approaches

    PubMed Central

    Barua, Sutapa; Ramos, James; Potta, Thrimoorthy; Taylor, David; Huang, Huang-Chiao; Montanez, Gabriela; Rege, Kaushal

    2015-01-01

    Gene therapy is an attractive treatment option for diseases of genetic origin, including several cancers and cardiovascular diseases. While viruses are effective vectors for delivering exogenous genes to cells, concerns related to insertional mutagenesis, immunogenicity, lack of tropism, decay and high production costs necessitate the discovery of non-viral methods. Significant efforts have been focused on cationic polymers as non-viral alternatives for gene delivery. Recent studies have employed combinatorial syntheses and parallel screening methods for enhancing the efficacy of gene delivery, biocompatibility of the delivery vehicle, and overcoming cellular level barriers as they relate to polymer-mediated transgene uptake, transport, transcription, and expression. This review summarizes and discusses recent advances in combinatorial syntheses and parallel screening of cationic polymer libraries for the discovery of efficient and safe gene delivery systems. PMID:21843141

  1. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    SciTech Connect

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  2. Gene Discovery through Genomic Sequencing of Brucella abortus

    PubMed Central

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and humans. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10^−5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  3. Gene discovery in the horned beetle Onthophagus taurus

    PubMed Central

    2010-01-01

    Background: Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes. Results: We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness. Conclusions: This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population-genetic studies of O. taurus and

  4. Transient transformation meets gene function discovery: the strawberry fruit case

    PubMed Central

    Guidarelli, Michela; Baraldi, Elena

    2015-01-01

    Besides the well-known nutritional and health benefits, the strawberry (Fragaria X ananassa) crop draws increasing attention as a plant model system for the Rosaceae family, due to the short generation time, the rapid in vitro regeneration, and the availability of the genome sequence of F. X ananassa and F. vesca species. In recent years, the use of high-throughput sequence technologies has provided large amounts of molecular information on the genes possibly related to several biological processes of this crop. Nevertheless, the function of most genes or gene products is still poorly understood and needs investigation. Transient transformation technology provides a powerful tool to study gene function in vivo, avoiding difficult drawbacks that typically affect the stable transformation protocols, such as transformation efficiency, transformants selection, and regeneration. In this review we provide an overview of the use of transient expression in the investigation of the function of genes important for strawberry fruit development, defense and nutritional properties. The technical aspects related to an efficient use of this technique are described, and the possible impact and application in strawberry crop improvement are discussed. PMID:26124771

  5. Transient transformation meets gene function discovery: the strawberry fruit case.

    PubMed

    Guidarelli, Michela; Baraldi, Elena

    2015-01-01

    Besides the well-known nutritional and health benefits, the strawberry (Fragaria X ananassa) crop draws increasing attention as a plant model system for the Rosaceae family, due to the short generation time, the rapid in vitro regeneration, and the availability of the genome sequence of F. X ananassa and F. vesca species. In recent years, the use of high-throughput sequence technologies has provided large amounts of molecular information on the genes possibly related to several biological processes of this crop. Nevertheless, the function of most genes or gene products is still poorly understood and needs investigation. Transient transformation technology provides a powerful tool to study gene function in vivo, avoiding difficult drawbacks that typically affect the stable transformation protocols, such as transformation efficiency, transformants selection, and regeneration. In this review we provide an overview of the use of transient expression in the investigation of the function of genes important for strawberry fruit development, defense and nutritional properties. The technical aspects related to an efficient use of this technique are described, and the possible impact and application in strawberry crop improvement are discussed. PMID:26124771

  6. Interaction-based discovery of functionally important genes in cancers

    PubMed Central

    Ghersi, Dario; Singh, Mona

    2014-01-01

    A major challenge in cancer genomics is uncovering genes with an active role in tumorigenesis from a potentially large pool of mutated genes across patient samples. Here we focus on the interactions that proteins make with nucleic acids, small molecules, ions and peptides, and show that residues within proteins that are involved in these interactions are more frequently affected by mutations observed in large-scale cancer genomic data than are other residues. We leverage this observation to predict genes that play a functionally important role in cancers by introducing a computational pipeline (http://canbind.princeton.edu) for mapping large-scale cancer exome data across patients onto protein structures, and automatically extracting proteins with an enriched number of mutations affecting their nucleic acid, small molecule, ion or peptide binding sites. Using this computational approach, we show that many previously known genes implicated in cancers are enriched in mutations within the binding sites of their encoded proteins. By focusing on functionally relevant portions of proteins—specifically those known to be involved in molecular interactions—our approach is particularly well suited to detect infrequent mutations that may nonetheless be important in cancer, and should aid in expanding our functional understanding of the genomic landscape of cancer. PMID:24362839
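
    One way to score "an enriched number of mutations" in binding sites is a one-sided binomial test, sketched below. The abstract does not say which statistic the pipeline uses, so the null model here (each mutation lands on a uniformly random residue of the protein) and the function names are assumptions.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binding_site_enrichment(protein_length, n_site_residues, n_mutations, n_in_site):
    """One-sided p-value for seeing at least n_in_site of n_mutations mutations
    inside binding-site residues, if mutations hit residues uniformly at random."""
    p_site = n_site_residues / protein_length
    return binomial_tail(n_in_site, n_mutations, p_site)
```

    A protein with few binding-site residues but several mutations falling on them gets a small p-value, which matches the abstract's point that the approach can flag infrequent but well-placed mutations.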

  7. Discovery of a New Puroindoline b Gene in Wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Wilkinson and co-workers (J. Cereal Sci. 48:722-728, 2008) reported the existence of three new variant forms of puroindoline b. Termed simply variants 1, 2 and 3, these genes were purported to be encoded by the same Pinb-2 locus on chromosome 7A. In our research, we examined a total of 25 wheat cu...

  8. Discovery of a new puroindole b gene in wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Wilkinson and co-workers (2008) reported the existence of three new variant forms of puroindoline b. Termed simply variants 1, 2 and 3, these genes were purported to be encoded by the same Pinb-2 locus on chromosome 7A. In our research, we examined a total of 5 wheat cultivars, 38 ditelosomic line...

  9. Discovery of the lomaiviticin biosynthetic gene cluster in Salinispora pacifica

    PubMed Central

    Janso, Jeffrey E.; Haltli, Brad A.; Eustáquio, Alessandra S.; Kulowski, Kerry; Waldman, Abraham J.; Zha, Li; Nakamura, Hitomi; Bernan, Valerie S.; He, Haiyin; Carter, Guy T.; Koehn, Frank E.; Balskus, Emily P.

    2014-01-01

    The lomaiviticins are a family of cytotoxic marine natural products that have captured the attention of both synthetic and biological chemists due to their intricate molecular scaffolds and potent biological activities. Here we describe the identification of the gene cluster responsible for lomaiviticin biosynthesis in Salinispora pacifica strains DPJ-0016 and DPJ-0019 using a combination of molecular approaches and genome sequencing. The link between the lom gene cluster and lomaiviticin production was confirmed using bacterial genetics, and subsequent analysis and annotation of this cluster revealed the biosynthetic basis for the core polyketide scaffold. Additionally, we have used comparative genomics to identify candidate enzymes for several unusual tailoring events, including diazo formation and oxidative dimerization. These findings will allow further elucidation of the biosynthetic logic of lomaiviticin assembly and provide useful molecular tools for application in biocatalysis and synthetic biology. PMID:25045187

  10. Data mining as a discovery tool for imprinted genes.

    PubMed

    Brideau, Chelsea; Soloway, Paul

    2012-01-01

    This chapter serves as an introduction to the collection of genome-wide sequence and epigenomic data, as well as the use of these data in training generalized linear models (glm) to predict imprinted status. This is meant to be an introduction to the method, so only the most straightforward examples will be covered. For instance, the examples given below refer to 11 classes of genomic regions (the entire gene body, introns, exons, 5' UTR, 3' UTR, and 1, 10, and 100 kb upstream and downstream of each gene). One could also build models based on combinations of these regions. Likewise, models could be built on combinations of epigenetic features, or on combinations of both genomic regions and epigenetic features. This chapter relies heavily on computational methods, including basic programming. However, this chapter is not meant to be an introduction to programming. Throughout the chapter, the reader will be provided with example code in the Perl programming language. PMID:22907493
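
    As a rough illustration of how the 11 region classes named in the abstract add up (five gene-level regions plus three flank sizes in two directions), the following Python sketch enumerates them. The identifiers (GENE_REGIONS, FLANK_SIZES_KB, region_classes) are hypothetical names chosen for this example and are not taken from the chapter, which uses Perl.

```python
# Hypothetical names; only the region list itself comes from the abstract.
GENE_REGIONS = ["gene_body", "introns", "exons", "5_UTR", "3_UTR"]
FLANK_SIZES_KB = [1, 10, 100]
FLANK_DIRECTIONS = ["upstream", "downstream"]


def region_classes():
    """Return the gene-level regions followed by every flank size/direction pair."""
    flanks = [
        f"{size}kb_{direction}"
        for direction in FLANK_DIRECTIONS
        for size in FLANK_SIZES_KB
    ]
    return GENE_REGIONS + flanks


if __name__ == "__main__":
    classes = region_classes()
    # 5 gene-level regions + 3 sizes x 2 directions = 11 classes
    print(len(classes), classes)
```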

  11. Gene discovery for facioscapulohumeral muscular dystrophy by machine learning techniques.

    PubMed

    González-Navarro, Félix F; Belanche-Muñoz, Lluís A; Gámez-Moreno, María G; Flores-Ríos, Brenda L; Ibarra-Esquer, Jorge E; López-Morteo, Gabriel A

    2016-04-28

    Facioscapulohumeral muscular dystrophy (FSHD) is a neuromuscular disorder that shows a preference for the facial, shoulder and upper arm muscles. FSHD affects about one in 20-400,000 people, and no effective therapeutic strategies are known to halt disease progression or reverse muscle weakness or atrophy. Many genes may be incorrectly regulated in affected muscle tissue, but the mechanisms responsible for the progressive muscle weakness remain largely unknown. Although machine learning (ML) has made significant inroads in biomedical disciplines such as cancer research, no reports have yet addressed FSHD analysis using ML techniques. This study explores a specific FSHD data set from a ML perspective. We report results showing a very promising small group of genes that clearly separates FSHD samples from healthy samples. In addition to numerical prediction figures, we show data visualizations and biological evidence illustrating the potential usefulness of these results. PMID:26960968

  12. A Projection and Density Estimation Method for Knowledge Discovery

    PubMed Central

    Stanski, Adam; Hellwich, Olaf

    2012-01-01

    A key ingredient to modern data analysis is probability density estimation. However, it is well known that the curse of dimensionality prevents a proper estimation of densities in high dimensions. The problem is typically circumvented by using a fixed set of assumptions about the data, e.g., by assuming partial independence of features, data on a manifold or a customized kernel. These fixed assumptions limit the applicability of a method. In this paper we propose a framework that uses a flexible set of assumptions instead. It allows a model to be tailored to various problems by means of 1d-decompositions. The approach achieves a fast runtime and is not limited by the curse of dimensionality as all estimations are performed in 1d-space. The wide range of applications is demonstrated on two very different real-world examples. The first is a data mining software that allows the fully automatic discovery of patterns. The software is publicly available for evaluation. As a second example an image segmentation method is realized. It achieves state-of-the-art performance on a benchmark dataset although it uses only a fraction of the training data and very simple features. PMID:23049675

  13. TILLING in forage grasses for gene discovery and breeding improvement.

    PubMed

    Manzanares, Chloe; Yates, Steven; Ruckle, Michael; Nay, Michelle; Studer, Bruno

    2016-09-25

    Mutation breeding has a long-standing history and in some major crop species, many of the most important cultivars have their origin in germplasm generated by mutation induction. For almost two decades, methods for TILLING (Targeting Induced Local Lesions IN Genomes) have been established in model plant species such as Arabidopsis (Arabidopsis thaliana L.), enabling the functional analysis of genes. Recent advances in mutation detection by second generation sequencing technology have brought its utility to major crop species. However, it has remained difficult to apply similar approaches in forage and turf grasses, mainly due to their outbreeding nature maintained by an efficient self-incompatibility system. Starting with a description of the extent to which traditional mutagenesis methods have contributed to crop yield increase in the past, this review focuses on technological approaches to implement TILLING-based strategies for the improvement of forage grass breeding through forward and reverse genetics. We present first results from TILLING in allogamous forage grasses for traits such as stress tolerance and evaluate prospects for rapid implementation of beneficial alleles to forage grass breeding. In conclusion, large-scale induced mutation resources, used for forward genetic screens, constitute a valuable tool to increase the genetic diversity for breeding and can be generated with relatively small investments in forage grasses. Furthermore, large libraries of sequenced mutations can be readily established, providing enhanced opportunities to discover mutations in genes controlling traits of agricultural importance and to study gene functions by reverse genetics. PMID:26924175

  14. Whole-genome resequencing: changing the paradigms of SNP detection, molecular mapping and gene discovery

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The next generation sequencing (NGS) technologies have opened a wealth of opportunities for plant breeding and genomics research, and changed the paradigms of marker detection, genotyping, and gene discovery. Abundant genomic resources have been generated using a whole genome resequencing (WGR) str...

  15. Scientific Discovery with the Blue Gene/L

    SciTech Connect

    Negele, John W.

    2011-12-09

    This project succeeded in developing key software optimization tools to bring fundamental QCD calculations of nucleon structure from the Terascale era through the Petascale era and prepare for the Exascale era. It also enabled fundamental QCD physics calculations and demonstrated the power of placing small versions of frontier emerging architectures at MIT to attract outstanding students to computational science. MIT also hosted a workshop on September 19, 2008 to brainstorm ways to promote computational science at top-tier research universities and attract gifted students into the field, some of whom would provide the next generation of talent at our defense laboratories.

  16. Cohesin gene mutations in tumorigenesis: from discovery to clinical significance

    PubMed Central

    Solomon, David A.; Kim, Jung-Sik; Waldman, Todd

    2014-01-01

    Cohesin is a multi-protein complex composed of four core subunits (SMC1A, SMC3, RAD21, and either STAG1 or STAG2) that is responsible for the cohesion of sister chromatids following DNA replication until its cleavage during mitosis thereby enabling faithful segregation of sister chromatids into two daughter cells. Recent cancer genomics analyses have discovered a high frequency of somatic mutations in the genes encoding the core cohesin subunits as well as cohesin regulatory factors (e.g. NIPBL, PDS5B, ESPL1) in a select subset of human tumors including glioblastoma, Ewing sarcoma, urothelial carcinoma, acute myeloid leukemia, and acute megakaryoblastic leukemia. Herein we review these studies including discussion of the functional significance of cohesin inactivation in tumorigenesis and potential therapeutic mechanisms to selectively target cancers harboring cohesin mutations. [BMB Reports 2014; 47(6): 299-310] PMID:24856830

  17. The Salinas Airshower Learning And Discovery Project (SALAD)

    NASA Astrophysics Data System (ADS)

    Hernandez, Victor; Niduaza, Rommel; Ruiz Castruita, Daniel; Knox, Adrian; Ramos, Daniel; Fan, Sewan; Fatuzzo, Laura

    2015-04-01

    The SALAD project partners community college and high school STEM students in order to develop and investigate cosmic ray detector telescopes and the physical concepts, using a new light sensor technology based on silicon photomultiplier (SiPM) detectors. Replacing the conventional photomultiplier with the SiPM offers notable advantages in cost and facilitates more in-depth, hands-on learning laboratory activities. The students in the SALAD project design, construct and extensively evaluate the SiPM detector modules. These SiPM modules can be completed in a short time utilizing cost-effective components. We describe our research to implement SiPM as read out light detectors for plastic scintillators in a cosmic ray detector telescope for use in high schools. In particular, we describe our work in the design, evaluation and the assembly of (1) a fast preamplifier, (2) a simple coincidence circuit using fast comparators, to discriminate the SiPM noise signal pulses, and (3) a monovibrator circuit to shape the singles plus the AND logic pulses for subsequent processing. To store the singles and coincidence counts data, an Arduino micro-controller with program sketches can be implemented. Results and findings from our work will be described and presented. US Department of Education Title V Grant Award PO31S090007

  18. The Alveolate Perkinsus marinus: Biological Insights from EST Gene Discovery

    PubMed Central

    2010-01-01

    Background: Perkinsus marinus, a protozoan parasite of the eastern oyster Crassostrea virginica, has devastated natural and farmed oyster populations along the Atlantic and Gulf coasts of the United States. It is classified as a member of the Perkinsozoa, a recently established phylum considered close to the ancestor of ciliates, dinoflagellates, and apicomplexans, and a key taxon for understanding unique adaptations (e.g. parasitism) within the Alveolata. Despite intense parasite pressure, no disease-resistant oysters have been identified and no effective therapies have been developed to date. Results: To gain insight into the biological basis of the parasite's virulence and pathogenesis mechanisms, and to identify genes encoding potential targets for intervention, we generated >31,000 5' expressed sequence tags (ESTs) derived from four trophozoite libraries generated from two P. marinus strains. Trimming and clustering of the sequence tags yielded 7,863 unique sequences, some of which carry a spliced leader. Similarity searches revealed that 55% of these had hits in protein sequence databases, of which 1,729 had their best hit with proteins from the chromalveolates (E-value ≤ 1e-5). Some sequences are similar to those proven to be targets for effective intervention in other protozoan parasites, and include not only proteases, antioxidant enzymes, and heat shock proteins, but also those associated with relict plastids, such as acetyl-CoA carboxylase and methylerythritol phosphate pathway components, and those involved in glycan assembly, protein folding/secretion, and parasite-host interactions. Conclusions: Our transcriptome analysis of P. marinus, the first for any member of the Perkinsozoa, contributes new insight into its biology and taxonomic position. It provides a very informative, albeit preliminary, glimpse into the expression of genes encoding functionally relevant proteins as potential targets for chemotherapy, and evidence for the presence of a relict

  19. Marfan Syndrome and Related Disorders: 25 Years of Gene Discovery.

    PubMed

    Verstraeten, Aline; Alaerts, Maaike; Van Laer, Lut; Loeys, Bart

    2016-06-01

    Marfan syndrome (MFS) is a rare, autosomal-dominant, multisystem disorder, presenting with skeletal, ocular, skin, and cardiovascular symptoms. Significant clinical overlap with other systemic connective tissue diseases, including Loeys-Dietz syndrome (LDS), Shprintzen-Goldberg syndrome (SGS), and the MASS phenotype, has been documented. In MFS and LDS, the cardiovascular manifestations account for the major cause of patient morbidity and mortality, rendering them the main target for therapeutic intervention. Over the past decades, gene identification studies confidently linked the aforementioned syndromes, as well as nonsyndromic aneurysmal disease, to genetic defects in proteins related to the transforming growth factor (TGF)-β pathway, greatly expanding our knowledge on the disease mechanisms and providing us with novel therapeutic targets. As a result, the focus of the developing pharmacological treatment strategies is shifting from hemodynamic stress management to TGF-β antagonism. In this review, we discuss the insights that have been gained in the molecular biology of MFS and related disorders over the past 25 years. PMID:26919284

  20. Peanut EST Project: Gene discovery and marker development

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aflatoxin contamination caused by Aspergillus fungi is a great concern in peanut production worldwide. Pre-harvest Aspergillii infection and aflatoxin contamination are usually severe in peanuts that are grown under drought stressed conditions. Genomic research can provide new tools to study plant-m...

  1. Plug-and-Play Benzylisoquinoline Alkaloid Biosynthetic Gene Discovery in Engineered Yeast.

    PubMed

    Morris, J S; Dastmalchi, M; Li, J; Chang, L; Chen, X; Hagel, J M; Facchini, P J

    2016-01-01

    Benzylisoquinoline alkaloid (BIA) metabolism has been the focus of a considerable research effort over the past half-century, primarily because of the pharmaceutical importance of several compounds produced by opium poppy (Papaver somniferum). Advancements in genomics technologies have substantially accelerated the rate of gene discovery over the past decade, such that most biosynthetic enzymes involved in the formation of the major alkaloids of opium poppy have now been isolated and partially characterized. Not unexpectedly, the availability of all perceived biosynthetic genes has facilitated the reconstitution of several BIA pathways in microbial hosts, including yeast (Saccharomyces cerevisiae). Product yields are currently insufficient to consider the commercial production of high-value BIAs, such as morphine. However, the rudimentary success demonstrated by the uncomplicated and routine assembly of a multitude of characterized BIA biosynthetic genes provides a valuable gene discovery tool for the rapid functional identification of the plethora of gene candidates available through increasingly accessible genomic, transcriptomic, and proteomic databases. BIA biosynthetic gene discovery represents a substantial research opportunity largely owing to the wealth of existing enzyme data mostly obtained from a single plant species. Functionally novel enzymes and variants with potential metabolic engineering applications can be considered the primary targets. Selection of candidates from sequence repositories is facilitated by the monophyletic relationship among biosynthetic genes belonging to a wide range of enzyme families, such as the numerous cytochromes P450 and AdoMet-dependent O- and N-methyltransferases that operate in BIA metabolism. We describe methods for the rapid functional screening of uncharacterized gene candidates encoding potential BIA biosynthetic enzymes using yeast strains engineered to perform selected metabolic conversions. As an initial

  2. Literature-based discovery of IFN-gamma and vaccine-mediated gene interaction networks.

    PubMed

    Ozgür, Arzucan; Xiang, Zuoshuang; Radev, Dragomir R; He, Yongqun

    2010-01-01

    Interferon-gamma (IFN-gamma) regulates various immune responses that are often critical for vaccine-induced protection. In order to annotate the IFN-gamma-related gene interaction network from a large amount of IFN-gamma research reported in the literature, a literature-based discovery approach was applied with a combination of natural language processing (NLP) and network centrality analysis. The interaction network of human IFN-gamma (Gene symbol: IFNG) and its vaccine-specific subnetwork were automatically extracted using abstracts from all articles in PubMed. Four network centrality metrics were further calculated to rank the genes in the constructed networks. The resulting generic IFNG network contains 1060 genes and 26313 interactions among these genes. The vaccine-specific subnetwork contains 102 genes and 154 interactions. Fifty-six genes such as TNF, NFKB1, IL2, IL6, and MAPK8 were ranked among the top 25 by at least one of the centrality methods in one or both networks. Gene enrichment analysis indicated that these genes were classified in various immune mechanisms such as response to extracellular stimulus, lymphocyte activation, and regulation of apoptosis. Literature evidence was manually curated for the IFN-gamma relatedness of 56 genes and vaccine development relatedness for 52 genes. This study also generated many new hypotheses worth further experimental studies. PMID:20625487
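
    To make the idea of ranking genes by network centrality concrete, here is a minimal Python sketch using degree centrality. This is an assumption made for illustration only: the abstract does not name its four centrality metrics, and the node labels and edges below are an invented toy graph, not data from the study.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality: number of distinct neighbors divided by (n - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def rank_genes(edges):
    """Sort nodes by descending centrality, breaking ties by name."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy, made-up interaction list (placeholder node names, not real genes).
TOY_EDGES = [("hub", "g1"), ("hub", "g2"), ("hub", "g3"), ("g1", "g2")]

if __name__ == "__main__":
    for gene, score in rank_genes(TOY_EDGES):
        print(f"{gene}\t{score:.2f}")
```

    In this toy graph the node connected to every other node ranks first; other centrality measures (for example betweenness or closeness) could reorder the remaining nodes.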

  3. GEM-TREND: a web tool for gene expression data mining toward relevant network discovery

    PubMed Central

    Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

    2009-01-01

    are dynamically linked to external data repositories. Conclusion GEM-TREND was developed to retrieve gene expression data by comparing query gene-expression pattern with those of GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing its gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at . PMID:19728865

  4. LSST’s Projected Near-Earth Asteroid Discovery Performance

    NASA Astrophysics Data System (ADS)

    Chesley, Steven R.; Vereš, Peter

    2015-11-01

    The Large Synoptic Survey Telescope (LSST) is an ambitious project that has the potential to make major advances in Near-Earth Asteroid search efforts. With construction already underway and major optical elements complete, first light is set for 2020, followed by two years of commissioning. Once regular survey operations begin in 2022, LSST will systematically survey the observable sky over a ten-year period from its site on Cerro Pachon, Chile. With an 8.4 m aperture (6.5 m effective), 9.6 square degree field of view, and a 3.2-Gigapixel camera, LSST represents the most capable asteroid survey instrument ever built. LSST will be able to cover over 6000 square degrees of sky per clear night with single visit exposures of 30 s, reaching a faint limit of 24.5 mag in the r band. However, the cadence of survey operations is a critical factor for the near-Earth asteroid search performance, and there are multiple science drivers with different cadence objectives that are competing to shape the final survey strategy. We examine the NEA search performance of various LSST search strategies, paying particular attention to the challenges of linking large numbers of asteroid detections in the presence of noise. Our approach is to derive lists of synthetic detections for a given instantiation of the LSST survey, based on an assumed model for the populations of solar system objects from the main asteroid belt inwards to the near-Earth population. These detection lists are combined with false detection lists that model both random noise and non-random artifacts resulting from image differencing algorithms. These large detection lists are fed to the Moving Object Processing System (MOPS), which attempts to link the synthetic detections correctly without becoming confused or overwhelmed by the false detections. The LSST baseline survey cadence relies primarily on single night pairs of detections, with roughly 30-60 min separating elements of the pair. The strategy of using pairs is an

  5. Modern plant metabolomics: Advanced natural product gene discoveries, improved technologies, and future prospects

    DOE PAGESBeta

    Sumner, Lloyd W.; Lei, Zhentian; Nikolau, Basil J.; Saito, Kazuki

    2014-10-24

    Plant metabolomics has matured and modern plant metabolomics has accelerated gene discoveries and the elucidation of a variety of plant natural product biosynthetic pathways. This study highlights specific examples of the discovery and characterization of novel genes and enzymes associated with the biosynthesis of natural products such as flavonoids, glucosinolates, terpenoids, and alkaloids. Additional examples of the integration of metabolomics with genome-based functional characterizations of plant natural products that are important to modern pharmaceutical technology are also reviewed. This article also provides a substantial review of recent technical advances in mass spectrometry imaging, nuclear magnetic resonance imaging, integrated LC-MS-SPE-NMR for metabolite identifications, and x-ray crystallography of microgram quantities for structural determinations. The review closes with a discussion on the future prospects of metabolomics related to crop species and herbal medicine.

  6. Modern plant metabolomics: advanced natural product gene discoveries, improved technologies, and future prospects.

    PubMed

    Sumner, Lloyd W; Lei, Zhentian; Nikolau, Basil J; Saito, Kazuki

    2015-02-01

    Plant metabolomics has matured and modern plant metabolomics has accelerated gene discoveries and the elucidation of a variety of plant natural product biosynthetic pathways. This review covers the approximate period of 2000 to 2014, and highlights specific examples of the discovery and characterization of novel genes and enzymes associated with the biosynthesis of natural products such as flavonoids, glucosinolates, terpenoids, and alkaloids. Additional examples of the integration of metabolomics with genome-based functional characterizations of plant natural products that are important to modern pharmaceutical technology are also reviewed. This article also provides a substantial review of recent technical advances in mass spectrometry imaging, nuclear magnetic resonance imaging, integrated LC-MS-SPE-NMR for metabolite identifications, and X-ray crystallography of microgram quantities for structural determinations. The review closes with a discussion on the future prospects of metabolomics related to crop species and herbal medicine. PMID:25342293

  7. Modern plant metabolomics: Advanced natural product gene discoveries, improved technologies, and future prospects

    SciTech Connect

    Sumner, Lloyd W.; Lei, Zhentian; Nikolau, Basil J.; Saito, Kazuki

    2014-10-24

    Plant metabolomics has matured and modern plant metabolomics has accelerated gene discoveries and the elucidation of a variety of plant natural product biosynthetic pathways. This study highlights specific examples of the discovery and characterization of novel genes and enzymes associated with the biosynthesis of natural products such as flavonoids, glucosinolates, terpenoids, and alkaloids. Additional examples of the integration of metabolomics with genome-based functional characterizations of plant natural products that are important to modern pharmaceutical technology are also reviewed. This article also provides a substantial review of recent technical advances in mass spectrometry imaging, nuclear magnetic resonance imaging, integrated LC-MS-SPE-NMR for metabolite identifications, and x-ray crystallography of microgram quantities for structural determinations. The review closes with a discussion on the future prospects of metabolomics related to crop species and herbal medicine.

  8. MAGIC database and interfaces: an integrated package for gene discovery and expression.

    PubMed

    Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H

    2004-01-01

    The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance. PMID:18629159

  9. Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes

    PubMed Central

    Varley, Katherine Elena; Mitra, Robi David

    2008-01-01

    Medical resequencing of candidate genes in individual patient samples is becoming increasingly important in the clinic and in clinical research. Medical resequencing requires the amplification and sequencing of many candidate genes in many patient samples. Here we introduce Nested Patch PCR, a novel method for highly multiplexed PCR that is very specific, can sensitively detect SNPs and mutations, and is easy to implement. This is the first method that couples multiplex PCR with sample-specific DNA barcodes and next-generation sequencing to enable highly multiplex mutation discovery in candidate genes for multiple samples in parallel. In our pilot study, we amplified exons from colon cancer and matched normal human genomic DNA. From each sample, we successfully amplified 96% (90 of 94) targeted exons from across the genome, totaling 21.6 kbp of sequence. Ninety percent of all sequencing reads were from targeted exons, demonstrating that Nested Patch PCR is highly specific. We found that the abundance of reads per exon was reproducible across samples. We reliably detected germline SNPs and discovered a colon tumor specific nonsense mutation in APC, a gene causally implicated in colorectal cancer. With Nested Patch PCR, candidate gene mutation discovery across multiple individual patient samples can now utilize the power of second-generation sequencing. PMID:18849522

  10. Leaf ESTs from Stevia rebaudiana: a resource for gene discovery in diterpene synthesis.

    PubMed

    Brandle, J E; Richman, A; Swanson, A K; Chapman, B P

    2002-11-01

    Expressed sequence tags (ESTs) are providing a new approach to gene discovery in plant secondary metabolism. Stevia rebaudiana Bert. leaves produce high concentrations of diterpene steviol glycosides and should be a rich source of transcripts involved in diterpene synthesis. In order to create a resource for gene discovery and increase our understanding of steviol glycoside biosynthesis, we sequenced 5,548 ESTs from a S. rebaudiana leaf cDNA library. The EST collection was fully annotated based on database search results. ESTs involved in diterpene synthesis were identified using published sequences as electronic probes, by keyword searches of search results, and by differential representation. A significant portion of the ESTs were specific for standard leaf metabolic pathways; energy and primary metabolism represented 17.6% and 13.1% of total transcripts respectively. Diterpene metabolism in S. rebaudiana represented 1.1% of total transcripts. This study identified candidate genes for 70% of the known steps in the steviol glycoside pathway. One candidate, kaurene oxidase, was the 8th most abundant EST in the collection. Identification of many candidate genes specific to the 1-deoxyxylulose 5-phosphate pathway suggests that the primary source of isopentenyl diphosphate, a precursor of geranylgeranyl diphosphate, is via the non-mevalonic acid pathway. The use of ESTs has greatly facilitated the identification of candidate genes and increased our understanding of diterpene metabolism. PMID:12374295

  11. Induction of comprehensible models for gene expression datasets by subgroup discovery methodology.

    PubMed

    Gamberger, Dragan; Lavrac, Nada; Zelezný, Filip; Tolar, Jakub

    2004-08-01

    Finding disease markers (classifiers) from gene expression data by machine learning algorithms is characterized by a high risk of overfitting the data due to the abundance of attributes (simultaneously measured gene expression values) and shortage of available examples (observations). To avoid this pitfall and achieve predictor robustness, state-of-the-art approaches construct complex classifiers that combine relatively weak contributions of up to thousands of genes (attributes) to classify a disease. The complexity of such classifiers limits their transparency and consequently the biological insights they can provide. The goal of this study is to apply to this domain the methodology of constructing simple yet robust logic-based classifiers amenable to direct expert interpretation. On two well-known, publicly available gene expression classification problems, the paper shows the feasibility of this approach, employing a recently developed subgroup discovery methodology. Some of the discovered classifiers allow for novel biological interpretations. PMID:15465480

  12. Bioinformatic Screening of Autoimmune Disease Genes and Protein Structure Prediction with FAMS for Drug Discovery

    PubMed Central

    Ishida, Shigeharu; Umeyama, Hideaki; Iwadate, Mitsuo; Y-h, Taguchi

    2014-01-01

    Autoimmune diseases are often intractable because their causes are unknown. Identifying which genes contribute to these diseases may allow us to understand the pathogenesis, but it is difficult to determine which genes contribute to disease. Recently, epigenetic information has been considered to activate/deactivate disease-related genes. Thus, it may also be useful to study epigenetic information that differs between healthy controls and patients with autoimmune disease. Among several types of epigenetic information, promoter methylation is believed to be one of the most important factors. Here, we propose that principal component analysis is useful to identify specific gene promoters that are differently methylated between the normal healthy controls and patients with autoimmune disease. Full Automatic Modeling System (FAMS) was used to predict the three-dimensional structures of selected proteins and successfully inferred relatively confident structures. Several possible applications of the obtained structures to drug discovery are discussed. PMID:23855671

  13. In silico prioritization based on coexpression can aid epileptic encephalopathy gene discovery

    PubMed Central

    Oliver, Karen L.; Lukic, Vesna; Freytag, Saskia; Scheffer, Ingrid E.; Berkovic, Samuel F.

    2016-01-01

    Objective: To evaluate the performance of an in silico prioritization approach that was applied to 179 epileptic encephalopathy candidate genes in 2013 and to expand the application of this approach to the whole genome based on expression data from the Allen Human Brain Atlas. Methods: PubMed searches determined which of the 179 epileptic encephalopathy candidate genes had been validated. For validated genes, it was noted whether they were 1 of the 19 of 179 candidates prioritized in 2013. The in silico prioritization approach was applied genome-wide; all genes were ranked according to their coexpression strength with a reference set (i.e., 51 established epileptic encephalopathy genes) in both adult and developing human brain expression data sets. Candidate genes ranked in the top 10% for both data sets were cross-referenced with genes previously implicated in the epileptic encephalopathies due to a de novo variant. Results: Five of 6 validated epileptic encephalopathy candidate genes were among the 19 prioritized in 2013 (odds ratio = 54, 95% confidence interval [7,∞], p = 4.5 × 10^−5, Fisher exact test); one gene was false negative. A total of 297 genes ranked in the top 10% for both the adult and developing brain data sets based on coexpression with the reference set. Of these, 9 had been previously implicated in the epileptic encephalopathies (FBXO41, PLXNA1, ACOT4, PAK6, GABBR2, YWHAG, NBEA, KNDC1, and SELRC1). Conclusions: We conclude that brain gene coexpression data can be used to assist epileptic encephalopathy gene discovery and propose 9 genes as strong epileptic encephalopathy candidates worthy of further investigation. PMID:27066588
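
    The reported Fisher exact test can be checked from the counts given in the abstract. The sketch below assumes a 2×2 table of 179 candidates, 19 prioritized, 6 validated, and 5 in both groups, and assumes the test was one-sided; the function and variable names are illustrative and not taken from the paper.

```python
from math import comb

# 2x2 table inferred from the abstract's counts (an assumption about how the
# test was set up): 179 candidates, 19 prioritized, 6 validated, 5 in both.
TOTAL, PRIORITIZED, VALIDATED, OVERLAP = 179, 19, 6, 5


def hypergeom_upper_tail(total, marked, drawn, observed):
    """P(X >= observed) when `drawn` items are sampled without replacement
    from `total` items, `marked` of which count as hits."""
    denom = comb(total, drawn)
    upper = min(marked, drawn)
    numer = sum(
        comb(marked, k) * comb(total - marked, drawn - k)
        for k in range(observed, upper + 1)
    )
    return numer / denom


# One-sided Fisher exact p-value (enrichment of validated genes among prioritized ones).
p_one_sided = hypergeom_upper_tail(TOTAL, PRIORITIZED, VALIDATED, OVERLAP)

# Cells of the 2x2 table and the plain cross-product odds ratio.
a = OVERLAP                   # prioritized and validated
b = PRIORITIZED - OVERLAP     # prioritized, not validated
c = VALIDATED - OVERLAP       # validated, not prioritized
d = TOTAL - PRIORITIZED - c   # neither
sample_odds_ratio = (a * d) / (b * c)

print(f"one-sided p = {p_one_sided:.2e}, sample odds ratio = {sample_odds_ratio:.1f}")
```

    Under these assumptions the one-sided p-value comes out close to the reported 4.5 × 10^−5. The cross-product odds ratio is somewhat higher than the reported 54, which may reflect a conditional maximum-likelihood estimate that this sketch does not compute.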

  14. Discovery Systems

    NASA Technical Reports Server (NTRS)

    Pell, Barney

    2003-01-01

    A viewgraph presentation on NASA's Discovery Systems Project is given. The topics of discussion include: 1) NASA's Computing Information and Communications Technology Program; 2) Discovery Systems Program; and 3) Ideas for Information Integration Using the Web.

  15. Discovery of the faithfulness gene: a model of transmission and transformation of scientific information.

    PubMed

    Green, Eva G T; Clémence, Alain

    2008-09-01

    The purpose of this paper is to study the diffusion and transformation of scientific information in everyday discussions. Based on rumour models and social representations theory, the impact of interpersonal communication and pre-existing beliefs on transmission of the content of a scientific discovery was analysed. In three experiments, a communication chain was simulated to investigate how laypeople make sense of a genetic discovery first published in a scientific outlet, then reported in a mainstream newspaper and finally discussed in groups. Study 1 (N=40) demonstrated a transformation of information when the scientific discovery moved along the communication chain. During successive narratives, scientific expert terminology disappeared while scientific information associated with lay terminology persisted. Moreover, the idea of a discovery of a faithfulness gene emerged. Study 2 (N=70) revealed that transmission of the scientific message varied as a function of attitudes towards genetic explanations of behaviour (pro-genetics vs. anti-genetics). Pro-genetics employed more scientific terminology than anti-genetics. Study 3 (N=75) showed that endorsement of genetic explanations was related to descriptive accounts of the scientific information, whereas rejection of genetic explanations was related to evaluative accounts of the information. PMID:17945041

  16. The National Laboratory Gene Library Project

    SciTech Connect

    Deaven, L.L.; Van Dilla, M.A.

    1988-01-01

    The two National Laboratories at Livermore and Los Alamos have played a prominent role in the development and application of flow cytometry and sorting to chromosome classification and purification. Both laboratories began to receive numerous requests for specific human chromosomal types purified by flow sorting for gene library construction, but these requests were difficult to satisfy due to time and personnel constraints. The Department of Energy, through its Office of Health and Environmental Research, has a long-standing interest in the human genome in general and in the mutagenic and carcinogenic effects of energy-related environmental pollutants in particular. Hence, it was decided in 1983 to use flow sorting to construct chromosome-specific gene libraries to be made available to the genetic research community. The National Laboratory Gene Library Project was envisioned as a practical way to deal with requests for sorted chromosomes, and also as a way to promote increased understanding of the human genome and the effects of mutagens and carcinogens on it. The strategy for the project was developed with the help of an advisory committee as well as suggestions and advice from many other geneticists. 4 refs., 2 tabs.

  17. Discovery of dominant and dormant genes from expression data using a novel generalization of SNR for multi-class problems

    PubMed Central

    Tsai, Yu-Shuen; Lin, Chin-Teng; Tseng, George C; Chung, I-Fang; Pal, Nikhil Ranjan

    2008-01-01

    Background The Signal-to-Noise-Ratio (SNR) is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introduction of two indices, Gene Dominant Index and Gene Dormant Index (GDIs). These two indices lead to the concepts of dominant and dormant genes with biological significance. We use these indices to develop methodologies for discovery of dominant and dormant biomarkers with interesting biological significance. The dominancy and dormancy of the identified biomarkers and their excellent discriminating power are also demonstrated pictorially using the scatterplot of individual gene and 2-D Sammon's projection of the selected set of genes. Using information from the literature we have shown that the GDI based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems. Results and discussion To evaluate the effectiveness of the GDIs, we have used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer). For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools, Nearest Neighbor Classifier (NNC), Nearest Mean Classifier (NMC), Support Vector Machine (SVM) classifier with linear kernel, and SVM classifier with Gaussian kernel, where both SVMs are used in conjunction with one-vs-all (OVA) and one-vs-one (OVO) strategies. We found GDIs to be very effective in identifying biomarkers with strong class specific signatures. With all six tools and for all data sets we could achieve better or comparable prediction accuracies usually with fewer marker genes than results reported in the literature using the same computational

  18. Discovery of nucleotide polymorphisms in the Musa gene pool by Ecotilling.

    PubMed

    Till, Bradley J; Jankowicz-Cieslak, Joanna; Sági, László; Huynh, Owen A; Utsushi, Hiroe; Swennen, Rony; Terauchi, Ryohei; Mba, Chikelu

    2010-11-01

    Musa (banana and plantain) is an important genus for the global export market and in local markets where it provides staple food for approximately 400 million people. Hybridization and polyploidization of several (sub)species, combined with vegetative propagation and human selection have produced a complex genetic history. We describe the application of the Ecotilling method for the discovery and characterization of nucleotide polymorphisms in diploid and polyploid accessions of Musa. We discovered over 800 novel alleles in 80 accessions. Sequencing and band evaluation shows Ecotilling to be a robust and accurate platform for the discovery of polymorphisms in homologous and homeologous gene targets. In the process of validating the method, we identified two single nucleotide polymorphisms that may be deleterious for the function of a gene putatively important for phototropism. Evaluation of heterozygous polymorphism and haplotype blocks revealed a high level of nucleotide diversity in Musa accessions. We further applied a strategy for the simultaneous discovery of heterozygous and homozygous polymorphisms in diploid accessions to rapidly evaluate nucleotide diversity in accessions of the same genome type. This strategy can be used to develop hypotheses for inheritance patterns of nucleotide polymorphisms within and between genome types. We conclude that Ecotilling is suitable for diversity studies in Musa, that it can be considered for functional genomics studies and as a tool in selecting germplasm for traditional and mutation breeding approaches. PMID:20589365

  19. Genes Frequently Coexpressed with Hoxc8 Provide Insight into the Discovery of Target Genes.

    PubMed

    Kalyani, Ruthala; Lee, Ji-Yeon; Min, Hyehyun; Yoon, Heejei; Kim, Myoung Hee

    2016-05-31

    Identifying Hoxc8 target genes is at the crux of understanding the Hoxc8-mediated regulatory networks underlying its roles during development. However, identification of these genes remains difficult due to intrinsic factors of Hoxc8, such as low DNA binding specificity, context-dependent regulation, and unknown cofactors. Therefore, as an alternative, the present study attempted to test whether the roles of Hoxc8 could be inferred by simply analyzing genes frequently coexpressed with Hoxc8, and whether these genes include putative target genes. Using archived gene expression datasets in which Hoxc8 was differentially expressed, we identified a total of 567 genes that were positively coexpressed with Hoxc8 in at least four out of eight datasets. Among these, 23 genes were coexpressed in six datasets. Gene sets associated with extracellular matrix and cell adhesion were most significantly enriched, followed by gene sets for skeletal system development, morphogenesis, cell motility, and transcriptional regulation. In particular, transcriptional regulators, including paralogs of Hoxc8, known Hox co-factors, and transcriptional remodeling factors were enriched. We randomly selected Adam19, Ptpn13, Prkd1, Tgfbi, and Aldh1a3, and validated their coexpression in mouse embryonic tissues and cell lines following TGF-β2 treatment or ectopic Hoxc8 expression. Except for Aldh1a3, all genes showed concordant expression with that of Hoxc8, suggesting that the coexpressed genes might include direct or indirect target genes. Collectively, we suggest that the coexpressed genes provide a resource for constructing Hoxc8-mediated regulatory networks. PMID:27025388
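
    The selection rule described above (keep genes that are coexpressed in at least four of eight datasets) amounts to counting in how many datasets each gene appears. The following is a minimal sketch of that counting step only; the function name, the placeholder gene labels, and the toy dataset memberships are hypothetical and do not come from the study.

```python
from collections import Counter


def frequently_coexpressed(datasets, min_datasets):
    """Return {gene: number of datasets} for genes found in at least
    `min_datasets` of the per-dataset gene collections."""
    # set(genes) ensures a gene is counted once per dataset.
    counts = Counter(gene for genes in datasets for gene in set(genes))
    return {gene: n for gene, n in counts.items() if n >= min_datasets}


# Toy input: eight hypothetical datasets, each listing placeholder genes
# that were positively coexpressed with the gene of interest.
toy_datasets = [
    {"gene_a", "gene_b", "gene_c"},
    {"gene_a", "gene_b"},
    {"gene_a", "gene_d"},
    {"gene_a", "gene_b", "gene_d"},
    {"gene_b"},
    {"gene_a"},
    {"gene_c"},
    {"gene_a", "gene_b"},
]

hits = frequently_coexpressed(toy_datasets, min_datasets=4)
print(hits)
```

    With this toy input only gene_a (six datasets) and gene_b (five datasets) pass the threshold. How coexpression within each dataset was determined is not specified in the abstract and is left outside the sketch.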

  20. Genes Frequently Coexpressed with Hoxc8 Provide Insight into the Discovery of Target Genes

    PubMed Central

    Kalyani, Ruthala; Lee, Ji-Yeon; Min, Hyehyun; Yoon, Heejei; Kim, Myoung Hee

    2016-01-01

    Identifying Hoxc8 target genes is at the crux of understanding the Hoxc8-mediated regulatory networks underlying its roles during development. However, identification of these genes remains difficult due to intrinsic factors of Hoxc8, such as low DNA binding specificity, context-dependent regulation, and unknown cofactors. Therefore, as an alternative, the present study attempted to test whether the roles of Hoxc8 could be inferred by simply analyzing genes frequently coexpressed with Hoxc8, and whether these genes include putative target genes. Using archived gene expression datasets in which Hoxc8 was differentially expressed, we identified a total of 567 genes that were positively coexpressed with Hoxc8 in at least four out of eight datasets. Among these, 23 genes were coexpressed in six datasets. Gene sets associated with extracellular matrix and cell adhesion were most significantly enriched, followed by gene sets for skeletal system development, morphogenesis, cell motility, and transcriptional regulation. In particular, transcriptional regulators, including paralogs of Hoxc8, known Hox co-factors, and transcriptional remodeling factors were enriched. We randomly selected Adam19, Ptpn13, Prkd1, Tgfbi, and Aldh1a3, and validated their coexpression in mouse embryonic tissues and cell lines following TGF-β2 treatment or ectopic Hoxc8 expression. Except for Aldh1a3, all genes showed concordant expression with that of Hoxc8, suggesting that the coexpressed genes might include direct or indirect target genes. Collectively, we suggest that the coexpressed genes provide a resource for constructing Hoxc8-mediated regulatory networks. PMID:27025388

  1. Discovery of five conserved β-defensin gene clusters using a computational search strategy

    PubMed Central

    Schutte, Brian C.; Mitros, Joseph P.; Bartlett, Jennifer A.; Walters, Jesse D.; Jia, Hong Peng; Welsh, Michael J.; Casavant, Thomas L.; McCray, Paul B.

    2002-01-01

    The innate immune system includes antimicrobial peptides that protect multicellular organisms from a diverse spectrum of microorganisms. β-Defensins comprise one important family of mammalian antimicrobial peptides. The annotation of the human genome fails to reveal the expected diversity, and a recent query of the draft sequence with the blast search engine found only one new β-defensin gene (DEFB3). To define better the β-defensin gene family, we adopted a genomics approach that uses hmmer, a computational search tool based on hidden Markov models, in combination with blast. This strategy identified 28 new human and 43 new mouse β-defensin genes in five syntenic chromosomal regions. Within each syntenic cluster, the gene sequences and organization were similar, suggesting each cluster pair arose from a common ancestor and was retained because of conserved functions. Preliminary analysis indicates that at least 26 of the predicted genes are transcribed. These results demonstrate the value of a genomewide search strategy to identify genes with conserved structural motifs. Discovery of these genes represents a new starting point for exploring the role of β-defensins in innate immunity. PMID:11854508

  2. Evaluation of five ab initio gene prediction programs for the discovery of maize genes.

    PubMed

    Yao, Hong; Guo, Ling; Fu, Yan; Borsuk, Lisa A; Wen, Tsui-Jung; Skibbe, David S; Cui, Xiangqin; Scheffler, Brian E; Cao, Jun; Emrich, Scott J; Ashlock, Daniel A; Schnable, Patrick S

    2005-02-01

    Five ab initio programs (FGENESH, GeneMark.hmm, GENSCAN, GlimmerR and Grail) were evaluated for their accuracy in predicting maize genes. Two of these programs, GeneMark.hmm and GENSCAN had been trained for maize; FGENESH had been trained for monocots (including maize), and the others had been trained for rice or Arabidopsis. Initial evaluations were conducted using eight maize genes (gl8a, pdc2, pdc3, rf2c, rf2d, rf2e1, rth1, and rth3) of which the sequences were not released to the public prior to conducting this evaluation. The significant advantage of this data set for this evaluation is that these genes could not have been included in the training sets of the prediction programs. FGENESH yielded the most accurate and GeneMark.hmm the second most accurate predictions. The five programs were used in conjunction with RT-PCR to identify and establish the structures of two new genes in the a1-sh2 interval of the maize genome. FGENESH, GeneMark.hmm and GENSCAN were tested on a larger data set consisting of maize assembled genomic islands (MAGIs) that had been aligned to ESTs. FGENESH, GeneMark.hmm and GENSCAN correctly predicted gene models in 773, 625, and 371 MAGIs, respectively, out of the 1353 MAGIs that comprise data set 2. PMID:15830133
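
    The counts for data set 2 are easier to compare as proportions of the 1353 MAGIs. The short calculation below uses only the numbers given in the abstract; the variable names are illustrative.

```python
# Correctly predicted gene models per program, out of 1353 MAGIs (from the abstract).
TOTAL_MAGIS = 1353
correct = {"FGENESH": 773, "GeneMark.hmm": 625, "GENSCAN": 371}

# Fraction of MAGIs in which each program predicted the gene model correctly.
rates = {program: n / TOTAL_MAGIS for program, n in correct.items()}

for program, rate in rates.items():
    print(f"{program}: {rate:.1%}")
```

    This gives roughly 57% for FGENESH, 46% for GeneMark.hmm, and 27% for GENSCAN.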

  3. Gene Discovery of Modular Diterpene Metabolism in Nonmodel Systems

    PubMed Central

    Zerbe, Philipp; Hamberger, Björn; Yuen, Macaire M.S.; Chiang, Angela; Sandhu, Harpreet K.; Madilao, Lina L.; Nguyen, Anh; Hamberger, Britta; Bach, Søren Spanner; Bohlmann, Jörg

    2013-01-01

    Plants produce over 10,000 different diterpenes of specialized (secondary) metabolism, and fewer diterpenes of general (primary) metabolism. Specialized diterpenes may have functions in ecological interactions of plants with other organisms and also benefit humanity as pharmaceuticals, fragrances, resins, and other industrial bioproducts. Examples of high-value diterpenes are taxol and forskolin pharmaceuticals or ambroxide fragrances. Yields and purity of diterpenes obtained from natural sources or by chemical synthesis are often insufficient for large-volume or high-end applications. Improvement of agricultural or biotechnological diterpene production requires knowledge of biosynthetic genes and enzymes. However, specialized diterpene pathways are extremely diverse across the plant kingdom, and most specialized diterpenes are taxonomically restricted to a few plant species, genera, or families. Consequently, there is no single reference system to guide gene discovery and rapid annotation of specialized diterpene pathways. Functional diversification of genes and plasticity of enzyme functions of these pathways further complicate correct annotation. To address this challenge, we used a set of 10 different plant species to develop a general strategy for diterpene gene discovery in nonmodel systems. The approach combines metabolite-guided transcriptome resources, custom diterpene synthase (diTPS) and cytochrome P450 reference gene databases, phylogenies, and, as shown for select diTPSs, single and coupled enzyme assays using microbial and plant expression systems. In the 10 species, we identified 46 new diTPS candidates and over 400 putatively terpenoid-related P450s in a resource of nearly 1 million predicted transcripts of diterpene-accumulating tissues. Phylogenetic patterns of lineage-specific blooms of genes guided functional characterization. PMID:23613273

  4. MUFFINN: cancer gene discovery via network analysis of somatic mutation data.

    PubMed

    Cho, Ara; Shim, Jung Eun; Kim, Eiru; Supek, Fran; Lehner, Ben; Lee, Insuk

    2016-01-01

    A major challenge for distinguishing cancer-causing driver mutations from inconsequential passenger mutations is the long-tail of infrequently mutated genes in cancer genomes. Here, we present and evaluate a method for prioritizing cancer genes accounting not only for mutations in individual genes but also in their neighbors in functional networks, MUFFINN (MUtations For Functional Impact on Network Neighbors). This pathway-centric method shows high sensitivity compared with gene-centric analyses of mutation data. Notably, only a marginal decrease in performance is observed when using 10 % of TCGA patient samples, suggesting the method may potentiate cancer genome projects with small patient populations. PMID:27333808
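
    The general idea of crediting a gene for mutations in its network neighbors can be illustrated with a toy score. The sketch below is not MUFFINN's actual scoring function, which the abstract does not specify; it assumes, for illustration only, that a gene's score is its own mutation count plus the largest count among its direct neighbors. All names and numbers are hypothetical.

```python
def neighbor_aware_scores(mutation_counts, network):
    """Score each gene as its own mutation count plus the maximum
    mutation count among its direct network neighbors (toy rule)."""
    scores = {}
    for gene, neighbors in network.items():
        own = mutation_counts.get(gene, 0)
        neighbor_max = max(
            (mutation_counts.get(n, 0) for n in neighbors), default=0
        )
        scores[gene] = own + neighbor_max
    return scores


# Hypothetical functional network (undirected, stored as adjacency sets)
# and hypothetical per-gene mutation counts.
toy_network = {
    "G1": {"G2", "G3"},
    "G2": {"G1"},
    "G3": {"G1"},
    "G4": set(),
}
toy_mutations = {"G1": 1, "G2": 40, "G3": 2, "G4": 5}

scores = neighbor_aware_scores(toy_mutations, toy_network)
print(scores)
```

    In this toy example G1 is rarely mutated itself but sits next to the frequently mutated G2, so it outscores the isolated G4 even though G4 has more mutations of its own. A gene-centric count alone would rank them the other way.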

  5. RNA-Seq Analysis and Gene Discovery of Andrias davidianus Using Illumina Short Read Sequencing

    PubMed Central

    Li, Fenggang; Wang, Lixin; Lan, Qingjing; Yang, Hui; Li, Yang; Liu, Xiaolin; Yang, Zhaoxia

    2015-01-01

    The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp). Based on sequence similarity searches against known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 transcripts were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR). The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians. PMID:25874626

  6. High-throughput platform for the discovery of elicitors of silent bacterial gene clusters

    PubMed Central

    Seyedsayamdost, Mohammad R.

    2014-01-01

    Over the past decade, bacterial genome sequences have revealed an immense reservoir of biosynthetic gene clusters, sets of contiguous genes that have the potential to produce drugs or drug-like molecules. However, the majority of these gene clusters appear to be inactive for unknown reasons prompting terms such as “cryptic” or “silent” to describe them. Because natural products have been a major source of therapeutic molecules, methods that rationally activate these silent clusters would have a profound impact on drug discovery. Herein, a new strategy is outlined for awakening silent gene clusters using small molecule elicitors. In this method, a genetic reporter construct affords a facile read-out for activation of the silent cluster of interest, while high-throughput screening of small molecule libraries provides potential inducers. This approach was applied to two cryptic gene clusters in the pathogenic model Burkholderia thailandensis. The results not only demonstrate a prominent activation of these two clusters, but also reveal that the majority of elicitors are themselves antibiotics, most in common clinical use. Antibiotics, which kill B. thailandensis at high concentrations, act as inducers of secondary metabolism at low concentrations. One of these antibiotics, trimethoprim, served as a global activator of secondary metabolism by inducing at least five biosynthetic pathways. Further application of this strategy promises to uncover the regulatory networks that activate silent gene clusters while at the same time providing access to the vast array of cryptic molecules found in bacteria. PMID:24808135

  7. Inherited retinal diseases in dogs: advances in gene/mutation discovery

    PubMed Central

    Miyadera, Keiko

    2015-01-01

    Inherited retinal diseases (RDs) are vision-threatening conditions affecting humans as well as many domestic animals. Through many years of clinical studies of the domestic dog population, a wide array of RDs has been phenotypically characterized. Extensive effort to map the causative gene and to identify the underlying mutation followed. Through candidate gene, linkage analysis, genome-wide association studies, and more recently, by means of next-generation sequencing, as many as 31 mutations in 24 genes have been identified as the underlying cause for canine RDs. Most of these genes have been associated with human RDs providing opportunities to study their roles in the disease pathogenesis and in normal visual function. The canine model has also contributed in developing new treatments such as gene therapy which has been clinically applied to human patients. Meanwhile, with increasing knowledge of the molecular architecture of RDs in different subpopulations of dogs, the conventional understanding of RDs as a simple monogenic disease is beginning to change. Emerging evidence of modifiers that alter the disease outcome is complicating the interpretation of DNA tests. In this review, advances in the gene/mutation discovery approaches and the emerging genetic complexity of canine RDs are discussed. PMID:26120276

  8. Discovery of a novel imprinted gene by transcriptional analysis of parthenogenetic embryonic stem cells

    PubMed Central

    Sritanaudomchai, Hathaitip; Ma, Hong; Clepper, Lisa; Gokhale, Sumita; Bogan, Randy; Hennebold, Jon; Wolf, Don; Mitalipov, Shoukhrat

    2010-01-01

    BACKGROUND Parthenogenetic embryonic stem cells (PESCs) may have future utilities in cell replacement therapies since they are closely related to the female from which the activated oocyte was obtained. Furthermore, the avoidance of parthenogenetic development in mammals provides the most compelling rationale for the evolution of genomic imprinting, and the biological process of parthenogenesis raises complex issues regarding differential gene expression. METHODS AND RESULTS We describe here homozygous rhesus monkey PESCs derived from a spontaneously duplicated, haploid oocyte genome. Since the effect of homozygosity on PESCs pluripotency and differentiation potential is unknown, we assessed the similarities and differences in pluripotency markers and developmental potential by in vitro and in vivo differentiation of homozygous and heterozygous PESCs. To understand the differences in gene expression regulation between parthenogenetic and biparental embryonic stem cells (ESCs), we conducted microarray analysis of genome-wide mRNA profiles of primate PESCs and ESCs derived from fertilized embryos using the Affymetrix Rhesus Macaque Genome array. Several known paternally imprinted genes were in the highly down-regulated group in PESCs compared with ESCs. Furthermore, allele-specific expression analysis of other genes whose expression is also down-regulated in PESCs, led to the identification of one novel imprinted gene, inositol polyphosphate-5-phosphatase F (INPP5F), which was exclusively expressed from a paternal allele. CONCLUSION Our findings suggest that PESCs could be used as a model for studying genomic imprinting, and in the discovery of novel imprinted genes. PMID:20522441

  9. A joint modeling approach for uncovering associations between gene expression, bioactivity and chemical structure in early drug discovery to guide lead selection and genomic biomarker development.

    PubMed

    Perualila-Tan, Nolen; Kasim, Adetayo; Talloen, Willem; Verbist, Bie; Göhlmann, Hinrich W H; Shkedy, Ziv

    2016-08-01

    The modern drug discovery process involves multiple sources of high-dimensional data. This imposes the challenge of data integration. A typical example is the integration of chemical structure (fingerprint features), phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data in early drug discovery to better understand the chemical and biological mechanisms of candidate drugs, and to facilitate early detection of safety issues prior to later and expensive phases of drug development cycles. In this paper, we discuss a joint model for the transcriptomic and the phenotypic variables conditioned on the chemical structure. This modeling approach can be used to uncover, for a given set of compounds, the association between gene expression and biological activity taking into account the influence of the chemical structure of the compound on both variables. The model allows the detection of genes that are associated with the bioactivity data, facilitating the identification of potential genomic biomarkers for compound efficacy. In addition, the effect of every structural feature on both genes and pIC50 and their associations can be simultaneously investigated. Two oncology projects are used to illustrate the applicability and usefulness of the joint model to integrate multi-source high-dimensional information to aid drug discovery. PMID:27269248

  10. The Bering Sea Project Archive: a Prototype for Improved Discovery and Access

    NASA Astrophysics Data System (ADS)

    Stott, D.; Mayernik, M. S.; Daniels, M. D.; Moore, J. A.; Williams, S. F.; Allison, J.

    2015-12-01

    The Bering Sea Project was a research program from 2007 through 2012 that sought to understand the impacts of climate change and dynamic sea ice cover on the eastern Bering Sea ecosystem. More than 100 scientists engaged in field data collection, original research, and ecosystem modeling to link climate, physical oceanography, plankton, fishes, seabirds, marine mammals, humans, traditional knowledge and economic outcomes. Over the six-year period of the program hundreds of multidisciplinary datasets coming from a variety of instrumentation and measurement platforms within thirty-one categories of research were processed and curated by the National Center for Atmospheric Research (NCAR) Earth Observing Laboratory (EOL). For the investigator proposing a field project, the researcher performing synthesis, or the modeler seeking data for verification, the easy discovery and access to the most relevant data is of prime importance. The heterogeneous products of oceanographic field programs such as the Bering Sea Project challenge the ability of researchers to identify which data sets, people, or tools might be relevant to their research, and to understand how certain data, instruments, or methods were used to produce particular results. EOL, as a partner in the NSF funded EarthCollab project, is using linked open data to permit the direct interlinking of information and data across platforms and projects. We are leveraging an existing open-source semantic web application, VIVO, to address connectivity gaps across distributed networks of researchers and resources and identify relevant content, independent of location. We will present our approach in connecting ontologies and integrating them within the VIVO system, using the Bering Sea Project datasets as a case study, and will provide insight into how the geosciences can leverage linked data to produce more coherent methods of information and data discovery across large multi-disciplinary projects.

  11. Exploiting aberrant mRNA expression in autism for gene discovery and diagnosis.

    PubMed

    Guan, Jinting; Yang, Ence; Yang, Jizhou; Zeng, Yong; Ji, Guoli; Cai, James J

    2016-07-01

    Autism spectrum disorder (ASD) is characterized by substantial phenotypic and genetic heterogeneity, which greatly complicates the identification of genetic factors that contribute to the disease. Study designs have mainly focused on group differences between cases and controls. The problem is that, by their nature, group difference-based methods (e.g., differential expression analysis) blur or collapse the heterogeneity within groups. By ignoring genes with variable within-group expression, an important axis of genetic heterogeneity contributing to expression variability among affected individuals has been overlooked. To this end, we develop a new gene expression analysis method, aberrant gene expression analysis, based on the multivariate distance commonly used for outlier detection. Our method detects the discrepancies in gene expression dispersion between groups and identifies genes with significantly different expression variability. Using this new method, we re-visited RNA sequencing data generated from post-mortem brain tissues of 47 ASD and 57 control samples. We identified 54 functional gene sets whose expression dispersion in ASD samples is more pronounced than that in controls, as well as 76 co-expression modules present in controls but absent in ASD samples due to ASD-specific aberrant gene expression. We also exploited aberrantly expressed genes as biomarkers for ASD diagnosis. With a whole blood expression data set, we identified three aberrantly expressed gene sets whose expression levels serve as discriminating variables achieving >70 % classification accuracy. In summary, our method represents a novel discovery and diagnostic strategy for ASD. Our findings may help open an expression variability-centered research avenue for other genetically heterogeneous disorders. PMID:27131873

  12. Genomics-driven discovery of the pneumocandin biosynthetic gene cluster in the fungus Glarea lozoyensis

    PubMed Central

    2013-01-01

    Background The antifungal therapy caspofungin is a semi-synthetic derivative of pneumocandin B0, a lipohexapeptide produced by the fungus Glarea lozoyensis, and was the first member of the echinocandin class approved for human therapy. The nonribosomal peptide synthetase (NRPS)-polyketide synthases (PKS) gene cluster responsible for pneumocandin biosynthesis from G. lozoyensis has not been elucidated to date. In this study, we report the elucidation of the pneumocandin biosynthetic gene cluster by whole genome sequencing of the G. lozoyensis wild-type strain ATCC 20868. Results The pneumocandin biosynthetic gene cluster contains a NRPS (GLNRPS4) and a PKS (GLPKS4) arranged in tandem, two cytochrome P450 monooxygenases, seven other modifying enzymes, and genes for L-homotyrosine biosynthesis, a component of the peptide core. Thus, the pneumocandin biosynthetic gene cluster is significantly more autonomous and organized than that of the recently characterized echinocandin B gene cluster. Disruption mutants of GLNRPS4 and GLPKS4 no longer produced the pneumocandins (A0 and B0), and the Δglnrps4 and Δglpks4 mutants lost antifungal activity against the human pathogenic fungus Candida albicans. In addition to pneumocandins, the G. lozoyensis genome encodes a rich repertoire of natural product-encoding genes including 24 PKSs, six NRPSs, five PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, and 14 terpene synthases. Conclusions Characterization of the gene cluster provides a blueprint for engineering new pneumocandin derivatives with improved pharmacological properties. Whole genome estimation of the secondary metabolite-encoding genes from G. lozoyensis provides yet another example of the huge potential for drug discovery from natural products from the fungal kingdom. PMID:23688303

  13. Evaluation of Gene Association Methods for Coexpression Network Construction and Biological Knowledge Discovery

    PubMed Central

    Kumari, Sapna; Nie, Jeff; Chen, Huann-Sheng; Ma, Hao; Stewart, Ron; Li, Xiang; Lu, Meng-Zhu; Taylor, William M.; Wei, Hairong

    2012-01-01

    Background Constructing coexpression networks and performing network analysis using large-scale gene expression data sets is an effective way to uncover new biological knowledge; however, the methods used for gene association in constructing these coexpression networks have not been thoroughly evaluated. Since different methods lead to structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical. Methods and Results In this study, we compared eight gene association methods – Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding's D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson – and focused on their true knowledge discovery rates in associating pathway genes and constructing coordination networks of regulatory genes. We also examined the behaviors of different methods on microarray data with different properties, and whether the biological processes affect the efficiency of different methods. Conclusions We found that the Spearman, Hoeffding and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-Sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, and so is the Distance Covariance method that can find gene pairs of multiple relationships. Some of our analyses clearly show that the Pearson and Distance Covariance methods behave distinctly from the other six methods. The efficiencies of different methods vary with the data properties to some degree and are largely contingent upon the biological processes, which necessitates a pre-analysis to identify the best performing method for gene association and coexpression network construction. PMID:23226279
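
    To illustrate why the choice of association method can matter, here is a small sketch comparing two of the eight methods named above, Pearson and Spearman, on a monotonic but nonlinear pair of expression profiles. The helper names and the toy data are assumptions made for illustration; this is not the evaluation pipeline used in the study.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical profiles: gene_b rises exponentially with gene_a.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [math.exp(v) for v in gene_a]

r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
print(r_pearson, r_spearman)
```

    On this toy pair the rank-based measure reports a perfect monotonic association while Pearson is noticeably lower, which is one plausible reason rank-based methods can recover coexpressed genes that a linear measure underrates.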

  14. Arctic Research Mapping Application (ARMAP) Showcases discovery level metadata for US Funded Research Projects

    NASA Astrophysics Data System (ADS)

    Gaylord, A. G.; Kassin, A.; Cody, R. P.; Manley, W. F.; Dover, M.; Score, R.; Garcia-Lavigne, D.; Tweedie, C. E.

    2013-12-01

    The Arctic Research Mapping Application (ARMAP) is a suite of online applications and data services that support Arctic science by providing project tracking information (who's doing what, when and where in the region) for United States Government funded projects. Development of an interagency standard for tracking discovery level metadata for projects has been achieved through collaboration with the Alaska Data Integration work group. The US National Science Foundation plus 17 other agencies and organizations have adopted the standard with several entities successfully implementing XML based REST webservices. With ARMAP's web mapping applications and data services (http://armap.org), users can search for research projects by location, year, funding program, keyword, investigator, and discipline, among other variables. Key information about each project is displayed within the application with links to web pages that provide additional information. The ARMAP 2D mapping application has been significantly enhanced to include support for multiple projections, improved base maps, additional reference data layers, and optimization for better performance. In 2013, ship tracks for US National Science Foundation supported vessel based surveys and health care facilities have been included in ARMAP. The additional functionality of this tool will increase awareness of projects funded by numerous entities in the Arctic, enhance coordination for logistics support, help identify geographic gaps in research efforts and potentially foster more collaboration amongst researchers working in the region. Additionally, ARMAP can be used to demonstrate the effects of the International Polar Year (IPY) on funding of different research disciplines by the U.S. Government.

  15. Arctic Research Mapping Application (ARMAP) Showcases discovery level metadata for US Funded Research Projects

    NASA Astrophysics Data System (ADS)

    Score, R.; Gaylord, A. G.; Kassin, A.; Cody, R. P.; Copenhaver, W.; Manley, W. F.; Dover, M.; Tweedie, C. E.

    2014-12-01

    The Arctic Research Mapping Application (ARMAP) is a suite of online applications and data services that support Arctic science by providing project tracking information (who's doing what, when and where in the region) for United States Government funded projects. Development of an interagency standard for tracking discovery level metadata for projects has been achieved through collaboration with the Alaska Data Integration work group. The US National Science Foundation plus 17 other agencies and organizations have adopted the standard with several entities successfully implementing XML based REST webservices. With ARMAP's web mapping applications and data services (http://armap.org), users can search for research projects by location, year, funding program, keyword, investigator, and discipline, among other variables. Key information about each project is displayed within the application with links to web pages that provide additional information. The ARMAP 2D mapping application has been significantly enhanced to include support for multiple projections, improved base maps, additional reference data layers, and optimization for better performance. In 2014, ship tracks for US National Science Foundation supported vessel based surveys have been expanded. These enhancements have been made to increase awareness of projects funded by numerous entities in the Arctic, enhance coordination for logistics support, help identify geographic gaps in research efforts and potentially foster more collaboration amongst researchers working in the region. Additionally, ARMAP can be used to demonstrate past, present, and future research efforts supported by the U.S. Government.

  16. Designing and Developing a NASA Research Projects Knowledge Base and Implementing Knowledge Management and Discovery Techniques

    NASA Astrophysics Data System (ADS)

    Dabiru, L.; O'Hara, C. G.; Shaw, D.; Katragadda, S.; Anderson, D.; Kim, S.; Shrestha, B.; Aanstoos, J.; Frisbie, T.; Policelli, F.; Keblawi, N.

    2006-12-01

    The Research Project Knowledge Base (RPKB) is currently being designed and will be implemented in a manner that is fully compatible and interoperable with enterprise architecture tools developed to support NASA's Applied Sciences Program. Through a user needs assessment and collaboration with Stennis Space Center, Goddard Space Flight Center, and NASA's DEVELOP staff, insight into information needs for the RPKB was gathered from across NASA scientific communities of practice. To enable efficient, consistent, standard, structured, and managed data entry and research results compilation, a prototype RPKB has been designed and fully integrated with the existing NASA Earth Science Systems Components database. The RPKB will compile research project and keyword information of relevance to the six major science focus areas, 12 national applications, and the Global Change Master Directory (GCMD). The RPKB will include information about projects awarded from NASA research solicitations, project investigator information, research publications, NASA data products employed, and model or decision support tools used or developed as well as new data product information. The RPKB will be developed in a multi-tier architecture that will include a SQL Server relational database backend, middleware, and front end client interfaces for data entry. The purpose of this project is to intelligently harvest the results of research sponsored by the NASA Applied Sciences Program and related research program results. We present various approaches for a wide spectrum of knowledge discovery of research results, publications, projects, etc. from the NASA Systems Components database and global information systems and show how this is implemented in a SQL Server database. The application of knowledge discovery is useful for intelligent query answering and multiple-layered database construction. Using advanced EA tools such as the Earth Science Architecture Tool (ESAT), RPKB will enable NASA and

  17. Discovery of midgut genes for the RNA interference control of corn rootworm

    PubMed Central

    Hu, Xu; Richtman, Nina M.; Zhao, Jian-Zhou; Duncan, Keith E.; Niu, Xiping; Procyk, Lisa A.; Oneal, Meghan A.; Kernodle, Bliss M.; Steimel, Joseph P.; Crane, Virginia C.; Sandahl, Gary; Ritland, Julie L.; Howard, Richard J.; Presnail, James K.; Lu, Albert L.; Wu, Gusui

    2016-01-01

    RNA interference (RNAi) is a promising new technology for corn rootworm control. This paper presents the discovery of new gene targets - dvssj1 and dvssj2, in western corn rootworm (WCR). Dvssj1 and dvssj2 are orthologs of the Drosophila genes snakeskin (ssk) and mesh, respectively. These genes encode membrane proteins associated with smooth septate junctions (SSJ) which are required for intestinal barrier function. Based on bioinformatics analysis, dvssj1 appears to be an arthropod-specific gene. Diet based insect feeding assays using double-stranded RNA (dsRNA) targeting dvssj1 and dvssj2 demonstrate targeted mRNA suppression, larval growth inhibition, and mortality. In RNAi treated WCR, injury to the midgut was manifested by “blebbing” of the midgut epithelium into the gut lumen. Ultrastructural examination of midgut epithelial cells revealed apoptosis and regenerative activities. Transgenic plants expressing dsRNA targeting dvssj1 show insecticidal activity and significant plant protection from WCR damage. The data indicate that dvssj1 and dvssj2 are effective gene targets for the control of WCR using RNAi technology, by apparent suppression of production of their respective smooth septate junction membrane proteins located within the intestinal lining, leading to growth inhibition and mortality. PMID:27464714

  18. PiggyBac transposon mutagenesis: a tool for cancer gene discovery in mice.

    PubMed

    Rad, Roland; Rad, Lena; Wang, Wei; Cadinanos, Juan; Vassiliou, George; Rice, Stephen; Campos, Lia S; Yusa, Kosuke; Banerjee, Ruby; Li, Meng Amy; de la Rosa, Jorge; Strong, Alexander; Lu, Dong; Ellis, Peter; Conte, Nathalie; Yang, Fang Tang; Liu, Pentao; Bradley, Allan

    2010-11-19

    Transposons are mobile DNA segments that can disrupt gene function by inserting in or near genes. Here, we show that insertional mutagenesis by the PiggyBac transposon can be used for cancer gene discovery in mice. PiggyBac transposition in genetically engineered transposon-transposase mice induced cancers whose type (hematopoietic versus solid) and latency were dependent on the regulatory elements introduced into transposons. Analysis of 63 hematopoietic tumors revealed that PiggyBac is capable of genome-wide mutagenesis. The PiggyBac screen uncovered many cancer genes not identified in previous retroviral or Sleeping Beauty transposon screens, including Spic, which encodes a PU.1-related transcription factor, and Hdac7, a histone deacetylase gene. PiggyBac and Sleeping Beauty have different integration preferences. To maximize the utility of the tool, we engineered 21 mouse lines to be compatible with both transposon systems in constitutive, tissue- or temporal-specific mutagenesis. Mice with different transposon types, copy numbers, and chromosomal locations support wide applicability. PMID:20947725

  19. Discovery of midgut genes for the RNA interference control of corn rootworm.

    PubMed

    Hu, Xu; Richtman, Nina M; Zhao, Jian-Zhou; Duncan, Keith E; Niu, Xiping; Procyk, Lisa A; Oneal, Meghan A; Kernodle, Bliss M; Steimel, Joseph P; Crane, Virginia C; Sandahl, Gary; Ritland, Julie L; Howard, Richard J; Presnail, James K; Lu, Albert L; Wu, Gusui

    2016-01-01

    RNA interference (RNAi) is a promising new technology for corn rootworm control. This paper presents the discovery of new gene targets - dvssj1 and dvssj2, in western corn rootworm (WCR). Dvssj1 and dvssj2 are orthologs of the Drosophila genes snakeskin (ssk) and mesh, respectively. These genes encode membrane proteins associated with smooth septate junctions (SSJ) which are required for intestinal barrier function. Based on bioinformatics analysis, dvssj1 appears to be an arthropod-specific gene. Diet based insect feeding assays using double-stranded RNA (dsRNA) targeting dvssj1 and dvssj2 demonstrate targeted mRNA suppression, larval growth inhibition, and mortality. In RNAi treated WCR, injury to the midgut was manifested by "blebbing" of the midgut epithelium into the gut lumen. Ultrastructural examination of midgut epithelial cells revealed apoptosis and regenerative activities. Transgenic plants expressing dsRNA targeting dvssj1 show insecticidal activity and significant plant protection from WCR damage. The data indicate that dvssj1 and dvssj2 are effective gene targets for the control of WCR using RNAi technology, by apparent suppression of production of their respective smooth septate junction membrane proteins located within the intestinal lining, leading to growth inhibition and mortality. PMID:27464714

  20. Large-scale gene discovery in human airway epithelia reveals novel transcripts.

    PubMed

    Scheetz, Todd E; Zabner, Joseph; Welsh, Michael J; Coco, Justin; Eyestone, Mari de Fatima; Bonaldo, Maria; Kucaba, Tamara; Casavant, Thomas L; Soares, M Bento; McCray, Paul B

    2004-03-12

    The airway epithelium represents an important barrier between the host and the environment. It is a first site of contact with pathogens, particulates, and other stimuli, and has evolved the means to dynamically respond to these challenges. In an effort to define the transcript profile of airway epithelia, we created and sequenced cDNA libraries from cystic fibrosis (CF) and non-CF epithelia and from human lung tissue. Sequencing of these libraries produced approximately 53,000 3'-expressed sequence tags (3'-ESTs). From these, a nonredundant UniGene set of more than 19,000 sequences was generated. Despite the relatively small contribution of airway epithelia to the total mass of the lung, focused gene discovery in this tissue yielded novel results. The ESTs included several thousand transcripts (6,416) not previously identified from cDNA sequences as expressed in the lung. Among the abundant transcripts were several genes involved in host defense. Most importantly, the set also included 879 3'-ESTs that appear to be novel sequences not previously represented in the National Center for Biotechnology Information UniGene collection. This UniGene set should be useful for studies of pulmonary diseases involving the airway epithelium including cystic fibrosis, respiratory infections and asthma. It also provides a reagent for large-scale expression profiling. PMID:14701920

  1. SPARCoC: A New Framework for Molecular Pattern Discovery and Cancer Gene Identification

    PubMed Central

    Ma, Shiqian; Johnson, Daniel; Ashby, Cody; Xiong, Donghai; Cramer, Carole L.; Moore, Jason H.; Zhang, Shuzhong; Huang, Xiuzhen

    2015-01-01

    It is challenging to cluster cancer patients of a certain histopathological type into molecular subtypes of clinical importance and identify gene signatures directly relevant to the subtypes. Current clustering approaches have inherent limitations, which prevent them from gauging the subtle heterogeneity of the molecular subtypes. In this paper we present a new framework: SPARCoC (Sparse-CoClust), which is based on a novel Common-background and Sparse-foreground Decomposition (CSD) model and the Maximum Block Improvement (MBI) co-clustering technique. SPARCoC has clear advantages compared with widely-used alternative approaches: hierarchical clustering (Hclust) and nonnegative matrix factorization (NMF). We apply SPARCoC to the study of lung adenocarcinoma (ADCA), an extremely heterogeneous histological type, and a significant challenge for molecular subtyping. For testing and verification, we use high quality gene expression profiling data of lung ADCA patients, and identify prognostic gene signatures which could cluster patients into subgroups that are significantly different in their overall survival (with p-values < 0.05). Our results are only based on gene expression profiling data analysis, without incorporating any other feature selection or clinical information; we are able to replicate our findings with completely independent datasets. SPARCoC is broadly applicable to large-scale genomic data to empower pattern discovery and cancer gene identification. PMID:25768286

  2. Long Serial Analysis of Gene Expression for Gene Discovery and Transcriptome Profiling in the Widespread Marine Coccolithophore Emiliania huxleyi

    PubMed Central

    Dyhrman, Sonya T.; Haley, Sheean T.; Birkeland, Shanda R.; Wurch, Louie L.; Cipriano, Michael J.; McArthur, Andrew G.

    2006-01-01

    The abundant and widespread coccolithophore Emiliania huxleyi plays an important role in mediating CO2 exchange between the ocean and the atmosphere through its impact on marine photosynthesis and calcification. Here, we use long serial analysis of gene expression (SAGE) to identify E. huxleyi genes responsive to nitrogen (N) or phosphorus (P) starvation. Long SAGE is an elegant approach for examining quantitative and comprehensive gene expression patterns without a priori knowledge of gene sequences via the detection of 21-bp nucleotide sequence tags. E. huxleyi appears to have a robust transcriptional-level response to macronutrient deficiency, with 42 tags uniquely present or up-regulated twofold or greater in the N-starved library and 128 tags uniquely present or up-regulated twofold or greater in the P-starved library. The expression patterns of several tags were validated with reverse transcriptase PCR. Roughly 48% of these differentially expressed tags could be mapped to publicly available genomic or expressed sequence tag (EST) sequence data. For example, in the P-starved library a number of the tags mapped to genes with a role in P scavenging, including a putative phosphate-repressible permease and a putative polyphosphate synthetase. In short, the long SAGE analyses have (i) identified many new differentially regulated gene sequences, (ii) assigned regulation data to EST sequences with no database homology and unknown function, and (iii) highlighted previously uncharacterized aspects of E. huxleyi N and P physiology. To this end, our long SAGE libraries provide a new public resource for gene discovery and transcriptional analysis in this biogeochemically important marine organism. PMID:16391051
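
    The selection rule described above (tags uniquely present, or up-regulated twofold or greater, in a starved library) can be sketched as follows. This is a hedged illustration only: the abstract does not state how counts were normalized or tested, so scaling by library size is an assumption here, and the function name, tag labels, and counts are hypothetical.

```python
def upregulated_tags(treated, control, fold=2.0):
    """Classify tags in `treated` as 'unique' (absent from control) or 'up'
    (relative abundance at least `fold` times that in control).
    Assumption: counts are normalized by total library size."""
    treated_total = sum(treated.values())
    control_total = sum(control.values())
    hits = {}
    for tag, treated_count in treated.items():
        control_count = control.get(tag, 0)
        if control_count == 0:
            hits[tag] = "unique"
            continue
        ratio = (treated_count / treated_total) / (control_count / control_total)
        if ratio >= fold:
            hits[tag] = "up"
    return hits


# Hypothetical tag counts; real long SAGE tags would be 21-bp sequences.
control_library = {"TAG_A": 10, "TAG_B": 10, "TAG_C": 20}
starved_library = {"TAG_A": 30, "TAG_B": 5, "TAG_D": 5}

hits = upregulated_tags(starved_library, control_library)
print(hits)
```

    A real analysis would also apply a statistical test to low-count tags, since a tag seen once in one library and never in the other is weak evidence of regulation.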

  3. Genes, genomes and identity. Projections on matter.

    PubMed

    Hauskeller, Christine

    2004-12-01

    This paper aims to show that references to genes and genomes are counterproductive in legal and political understandings of what it is to be human and a unique individual. To support this claim, I will give a brief overview of the many incompatible meanings the term 'identity' has gathered in reference to genes or genome in the contexts of biology and family ancestry, personal identity, species identity. One finds various and incompatible understandings of these expressions. While genetics is usually considered to deliver definitive knowledge about history and the future, genomics seems to work with more complicated relations between DNA, inheritance and phenotype. In genomics, 'identity' is no longer about identification and status markers but about individualization. Regulatory and legal documents project from traits to genomes, implying that individuality is at least represented, if not created, in a unique genome. Boundaries between humans and other animals, between different 'kinds' of humans, and between all individual humans are re-established via reference to the chemical matter of DNA. My analysis will show how this trend is a reactionary response to modern understandings of identities as social products and that it ignores new biomedical understandings of human bodies. PMID:15828152

  4. Exploiting pre-rRNA processing in Diamond Blackfan anemia gene discovery and diagnosis.

    PubMed

    Farrar, Jason E; Quarello, Paola; Fisher, Ross; O'Brien, Kelly A; Aspesi, Anna; Parrella, Sara; Henson, Adrianna L; Seidel, Nancy E; Atsidaftos, Eva; Prakash, Supraja; Bari, Shahla; Garelli, Emanuela; Arceci, Robert J; Dianzani, Irma; Ramenghi, Ugo; Vlachos, Adrianna; Lipton, Jeffrey M; Bodine, David M; Ellis, Steven R

    2014-10-01

    Diamond Blackfan anemia (DBA), a syndrome primarily characterized by anemia and physical abnormalities, is one among a group of related inherited bone marrow failure syndromes (IBMFS) which share overlapping clinical features. Heterozygous mutations or single-copy deletions have been identified in 12 ribosomal protein genes in approximately 60% of DBA cases, with the genetic etiology unexplained in most remaining patients. Unlike many IBMFS, for which functional screening assays complement clinical and genetic findings, suspected DBA in the absence of typical alterations of the known genes must frequently be diagnosed after exclusion of other IBMFS. We report here a novel deletion in a child that presented such a diagnostic challenge and prompted development of a novel functional assay that can assist in the diagnosis of a significant fraction of patients with DBA. The ribosomal proteins affected in DBA are required for pre-rRNA processing, a process which can be interrogated to monitor steps in the maturation of 40S and 60S ribosomal subunits. In contrast to prior methods used to assess pre-rRNA processing, the assay reported here, based on capillary electrophoresis measurement of the maturation of rRNA in pre-60S ribosomal subunits, would be readily amenable to use in diagnostic laboratories. In addition to utility as a diagnostic tool, we applied this technique to gene discovery in DBA, resulting in the identification of RPL31 as a novel DBA gene. PMID:25042156

  5. An Update on Soybean Functional Genomics and Microarray Resources for Gene Discovery and Crop Improvement

    Technology Transfer Automated Retrieval System (TEKTRAN)

    DNA microarrays are powerful tools to analyze the expression patterns of thousands of genes simultaneously. We review recent soybean genomics projects that have produced public-sector resources for this important legume crop. As part of the NSF-sponsored “Soybean Functional Genomics Program”, we hav...

  6. Using Osteoclast Differentiation as a Model for Gene Discovery in an Undergraduate Cell Biology Laboratory

    ERIC Educational Resources Information Center

    Birnbaum, Mark J.; Picco, Jenna; Clements, Meghan; Witwicka, Hanna; Yang, Meiheng; Hoey, Margaret T.; Odgren, Paul R.

    2010-01-01

    A key goal of molecular/cell biology/biotechnology is to identify essential genes in virtually every physiological process to uncover basic mechanisms of cell function and to establish potential targets of drug therapy combating human disease. This article describes a semester-long, project-oriented molecular/cellular/biotechnology laboratory…

  7. SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate.

    PubMed

    Roffler, Gretchen H; Amish, Stephen J; Smith, Seth; Cosart, Ted; Kardos, Marty; Schwartz, Michael K; Luikart, Gordon

    2016-09-01

    Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5' and 3' untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species. PMID:27327375

  8. SAGExplore: a web server for unambiguous tag mapping in serial analysis of gene expression oriented to gene discovery and annotation.

    PubMed

    Norambuena, Tomás; Malig, Rodrigo; Melo, Francisco

    2007-07-01

    We describe a web server for the accurate mapping of experimental tags in serial analysis of gene expression (SAGE). The core of the server relies on a database of genomic virtual tags built by a recently described method that attempts to reduce the amount of ambiguous assignments for those tags that are not unique in the genome. The method provides a complete annotation of potential virtual SAGE tags within a genome, along with an estimation of their confidence for experimental observation that ranks tags that present multiple matches in the genome. The output of the server consists of a table in HTML format that contains links to a graphic representation of the results and to some external servers and databases, facilitating the tasks of analysis of gene expression and gene discovery. Also, a table in tab delimited text format is produced, allowing the user to export the results into custom databases and software for further analysis. The current server version provides the most accurate and complete SAGE tag mapping source that is available for the yeast organism. In the near future, this server will also allow the accurate mapping of experimental SAGE-tags from other model organisms such as human, mouse, frog and fly. The server is freely available on the web at: http://dna.bio.puc.cl/SAGExplore.html. PMID:17626053
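
    As a rough illustration of what a genomic virtual tag database involves, the sketch below extracts candidate tags from a sequence and flags those that occur more than once, which is the ambiguity the server tries to reduce. The abstract does not name the anchoring enzyme or tag length, so the CATG site (the NlaIII site commonly used in SAGE), the 10-base tag length, the forward-strand-only scan, and all function names are assumptions; the confidence ranking used by SAGExplore is not reproduced.

```python
from collections import Counter


def virtual_tags(sequence, site="CATG", tag_len=10):
    """Return (position, tag) pairs for every anchoring site in `sequence`.
    The tag is the `tag_len` bases immediately 3' of the site.
    Assumption: forward strand only; incomplete tags at the end are dropped."""
    seq = sequence.upper()
    tags = []
    start = seq.find(site)
    while start != -1:
        tag_start = start + len(site)
        tag = seq[tag_start:tag_start + tag_len]
        if len(tag) == tag_len:
            tags.append((start, tag))
        start = seq.find(site, start + 1)
    return tags


def ambiguous_tags(tags):
    """Tags that map to more than one position and so cannot be assigned
    to a single locus without extra information."""
    counts = Counter(tag for _, tag in tags)
    return {tag for tag, n in counts.items() if n > 1}


# Hypothetical toy sequence with three anchoring sites, two sharing a tag.
toy_genome = "AACATGAAAACCCCGGTTCATGAAAACCCCGGAACATGTTTTGGGGCCAA"
tags = virtual_tags(toy_genome)
print(tags)
print(ambiguous_tags(tags))
```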

  9. Genomics-Based Discovery of Plant Genes for Synthetic Biology of Terpenoid Fragrances: A Case Study in Sandalwood oil Biosynthesis.

    PubMed

    Celedon, J M; Bohlmann, J

    2016-01-01

    Terpenoid fragrances are powerful mediators of ecological interactions in nature and have a long history of traditional and modern industrial applications. Plants produce a great diversity of fragrant terpenoid metabolites, which make them a superb source of biosynthetic genes and enzymes. Advances in fragrance gene discovery have enabled new approaches in synthetic biology of high-value speciality molecules toward applications in the fragrance and flavor, food and beverage, cosmetics, and other industries. Rapid developments in transcriptome and genome sequencing of nonmodel plant species have accelerated the discovery of fragrance biosynthetic pathways. In parallel, advances in metabolic engineering of microbial and plant systems have established platforms for synthetic biology applications of some of the thousands of plant genes that underlie fragrance diversity. While many fragrance molecules (eg, simple monoterpenes) are abundant in readily renewable plant materials, some highly valuable fragrant terpenoids (eg, santalols, ambroxides) are rare in nature and interesting targets for synthetic biology. As a representative example for genomics/transcriptomics enabled gene and enzyme discovery, we describe a strategy used successfully for elucidation of a complete fragrance biosynthetic pathway in sandalwood (Santalum album) and its reconstruction in yeast (Saccharomyces cerevisiae). We address questions related to the discovery of specific genes within large gene families and recovery of rare gene transcripts that are selectively expressed in recalcitrant tissues. To substantiate the validity of the approaches, we describe the combination of methods used in the gene and enzyme discovery of a cytochrome P450 in the fragrant heartwood of tropical sandalwood, responsible for the fragrance defining, final step in the biosynthesis of (Z)-santalols. PMID:27480682

  10. De novo transcriptome sequencing and discovery of genes related to copper tolerance in Paeonia ostii.

    PubMed

    Wang, Yanjie; Dong, Chunlan; Xue, Zeyun; Jin, Qijiang; Xu, Yingchun

    2016-01-15

    Paeonia ostii, an important ornamental and medicinal plant, grows normally on copper (Cu) mines with widespread Cu contamination of soils, and it has the ability to lower Cu contents in the Cu-contaminated soils. However, very little molecular information concerned with Cu resistance of P. ostii is available. In this study, high-throughput de novo transcriptome sequencing was carried out for P. ostii with and without Cu treatment using Illumina HiSeq 2000 platform. A total of 77,704 All-unigenes were obtained with a mean length of 710 bp. Of these unigenes, 47,461 were annotated with public databases based on sequence similarities. Comparative transcript profiling allowed the discovery of 4324 differentially expressed genes (DEGs), with 2207 up-regulated and 2117 down-regulated unigenes in Cu-treated library as compared to the control counterpart. Based on these DEGs, Gene Ontology (GO) enrichment analysis indicated Cu stress-relevant terms, such as 'membrane' and 'antioxidant activity'. Meanwhile, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis uncovered some important pathways, including 'biosynthesis of secondary metabolites' and 'metabolic pathways'. In addition, expression patterns of 12 selected DEGs derived from quantitative real-time polymerase chain reaction (qRT-PCR) were consistent with their transcript abundance changes obtained by transcriptomic analyses, suggesting that all the 12 genes were authentically involved in Cu tolerance in P. ostii. This is the first report to identify genes related to Cu stress responses in P. ostii, which could offer valuable information on the molecular mechanisms of Cu resistance, and provide a basis for further genomics research on this and related ornamental species for phytoremediation. PMID:26435192

  11. Discovery of MicroRNA169 Gene Copies in Genomes of Flowering Plants through Positional Information

    PubMed Central

    Calviño, Martín; Messing, Joachim

    2013-01-01

    Expansion and contraction of microRNA (miRNA) families can be studied in sequenced plant genomes through sequence alignments. Here, we focused on miR169 in sorghum because of its implications in drought tolerance and stem-sugar content. We were able to discover many miR169 copies that have escaped standard genome annotation methods. A new miR169 cluster was found on sorghum chromosome 1. This cluster is composed of the previously annotated sbi-MIR169o together with two newly found MIR169 copies, named sbi-MIR169t and sbi-MIR169u. We also found that a miR169 cluster on sorghum chr7 consisting of sbi-MIR169l, sbi-MIR169m, and sbi-MIR169n is contained within a chromosomal inversion of at least 500 kb that occurred in sorghum relative to Brachypodium, rice, foxtail millet, and maize. Surprisingly, synteny of chromosomal segments containing MIR169 copies with linked bHLH and CONSTANS-LIKE genes extended from Brachypodium to dicotyledonous species such as grapevine, soybean, and cassava, indicating a strong conservation of linkages of certain flowering and/or plant height genes and microRNAs, which may explain linkage drag of drought and flowering traits and would have consequences for breeding new varieties. Furthermore, alignment of rice and sorghum orthologous regions revealed the presence of two additional miR169 gene copies (miR169r and miR169s) on sorghum chr7 that formed an antisense miRNA gene pair. Both copies are expressed and target different sets of genes. Synteny-based analysis of microRNAs among different plant species should lead to the discovery of new microRNAs in general and contribute to our understanding of their evolution. PMID:23348041

  12. Discovery of MicroRNA169 gene copies in genomes of flowering plants through positional information.

    PubMed

    Calviño, Martín; Messing, Joachim

    2013-01-01

    Expansion and contraction of microRNA (miRNA) families can be studied in sequenced plant genomes through sequence alignments. Here, we focused on miR169 in sorghum because of its implications in drought tolerance and stem-sugar content. We were able to discover many miR169 copies that have escaped standard genome annotation methods. A new miR169 cluster was found on sorghum chromosome 1. This cluster is composed of the previously annotated sbi-MIR169o together with two newly found MIR169 copies, named sbi-MIR169t and sbi-MIR169u. We also found that a miR169 cluster on sorghum chr7 consisting of sbi-MIR169l, sbi-MIR169m, and sbi-MIR169n is contained within a chromosomal inversion of at least 500 kb that occurred in sorghum relative to Brachypodium, rice, foxtail millet, and maize. Surprisingly, synteny of chromosomal segments containing MIR169 copies with linked bHLH and CONSTANS-LIKE genes extended from Brachypodium to dicotyledonous species such as grapevine, soybean, and cassava, indicating a strong conservation of linkages of certain flowering and/or plant height genes and microRNAs, which may explain linkage drag of drought and flowering traits and would have consequences for breeding new varieties. Furthermore, alignment of rice and sorghum orthologous regions revealed the presence of two additional miR169 gene copies (miR169r and miR169s) on sorghum chr7 that formed an antisense miRNA gene pair. Both copies are expressed and target different sets of genes. Synteny-based analysis of microRNAs among different plant species should lead to the discovery of new microRNAs in general and contribute to our understanding of their evolution. PMID:23348041

  13. Sleeping Beauty transposon insertional mutagenesis based mouse models for cancer gene discovery

    PubMed Central

    Moriarity, Branden S; Largaespada, David A

    2016-01-01

    Large-scale genomic efforts to study human cancer, such as The Cancer Genome Atlas (TCGA), have identified numerous cancer drivers in a wide variety of tumor types. However, there are limitations to this approach: the mutations and expression or copy number changes that are identified are not always clearly functionally relevant, and only annotated genes and genetic elements are thoroughly queried. The use of complementary, nonbiased, functional approaches to identify drivers of cancer development and progression is ideal to maximize the rate at which cancer discoveries are achieved. One such approach that has been successful is the use of the Sleeping Beauty (SB) transposon-based mutagenesis system in mice. This system uses a conditionally expressed transposase and mutagenic transposon allele to target mutagenesis to somatic cells of a given tissue in mice to cause random mutations leading to tumor development. Analysis of tumors for transposon common insertion sites (CIS) identifies candidate cancer genes specific to that tumor type. While similar screens have been performed in mice with the PiggyBac (PB) transposon and viral approaches, we limit extensive discussion to SB. Here we discuss the basic structure of these screens, screens that have been performed, and methods used to identify CIS. PMID:26051241

  14. Gene Discovery for Synthetic Biology: Exploring the Novel Natural Product Biosynthetic Capacity of Eukaryotic Microalgae.

    PubMed

    O'Neill, E C; Saalbach, G; Field, R A

    2016-01-01

    Eukaryotic microalgae are an incredibly diverse group of organisms whose sole unifying feature is their ability to photosynthesize. They are known for producing a range of potent toxins, which can build up during harmful algal blooms causing damage to ecosystems and fisheries. Genome sequencing is lagging behind in these organisms because of their genetic complexity, but transcriptome sequencing is beginning to make up for this deficit. As more sequence data becomes available, it is apparent that eukaryotic microalgae possess a range of complex natural product biosynthesis capabilities. Some of the genes concerned are responsible for the biosynthesis of known toxins, but there are many more for which we do not know the products. Bioinformatic and analytical techniques have been developed for natural product discovery in bacteria and these approaches can be used to extract information about the products synthesized by algae. Recent analyses suggest that eukaryotic microalgae produce many complex natural products that remain to be discovered. PMID:27480684

  15. Applying knowledge-anchored hypothesis discovery methods to advance clinical and translational research: the OAMiner project

    PubMed Central

    Jackson, Rebecca D; Best, Thomas M; Borlawsky, Tara B; Lai, Albert M; James, Stephen; Gurcan, Metin N

    2012-01-01

    The conduct of clinical and translational research regularly involves the use of a variety of heterogeneous and large-scale data resources. Scalable methods for the integrative analysis of such resources, particularly when attempting to leverage computable domain knowledge in order to generate actionable hypotheses in a high-throughput manner, remain an open area of research. In this report, we describe both a generalizable design pattern for such integrative knowledge-anchored hypothesis discovery operations and our experience in applying that design pattern in the experimental context of a set of driving research questions related to the publicly available Osteoarthritis Initiative data repository. We believe that this ‘test bed’ project and the lessons learned during its execution are both generalizable and representative of common clinical and translational research paradigms. PMID:22647689

  16. Molecular Networking and Pattern-Based Genome Mining Improves discovery of biosynthetic gene clusters and their products from Salinispora species

    PubMed Central

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.

    2015-01-01

    Summary Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308

  17. Accelerating Gene Discovery by Phenotyping Whole-Genome Sequenced Multi-mutation Strains and Using the Sequence Kernel Association Test (SKAT).

    PubMed

    Timbers, Tiffany A; Garland, Stephanie J; Mohan, Swetha; Flibotte, Stephane; Edgley, Mark; Muncaster, Quintin; Au, Vinci; Li-Leger, Erica; Rosell, Federico I; Cai, Jerry; Rademakers, Suzanne; Jansen, Gert; Moerman, Donald G; Leroux, Michel R

    2016-08-01

    Forward genetic screens represent powerful, unbiased approaches to uncover novel components in any biological process. Such screens suffer from a major bottleneck, however, namely the cloning of corresponding genes causing the phenotypic variation. Reverse genetic screens have been employed as a way to circumvent this issue, but can often be limited in scope. Here we demonstrate an innovative approach to gene discovery. Using C. elegans as a model system, we used a whole-genome sequenced multi-mutation library, from the Million Mutation Project, together with the Sequence Kernel Association Test (SKAT), to rapidly screen for and identify genes associated with a phenotype of interest, namely defects in dye-filling of ciliated sensory neurons. Such anomalies in dye-filling are often associated with the disruption of cilia, organelles which in humans are implicated in sensory physiology (including vision, smell and hearing), development and disease. Beyond identifying several well characterised dye-filling genes, our approach uncovered three genes not previously linked to ciliated sensory neuron development or function. From these putative novel dye-filling genes, we confirmed the involvement of BGNT-1.1 in ciliated sensory neuron function and morphogenesis. BGNT-1.1 functions at the trans-Golgi network of sheath cells (glia) to influence dye-filling and cilium length, in a cell non-autonomous manner. Notably, BGNT-1.1 is the orthologue of human B3GNT1/B4GAT1, a glycosyltransferase associated with Walker-Warburg syndrome (WWS). WWS is a multigenic disorder characterised by muscular dystrophy as well as brain and eye anomalies. Together, our work unveils an effective and innovative approach to gene discovery, and provides the first evidence that B3GNT1-associated Walker-Warburg syndrome may be considered a ciliopathy. PMID:27508411

  18. Accelerating Gene Discovery by Phenotyping Whole-Genome Sequenced Multi-mutation Strains and Using the Sequence Kernel Association Test (SKAT)

    PubMed Central

    Garland, Stephanie J.; Mohan, Swetha; Flibotte, Stephane; Muncaster, Quintin; Cai, Jerry; Rademakers, Suzanne; Moerman, Donald G.; Leroux, Michel R.

    2016-01-01

    Forward genetic screens represent powerful, unbiased approaches to uncover novel components in any biological process. Such screens suffer from a major bottleneck, however, namely the cloning of corresponding genes causing the phenotypic variation. Reverse genetic screens have been employed as a way to circumvent this issue, but can often be limited in scope. Here we demonstrate an innovative approach to gene discovery. Using C. elegans as a model system, we used a whole-genome sequenced multi-mutation library, from the Million Mutation Project, together with the Sequence Kernel Association Test (SKAT), to rapidly screen for and identify genes associated with a phenotype of interest, namely defects in dye-filling of ciliated sensory neurons. Such anomalies in dye-filling are often associated with the disruption of cilia, organelles which in humans are implicated in sensory physiology (including vision, smell and hearing), development and disease. Beyond identifying several well characterised dye-filling genes, our approach uncovered three genes not previously linked to ciliated sensory neuron development or function. From these putative novel dye-filling genes, we confirmed the involvement of BGNT-1.1 in ciliated sensory neuron function and morphogenesis. BGNT-1.1 functions at the trans-Golgi network of sheath cells (glia) to influence dye-filling and cilium length, in a cell non-autonomous manner. Notably, BGNT-1.1 is the orthologue of human B3GNT1/B4GAT1, a glycosyltransferase associated with Walker-Warburg syndrome (WWS). WWS is a multigenic disorder characterised by muscular dystrophy as well as brain and eye anomalies. Together, our work unveils an effective and innovative approach to gene discovery, and provides the first evidence that B3GNT1-associated Walker-Warburg syndrome may be considered a ciliopathy. PMID:27508411

  19. Next-generation gene discovery for variants of large impact on lipid traits

    PubMed Central

    Rosenthal, Elisabeth; Blue, Elizabeth; Jarvik, Gail P.

    2015-01-01

    Purpose of review Detection of high impact variants on lipid traits is complicated by complex genetic architecture. Although genome-wide association studies (GWAS) successfully identified many novel genes associated with lipid traits, they were less successful in identifying variants with a large impact on the phenotype. This is not unexpected, as the more common variants detectable by GWAS typically have small effects. The availability of large familial datasets and sequence data has changed the paradigm for successful genomic discovery of the novel genes and pathogenic variants underlying lipid disorders. Recent findings Novel loci with large effects have been successfully mapped in families, and next-generation sequencing allowed for the identification of the underlying lipid associated variants of large effect size. The success of this strategy relies on the simplification of the underlying genetic variation by focusing on large single families segregating extreme lipid phenotypes. Summary Rare, high impact variants are expected to have large effects and be more relevant for medical and pharmaceutical applications. Family data have many advantages over population-based data because they allow for the efficient detection of high-impact variants with an exponentially smaller sample size and increased power for follow-up studies. PMID:25636063

  20. ESTs from a wild Arachis species for gene discovery and marker development

    PubMed Central

    Proite, Karina; Leal-Bertioli, Soraya CM; Bertioli, David J; Moretzsohn, Márcio C; da Silva, Felipe R; Martins, Natalia F; Guimarães, Patrícia M

    2007-01-01

    Background Due to its origin, peanut has a very narrow genetic background. Wild relatives can be a source of genetic variability for cultivated peanut. In this study, the transcriptome of the wild species Arachis stenosperma accession V10309 was analyzed. Results ESTs were produced from four cDNA libraries of RNAs extracted from leaves and roots of A. stenosperma. Randomly selected cDNA clones were sequenced to generate 8,785 ESTs, of which 6,264 (71.3%) had high quality, with 3,500 clusters: 963 contigs and 2537 singlets. Only 55.9% matched homologous sequences of known genes. ESTs were classified into 23 different categories according to putative protein functions. Numerous sequences related to disease resistance, drought tolerance and human health were identified. Two hundred and six microsatellites were found and markers have been developed for 188 of these. The microsatellite profile was analyzed and compared to other transcribed and genomic sequence data. Conclusion This is, to date, the first report on the analysis of transcriptome of a wild relative of peanut. The ESTs produced in this study are a valuable resource for gene discovery, the characterization of new wild alleles, and for marker development. The ESTs were released in the [GenBank:EH041934 to EH048197]. PMID:17302987

  1. Topological and functional discovery in a gene coexpression meta-network of gastric cancer.

    PubMed

    Aggarwal, Amit; Guo, Dong Li; Hoshida, Yujin; Yuen, Siu Tsan; Chu, Kent-Man; So, Samuel; Boussioutas, Alex; Chen, Xin; Bowtell, David; Aburatani, Hiroyuki; Leung, Suet Yi; Tan, Patrick

    2006-01-01

    Gastric cancer is a leading cause of global cancer mortality, but comparatively little is known about the cellular pathways regulating different aspects of the gastric cancer phenotype. To achieve a better understanding of gastric cancer at the levels of systems topology, functional modules, and constituent genes, we assembled and systematically analyzed a consensus gene coexpression meta-network of gastric cancer incorporating >300 tissue samples from four independent patient populations (the "gastrome"). We find that the gastrome exhibits a hierarchical scale-free architecture, with an internal structure comprising multiple deeply embedded modules associated with diverse cellular functions. Individual modules display distinct subtopologies, with some (cellular proliferation) being integrated within the primary network, and others (ribosomal biosynthesis) being relatively isolated. One module associated with intestinal differentiation exhibited a remarkably high degree of autonomy, raising the possibility that its specific topological features may contribute towards the frequent occurrence of intestinal metaplasia in gastric cancer. At the single-gene level, we discovered a novel conserved interaction between the PLA2G2A prognostic marker and the EphB2 receptor, and used tissue microarrays to validate the PLA2G2A/EphB2 association. Finally, because EphB2 is a known target of the Wnt signaling pathway, we tested and provide evidence that the Wnt pathway may also similarly regulate PLA2G2A. Many of these findings were not discernible by studying the single patient populations in isolation. Thus, besides enhancing our knowledge of gastric cancer, our results show the broad utility of applying meta-analytic approaches to genome-wide data for the purposes of biological discovery. PMID:16397236

  2. Adeno-Associated Virus at 50: A Golden Anniversary of Discovery, Research, and Gene Therapy Success—A Personal Perspective

    PubMed Central

    Hastie, Eric

    2015-01-01

    Fifty years after the discovery of adeno-associated virus (AAV) and more than 30 years after the first gene transfer experiment was conducted, dozens of gene therapy clinical trials are in progress, one vector is approved for use in Europe, and breakthroughs in virus modification and disease modeling are paving the way for a revolution in the treatment of rare diseases, cancer, as well as HIV. This review will provide a historical perspective on the progression of AAV for gene therapy from discovery to the clinic, focusing on contributions from the Samulski lab regarding basic science and cloning of AAV, optimized large-scale production of vectors, preclinical large animal studies and safety data, vector modifications for improved efficacy, and successful clinical applications. PMID:25807962

  3. A genome-wide cis-regulatory element discovery method based on promoter sequences and gene co-expression networks

    PubMed Central

    2013-01-01

    Background Deciphering cis-regulatory networks has become an attractive yet challenging task. This paper presents a simple method for cis-regulatory network discovery which aims to avoid some of the common problems of previous approaches. Results Using promoter sequences and gene expression profiles as input, rather than clustering the genes by the expression data, our method utilizes co-expression neighborhood information for each individual gene, thereby overcoming the disadvantages of current clustering-based models, which may miss specific information for individual genes. In addition, rather than using a motif database as an input, it implements a simple motif count table for each enumerated k-mer for each gene promoter sequence. Thus, it can be used for species in which cis-regulatory motifs are not yet known and has the potential to discover new transcription factor binding sites. Applications on Saccharomyces cerevisiae and Arabidopsis have shown that our method has good prediction accuracy and outperforms a phylogenetic footprinting approach. Furthermore, the top ranked gene-motif regulatory clusters are evidently functionally co-regulated, and the regulatory relationships between the motifs and the enriched biological functions can often be confirmed by literature. Conclusions Since this method is simple and gene-specific, it can be readily utilized for insufficiently studied species or flexibly used as an additional step or data source for previous transcription regulatory networks discovery models. PMID:23368633
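
    The motif count table mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy promoter sequences, and the choice of k = 3 are all assumptions made for illustration, and the co-expression neighborhood step is not shown. The sketch only counts every enumerated k-mer in each gene's promoter sequence.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in one promoter sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def motif_count_table(promoters, k):
    """Build a gene -> {k-mer: count} table over all 4**k enumerated k-mers."""
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    table = {}
    for gene, seq in promoters.items():
        counts = kmer_counts(seq, k)
        # k-mers absent from a promoter get an explicit zero count
        table[gene] = {kmer: counts.get(kmer, 0) for kmer in all_kmers}
    return table


# Hypothetical toy promoters (not taken from the paper)
promoters = {"geneA": "ACGTACGT", "geneB": "TTTTACGA"}
table = motif_count_table(promoters, k=3)
print(table["geneA"]["ACG"], table["geneB"]["TTT"])
```

    Each row of such a table could then be compared across a gene's co-expressed neighbors; how that comparison is scored is left open here because the abstract does not specify it.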

  4. Gene invasion in distant eukaryotic lineages: discovery of mutually exclusive genetic elements reveals marine biodiversity.

    PubMed

    Monier, Adam; Sudek, Sebastian; Fast, Naomi M; Worden, Alexandra Z

    2013-09-01

    Inteins are rare, translated genetic parasites mainly found in bacteria and archaea, while spliceosomal introns are distinctly eukaryotic features abundant in most nuclear genomes. Using targeted metagenomics, we discovered an intein in an Atlantic population of the photosynthetic eukaryote, Bathycoccus, harbored by the essential spliceosomal protein PRP8 (processing factor 8 protein). Although previously thought exclusive to fungi, we also identified PRP8 inteins in parasitic (Capsaspora) and predatory (Salpingoeca) protists. Most new PRP8 inteins were at novel insertion sites that, surprisingly, were not in the most conserved regions of the gene. Evolutionarily, Dikarya fungal inteins at PRP8 insertion site a appeared more related to the Bathycoccus intein at a unique insertion site, than to other fungal and opisthokont inteins. Strikingly, independent analyses of Pacific and Atlantic samples revealed an intron at the same codon as the Bathycoccus PRP8 intein. The two elements are mutually exclusive and neither was found in cultured Bathycoccus or other picoprasinophyte genomes. Thus, wild Bathycoccus contain one of few non-fungal eukaryotic inteins known and a rare polymorphic intron. Our data indicate at least two Bathycoccus ecotypes exist, associated respectively with oceanic or mesotrophic environments. We hypothesize that intein propagation is facilitated by marine viruses; and, while intron gain is still poorly understood, presence of a spliceosomal intron where a locus lacks an intein raises the possibility of new, intein-primed mechanisms for intron gain. The discovery of nucleus-encoded inteins and associated sequence polymorphisms in uncultivated marine eukaryotes highlights their diversity and reveals potential sexual boundaries between populations indistinguishable by common marker genes. PMID:23635865

  5. Gene invasion in distant eukaryotic lineages: discovery of mutually exclusive genetic elements reveals marine biodiversity

    PubMed Central

    Monier, Adam; Sudek, Sebastian; Fast, Naomi M; Worden, Alexandra Z

    2013-01-01

    Inteins are rare, translated genetic parasites mainly found in bacteria and archaea, while spliceosomal introns are distinctly eukaryotic features abundant in most nuclear genomes. Using targeted metagenomics, we discovered an intein in an Atlantic population of the photosynthetic eukaryote, Bathycoccus, harbored by the essential spliceosomal protein PRP8 (processing factor 8 protein). Although previously thought exclusive to fungi, we also identified PRP8 inteins in parasitic (Capsaspora) and predatory (Salpingoeca) protists. Most new PRP8 inteins were at novel insertion sites that, surprisingly, were not in the most conserved regions of the gene. Evolutionarily, Dikarya fungal inteins at PRP8 insertion site a appeared more related to the Bathycoccus intein at a unique insertion site, than to other fungal and opisthokont inteins. Strikingly, independent analyses of Pacific and Atlantic samples revealed an intron at the same codon as the Bathycoccus PRP8 intein. The two elements are mutually exclusive and neither was found in cultured Bathycoccus or other picoprasinophyte genomes. Thus, wild Bathycoccus contain one of few non-fungal eukaryotic inteins known and a rare polymorphic intron. Our data indicate at least two Bathycoccus ecotypes exist, associated respectively with oceanic or mesotrophic environments. We hypothesize that intein propagation is facilitated by marine viruses; and, while intron gain is still poorly understood, presence of a spliceosomal intron where a locus lacks an intein raises the possibility of new, intein-primed mechanisms for intron gain. The discovery of nucleus-encoded inteins and associated sequence polymorphisms in uncultivated marine eukaryotes highlights their diversity and reveals potential sexual boundaries between populations indistinguishable by common marker genes. PMID:23635865

  6. Display technologies: application for the discovery of drug and gene delivery agents

    PubMed Central

    Sergeeva, Anna; Kolonin, Mikhail G.; Molldrem, Jeffrey J.; Pasqualini, Renata; Arap, Wadih

    2007-01-01

    Recognition of molecular diversity of cell surface proteomes in disease is essential for the development of targeted therapies. Progress in targeted therapeutics requires establishing effective approaches for high-throughput identification of agents specific for clinically relevant cell surface markers. Over the past decade, a number of platform strategies have been developed to screen polypeptide libraries for ligands targeting receptors selectively expressed in the context of various cell surface proteomes. Streamlined procedures for identification of ligand-receptor pairs that could serve as targets in disease diagnosis, profiling, imaging and therapy have relied on the display technologies, in which polypeptides with desired binding profiles can be serially selected, in a process called biopanning, based on their physical linkage with the encoding nucleic acid. These technologies include virus/phage display, cell display, ribosomal display, mRNA display and covalent DNA display (CDT), with phage display being by far the most utilized. The scope of this review is the recent advancements in the display technologies with a particular emphasis on molecular mapping of cell surface proteomes with peptide phage display. Prospective applications of targeted compounds derived from display libraries in the discovery of targeted drugs and gene therapy vectors are discussed. PMID:17123658

  7. An Evaluation of Active Learning Causal Discovery Methods for Reverse-Engineering Local Causal Pathways of Gene Regulation

    PubMed Central

    Ma, Sisi; Kemmeren, Patrick; Aliferis, Constantin F.; Statnikov, Alexander

    2016-01-01

    Reverse-engineering of causal pathways that implicate diseases and vital cellular functions is a fundamental problem in biomedicine. Discovery of the local causal pathway of a target variable (that consists of its direct causes and direct effects) is essential for effective intervention and can facilitate accurate diagnosis and prognosis. Recent research has provided several active learning methods that can leverage passively observed high-throughput data to draft causal pathways and then refine the inferred relations with a limited number of experiments. The current study provides a comprehensive evaluation of the performance of active learning methods for local causal pathway discovery in real biological data. Specifically, 54 active learning methods/variants from 3 families of algorithms were applied for local causal pathways reconstruction of gene regulation for 5 transcription factors in S. cerevisiae. Four aspects of the methods’ performance were assessed, including adjacency discovery quality, edge orientation accuracy, complete pathway discovery quality, and experimental cost. The results of this study show that some methods provide significant performance benefits over others and therefore should be routinely used for local causal pathway discovery tasks. This study also demonstrates the feasibility of local causal pathway reconstruction in real biological systems with significant quality and low experimental cost. PMID:26939894

  8. Gene-based single nucleotide polymorphism discovery in bovine muscle using next-generation transcriptomic sequencing

    PubMed Central

    2013-01-01

    Background Genetic information based on molecular markers has increasingly been used in cattle breeding improvement programmes, as a means to improve conventional phenotypic selection. Advances in molecular genetics have led to the identification of several genetic markers associated with genes affecting economic traits. Until recently, the identification of the causative genetic variants involved in the phenotypes of interest has remained a difficult task. The advent of novel sequencing technologies now offers a new opportunity for the identification of such variants. Despite sequencing costs plummeting, sequencing whole-genomes or large targeted regions is still too expensive for most laboratories. A transcriptomic-based sequencing approach offers a cheaper alternative to identify a large number of polymorphisms and possibly to discover causative variants. In the present study, we performed a gene-based single nucleotide polymorphism (SNP) discovery analysis in bovine Longissimus thoraci, using RNA-Seq. To our knowledge, this represents the first study done in bovine muscle. Results Messenger RNAs from Longissimus thoraci from three Limousin bull calves were subjected to high-throughput sequencing. Approximately 36–46 million paired-end reads were obtained per library. A total of 19,752 transcripts were identified and 34,376 different SNPs were detected. Fifty-five percent of the SNPs were found in coding regions and ~22% resulted in an amino acid change. Applying a very stringent SNP quality threshold, we detected 8,407 different high-confidence SNPs, 18% of which are non-synonymous coding SNPs. To analyse the accuracy of RNA-Seq technology for SNP detection, 48 SNPs were selected for validation by genotyping. No discrepancies were observed when using the highest SNP probability threshold. To test the usefulness of the identified SNPs, the 48 selected SNPs were assessed by genotyping 93 bovine samples, representing mostly the nine major breeds used in France

  9. 2008 Homer W. Smith Award: insights into the pathogenesis of polycystic kidney disease from gene discovery.

    PubMed

    Harris, Peter C

    2009-06-01

    Polycystic kidney diseases (PKD) are a group of inherited disorders characterized by morbidity-associated development of renal cysts. Three forms of PKD are described here: The common, late onset, autosomal dominant PKD (ADPKD); the mainly infantile, autosomal recessive PKD (ARPKD); and the lethal, syndromic, Meckel syndrome that also includes central nervous system and digital defects. Positional cloning approaches based on genetic linkage have identified the disease genes in these disorders. Completion of the Human Genome Project, cases with atypical mutation, and animal models have greatly aided gene identification, and characterization of the disease genes has allowed establishment of molecular diagnostics. Genetic and allelic heterogeneity, plus genetic modification, underlie the significant phenotypic variability in each disorder. Positional cloning identified novel disease-associated protein families: The polycystins (ADPKD); fibrocystins (ARPKD); and meckelin. A common feature of pathogenesis in each disorder seems to be the primary cilia, implicating detection of fluid flow and the developmental process of planar cell polarity. Identifying the primary defect has contributed to our understanding of defective cellular processes and highlights potential therapeutic targets. A number of agents are now in Phase 3 trials, and many others show promise preclinically, providing hope of effective treatments for ADPKD in the foreseeable future. PMID:19423684

  10. De novo Assembly and Characterization of the Transcriptome of Broomcorn Millet (Panicum miliaceum L.) for Gene Discovery and Marker Development.

    PubMed

    Yue, Hong; Wang, Le; Liu, Hui; Yue, Wenjie; Du, Xianghong; Song, Weining; Nie, Xiaojun

    2016-01-01

    Broomcorn millet (Panicum miliaceum L.) is one of the world's oldest cultivated cereals, which is well-adapted to extreme environments such as drought, heat, and salinity with an efficient C4 carbon fixation. Discovery and identification of genes involved in these processes will provide valuable information to improve the crop for meeting the challenge of global climate change. However, the lack of genetic resources and genomic information makes gene discovery and molecular mechanism studies very difficult. Here, we sequenced and assembled the transcriptome of broomcorn millet using Illumina sequencing technology. After sequencing, a total of 45,406,730 and 51,160,820 clean paired-end reads were obtained for two genotypes Yumi No. 2 and Yumi No. 3. These reads were mixed and then assembled into 113,643 unigenes, ranging in length from 351 to 15,691 bp, of which 62,543 contigs could be assigned to 315 gene ontology (GO) categories. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous Groups (COG) analyses mapped 15,514 unigenes to 202 KEGG pathways and 51,020 unigenes to 25 COG categories, respectively. Furthermore, 35,216 simple sequence repeats (SSRs) were identified in 27,055 unigene sequences, of which trinucleotides were the most abundant repeat unit, accounting for 66.72% of SSRs. In addition, 292 differentially expressed genes were identified between the two genotypes, which were significantly enriched in 88 GO terms and 12 KEGG pathways. Finally, the expression patterns of four selected transcripts were validated through quantitative reverse transcription polymerase chain reaction analysis. This study is the first to sequence and assemble the transcriptome of broomcorn millet; it not only provides a rich sequence resource for gene discovery and marker development in this important crop, but will also facilitate the further investigation of the molecular mechanism of its favored agronomic traits and beyond. PMID
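
    The SSR survey mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which tool or thresholds were used, so the minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are assumptions made purely for illustration; only perfect repeats are detected.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed, not from the abstract).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AAA").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits
```

    Counting the hits by motif length over all unigenes would give the kind of per-class share reported above (for example, the trinucleotide fraction), subject to whatever thresholds are actually chosen.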

  11. De novo Assembly and Characterization of the Transcriptome of Broomcorn Millet (Panicum miliaceum L.) for Gene Discovery and Marker Development

    PubMed Central

    Yue, Hong; Wang, Le; Liu, Hui; Yue, Wenjie; Du, Xianghong; Song, Weining; Nie, Xiaojun

    2016-01-01

    Broomcorn millet (Panicum miliaceum L.) is one of the world’s oldest cultivated cereals, which is well-adapted to extreme environments such as drought, heat, and salinity with an efficient C4 carbon fixation. Discovery and identification of genes involved in these processes will provide valuable information to improve the crop for meeting the challenge of global climate change. However, the lack of genetic resources and genomic information makes gene discovery and molecular mechanism studies very difficult. Here, we sequenced and assembled the transcriptome of broomcorn millet using Illumina sequencing technology. After sequencing, a total of 45,406,730 and 51,160,820 clean paired-end reads were obtained for two genotypes Yumi No. 2 and Yumi No. 3. These reads were mixed and then assembled into 113,643 unigenes, ranging in length from 351 to 15,691 bp, of which 62,543 contigs could be assigned to 315 gene ontology (GO) categories. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous Groups (COG) analyses mapped 15,514 unigenes to 202 KEGG pathways and 51,020 unigenes to 25 COG categories, respectively. Furthermore, 35,216 simple sequence repeats (SSRs) were identified in 27,055 unigene sequences, of which trinucleotides were the most abundant repeat unit, accounting for 66.72% of SSRs. In addition, 292 differentially expressed genes were identified between the two genotypes, which were significantly enriched in 88 GO terms and 12 KEGG pathways. Finally, the expression patterns of four selected transcripts were validated through quantitative reverse transcription polymerase chain reaction analysis. This study is the first to sequence and assemble the transcriptome of broomcorn millet; it not only provides a rich sequence resource for gene discovery and marker development in this important crop, but will also facilitate the further investigation of the molecular mechanism of its favored agronomic traits and beyond. PMID

  12. European approach to the Human Gene Project.

    PubMed

    Ferguson-Smith, M A

    1991-01-01

    In the history of gene mapping, which extends through most of the present century, Europe has played an important role. This has continued during the evolution of the 10 International Human Gene Mapping Workshops that have been held in seven different countries since 1973. Nationally coordinated programs have been a recent development, and several European countries, including the United Kingdom and Italy, have followed the lead of the United States in investing substantial sums of money in research on the human genome. In addition, the European Community has launched a multinational program of research on Human Genome Analysis to complement the various national initiatives. The particular approach in Europe has been to support those in the field by establishing resource centers for distributing biomaterials and accessing databases, by assisting in the training of scientists, and by funding programs of research directed at present needs in both physical and genetic mapping. PMID:1991586

  13. De Novo Transcriptomic Analysis of Peripheral Blood Lymphocytes from the Chinese Goose: Gene Discovery and Immune System Pathway Description

    PubMed Central

    Tariq, Mansoor; Chen, Rong; Yuan, Hongyu; Liu, Yanjie; Wu, Yanan; Wang, Junya; Xia, Chun

    2015-01-01

    Background The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology has been applied in genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity-relevant genes. Principal Findings The transcriptome of goose peripheral blood lymphocytes was sequenced with Illumina-Solexa technology and assembled de novo. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were assigned to three Gene Ontology (GO) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOGs) categories. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune-relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. Conclusion This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with
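
    The assembly statistics quoted in this abstract (average length, N50 and maximum length) follow standard definitions, sketched below. The function names `n50` and `summarize` are hypothetical, and the toy lengths in the usage note are not data from the study.

```python
def n50(lengths):
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

def summarize(lengths):
    """Return the three summary statistics typically reported for an assembly."""
    return {
        "mean": sum(lengths) / len(lengths),
        "n50": n50(lengths),
        "max": max(lengths),
    }
```

    For example, `summarize([2, 3, 4, 5, 6, 7, 8, 9, 10])` gives a mean of 6.0, an N50 of 8 and a maximum of 10, because the three longest sequences (10, 9 and 8) already hold half of the 54 bases.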

  14. Motif discovery in promoters of genes co-localized and co-expressed during myeloid cells differentiation

    PubMed Central

    Coppe, Alessandro; Ferrari, Francesco; Bisognin, Andrea; Danieli, Gian Antonio; Ferrari, Sergio; Bicciato, Silvio; Bortoluzzi, Stefania

    2009-01-01

    Genes co-expressed may be under similar promoter-based and/or position-based regulation. Although data on expression, position and function of human genes are available, their true integration still represents a challenge for computational biology, hampering the identification of regulatory mechanisms. We carried out an integrative analysis of genomic position, functional annotation and promoters of genes expressed in myeloid cells. Promoter analysis was conducted by a novel multi-step method for discovering putative regulatory elements, i.e. over-represented motifs, in a selected set of promoters, as compared with a background model. The combination of transcriptional, structural and functional data allowed the identification of sets of promoters pertaining to groups of genes co-expressed and co-localized in regions of the human genome. The application of motif discovery to 26 groups of genes co-expressed in myeloid cells differentiation and co-localized in the genome showed that there are more over-represented motifs in promoters of co-expressed and co-localized genes than in promoters of simply co-expressed genes (CEG). Motifs, which are similar to the binding sequences of known transcription factors, non-uniformly distributed along promoter sequences and/or occurring in highly co-expressed subset of genes were identified. Co-expressed and co-localized gene sets were grouped in two co-expressed genomic meta-regions, putatively representing functional domains of a high-level expression regulation. PMID:19059999
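
    The idea of motifs over-represented in a selected promoter set relative to a background can be sketched as follows. This is not the authors' multi-step method, which the abstract does not detail; it is a simple stand-in that counts exact k-mers and applies a one-sided hypergeometric test. The function names, the default `k` and the `alpha` cutoff are assumptions, and no multiple-testing correction is applied.

```python
from math import comb

def promoters_with_kmer(promoters, kmer):
    """Number of promoter sequences containing kmer at least once."""
    return sum(1 for p in promoters if kmer in p)

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n promoters from N, of which K contain the k-mer."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

def overrepresented_kmers(selected, background, k=6, alpha=0.01):
    """Return (kmer, hits_in_selected, hits_in_background, p) for enriched k-mers.

    Assumes `selected` is a subset of `background` (both are lists of strings).
    """
    kmers = {p[i:i + k] for p in selected for i in range(len(p) - k + 1)}
    N, n = len(background), len(selected)
    results = []
    for kmer in sorted(kmers):
        K = promoters_with_kmer(background, kmer)
        x = promoters_with_kmer(selected, kmer)
        p = hypergeom_tail(x, n, K, N)
        if p <= alpha:
            results.append((kmer, x, K, p))
    return results
```

    Running the same test on a co-expressed set and on a co-expressed and co-localized set, against a shared background, would allow the two counts of enriched motifs to be compared in the spirit of the analysis described above.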

  15. Genome-Scale Discovery of Cell Wall Biosynthesis Genes in Populus (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Muchero, Wellington [Oak Ridge National Laboratory]

    2013-01-22

    Wellington Muchero from Oak Ridge National Laboratory gives a talk titled "Discovery of Cell Wall Biosynthesis Genes in Populus" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  16. Genome-Scale Discovery of Cell Wall Biosynthesis Genes in Populus (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    SciTech Connect

    Muchero, Wellington

    2012-03-22

    Wellington Muchero from Oak Ridge National Laboratory gives a talk titled "Discovery of Cell Wall Biosynthesis Genes in Populus" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  17. Improving data discovery and usability through commentary and user feedback: the CHARMe project

    NASA Astrophysics Data System (ADS)

    Alegre, R.; Blower, J. D.

    2014-12-01

    Earth science datasets are highly diverse. Users of these datasets are similarly varied, ranging from research scientists through industrial users to government decision- and policy-makers. It is very important for these users to understand the applicability of any dataset to their particular problem so that they can select the most appropriate data sources for their needs. Although data providers often provide rich supporting information in the form of metadata, typically this information does not include community usage information that can help other users judge fitness-for-purpose. The CHARMe project (http://www.charme.org.uk) is filling this gap by developing a system for sharing "commentary metadata". These are annotations that are generated and shared by the user community and include: Links between publications and datasets. The CHARMe system can record information about why a particular dataset was used (e.g. the paper may describe the dataset, it may use the dataset as a source, or it may be publishing results of a dataset assessment). These publications may appear in the peer-reviewed literature, or may be technical reports, websites or blog posts. Free-text comments supplied by the user. Provenance information, including links between datasets and descriptions of processing algorithms and sensors. External events that may affect data quality (e.g. large volcanic eruptions or El Niño events); we call these "significant events". Data quality information, e.g. system maturity indices. Commentary information can be linked to anything that can be uniquely identified (e.g. a dataset with a DOI or a persistent web address). It is also possible to associate commentary with particular subsets of datasets, for example to highlight an issue that is confined to a particular geographic region. We will demonstrate tools that show these capabilities in action, showing how users can apply commentary information during data discovery, visualization and analysis. The

  18. How Formal Methods Impels Discovery: A Short History of an Air Traffic Management Project

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Hagen, George; Maddalon, Jeffrey M.; Munoz, Cesar A.; Narkawicz, Anthony; Dowek, Gilles

    2010-01-01

    In this paper we describe a process of algorithmic discovery that was driven by our goal of achieving complete, mechanically verified algorithms that compute conflict prevention bands for use in en route air traffic management. The algorithms were originally defined in the PVS specification language and subsequently have been implemented in Java and C++. We do not present the proofs in this paper: instead, we describe the process of discovery and the key ideas that enabled the final formal proof of correctness

  19. IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites

    PubMed Central

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T. B. K.; Cimermančič, Peter; Fischbach, Michael A.; Ivanova, Natalia N.; Markowitz, Victor M.

    2015-01-01

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. PMID:26173699

  20. Metabologenomics: Correlation of Microbial Gene Clusters with Metabolites Drives Discovery of a Nonribosomal Peptide with an Unusual Amino Acid Monomer

    PubMed Central

    2016-01-01

    For more than half a century the pharmaceutical industry has sifted through natural products produced by microbes, uncovering new scaffolds and fashioning them into a broad range of vital drugs. We sought a strategy to reinvigorate the discovery of natural products with distinctive structures using bacterial genome sequencing combined with metabolomics. By correlating genetic content from 178 actinomycete genomes with mass spectrometry-enabled analyses of their exported metabolomes, we paired new secondary metabolites with their biosynthetic gene clusters. We report the use of this new approach to isolate and characterize tambromycin, a new chlorinated natural product, composed of several nonstandard amino acid monomeric units, including a unique pyrrolidine-containing amino acid we name tambroline. Tambromycin shows antiproliferative activity against cancerous human B- and T-cell lines. The discovery of tambromycin via large-scale correlation of gene clusters with metabolites (a.k.a. metabologenomics) illuminates a path for structure-based discovery of natural products at a sharply increased rate. PMID:27163034

  1. IMG-ABC: An Atlas of Biosynthetic Gene Clusters to Fuel the Discovery of Novel Secondary Metabolites

    SciTech Connect

    Chen, I-Min; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Huang, Jinghua; Reddy, T. B.K.; Cimermancic, Peter; Fischbach, Michael; Ivanova, Natalia; Markowitz, Victor; Kyrpides, Nikos; Pati, Amrita

    2014-10-28

    In the discovery of secondary metabolites (SMs), large-scale analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of relevant computational resources. We present IMG-ABC (https://img.jgi.doe.gov/abc/) -- An Atlas of Biosynthetic gene Clusters within the Integrated Microbial Genomes (IMG) system. IMG-ABC is a rich repository of both validated and predicted biosynthetic clusters (BCs) in cultured isolates, single-cells and metagenomes linked with the SM chemicals they produce and enhanced with focused analysis tools within IMG. The underlying scalable framework enables traversal of phylogenetic dark matter and chemical structure space -- serving as a doorway to a new era in the discovery of novel molecules.

  2. The Berkeley Drosophila Genome Project gene disruption project: Single P-element insertions mutating 25% of vital Drosophila genes.

    PubMed Central

    Spradling, A C; Stern, D; Beaton, A; Rhem, E J; Laverty, T; Mozden, N; Misra, S; Rubin, G M

    1999-01-01

    A fundamental goal of genetics and functional genomics is to identify and mutate every gene in model organisms such as Drosophila melanogaster. The Berkeley Drosophila Genome Project (BDGP) gene disruption project generates single P-element insertion strains that each mutate unique genomic open reading frames. Such strains strongly facilitate further genetic and molecular studies of the disrupted loci, but it has remained unclear if P elements can be used to mutate all Drosophila genes. We now report that the primary collection has grown to contain 1045 strains that disrupt more than 25% of the estimated 3600 Drosophila genes that are essential for adult viability. Of these P insertions, 67% have been verified by genetic tests to cause the associated recessive mutant phenotypes, and the validity of most of the remaining lines is predicted on statistical grounds. Sequences flanking >920 insertions have been determined to exactly position them in the genome and to identify 376 potentially affected transcripts from collections of EST sequences. Strains in the BDGP collection are available from the Bloomington Stock Center and have already assisted the research community in characterizing >250 Drosophila genes. The likely identity of 131 additional genes in the collection is reported here. Our results show that Drosophila genes have a wide range of sensitivity to inactivation by P elements, and provide a rationale for greatly expanding the BDGP primary collection based entirely on insertion site sequencing. We predict that this approach can bring >85% of all Drosophila open reading frames under experimental control. PMID:10471706

  3. Ataxin1L is a regulator of HSC function highlighting the utility of cross-tissue comparisons for gene discovery.

    PubMed

    Kahle, Juliette J; Souroullas, George P; Yu, Peng; Zohren, Fabian; Lee, Yoontae; Shaw, Chad A; Zoghbi, Huda Y; Goodell, Margaret A

    2013-03-01

    Hematopoietic stem cells (HSCs) are rare quiescent cells that continuously replenish the cellular components of the peripheral blood. Observing that the ataxia-associated gene Ataxin-1-like (Atxn1L) was highly expressed in HSCs, we examined its role in HSC function through in vitro and in vivo assays. Mice lacking Atxn1L had greater numbers of HSCs that regenerated the blood more quickly than their wild-type counterparts. Molecular analyses indicated Atxn1L null HSCs had gene expression changes that regulate a program consistent with their higher level of proliferation, suggesting that Atxn1L is a novel regulator of HSC quiescence. To determine if additional brain-associated genes were candidates for hematologic regulation, we examined genes encoding proteins from autism- and ataxia-associated protein-protein interaction networks for their representation in hematopoietic cell populations. The interactomes were found to be highly enriched for proteins encoded by genes specifically expressed in HSCs relative to their differentiated progeny. Our data suggest a heretofore unappreciated similarity between regulatory modules in the brain and HSCs, offering a new strategy for novel gene discovery in both systems. PMID:23555280

  4. SSHscreen and SSHdb, generic software for microarray based gene discovery: application to the stress response in cowpea

    PubMed Central

    2010-01-01

    Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L.) Walp). We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i) to normalize the data effectively using spike-in control spot normalization, and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value < 0.05). Enrichment ratio 2 calculations showed that > 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. The self

  5. Discovery of Possible Gene Relationships through the Application of Self-Organizing Maps to DNA Microarray Databases

    PubMed Central

    Chavez-Alvarez, Rocio; Chavoya, Arturo; Mendez-Vazquez, Andres

    2014-01-01

    DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one such technique, an unsupervised artificial neural network called a Self-Organizing Map (SOM), which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to be active during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms. PMID:24699245

  6. The discovery of eukaryotic genome design and its forgotten corollary--the postulate of gene regulation by nuclear RNA.

    PubMed

    Pederson, Thoru

    2009-07-01

    We now know that more of the DNA in eukaryotic cells is copied into RNA than previously had been thought. Many of these transcripts serve regulatory instead of template functions in gene readout. Some of these newly recognized RNAs come from regions of the genome that had heretofore been deemed "junk DNA," yet no one could answer the obvious question: if "junk," then why still around? Before memory fades, we should note that there were some reasonably well articulated ideas 30-40 years ago that anticipated these recent discoveries. It seems fitting to recall the prescience of those who first raised such unorthodoxy. They powerfully catalyzed progress. PMID:19567373

  7. Discovery of CTCF-Sensitive Cis-Spliced Fusion RNAs between Adjacent Genes in Human Prostate Cells

    PubMed Central

    Qin, Fujun; Song, Zhenguo; Babiceanu, Mihaela; Song, Yansu; Facemire, Loryn; Singh, Ritambhara; Adli, Mazhar; Li, Hui

    2015-01-01

    Genes or their encoded products are not expected to mingle with each other except in some disease situations. In cancer, a frequent mechanism that can produce gene fusions is chromosomal rearrangement. However, recent discoveries of RNA trans-splicing and cis-splicing between adjacent genes (cis-SAGe) support other mechanisms in generating fusion RNAs. In our transcriptome analyses of 28 prostate normal and cancer samples, on average 30% of fusion RNAs are transcripts that contain exons belonging to same-strand neighboring genes. These fusion RNAs may be the products of cis-SAGe, which was previously thought to be rare. To validate this finding and to better understand the phenomenon, we used LNCaP, a prostate cell line, as a model, and identified 16 additional cis-SAGe events by silencing transcription factor CTCF and paired-end RNA sequencing. About half of the fusions are expressed at a significant level compared to their parental genes. Silencing one of the in-frame fusions resulted in reduced cell motility. Most out-of-frame fusions are likely to function as non-coding RNAs. The majority of the 16 fusions are also detected in other prostate cell lines, as well as in the 14 clinical prostate normal and cancer pairs. By studying the features associated with these fusions, we developed a set of rules: 1) the parental genes are same-strand-neighboring genes; 2) the distance between the genes is within 30 kb; 3) the 5′ genes are actively transcribing; and 4) the chimeras tend to have the second-to-last exon in the 5′ genes joined to the second exon in the 3′ genes. We then randomly selected 20 neighboring genes in the genome, and detected four fusion events using these rules in prostate cancer and non-cancerous cells. These results suggest that splicing between neighboring gene transcripts is a rather frequent phenomenon, and it is not a feature unique to cancer cells. PMID:25658338
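
    Rules 1 to 3 above can be read as a filter over an annotated gene list, sketched below under stated assumptions. The `Gene` record, the `candidate_pairs` function and the `MIN_EXPRESSION` cutoff are hypothetical; the abstract does not define "actively transcribing" numerically, and "neighboring" is taken here to mean adjacent by start coordinate on the same chromosome. Rule 4 concerns exon junctions and is not modelled.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    strand: str        # "+" or "-"
    expression: float  # hypothetical expression value, e.g. FPKM

MAX_GAP = 30_000       # rule 2: distance between the genes within 30 kb
MIN_EXPRESSION = 1.0   # assumed cutoff for "actively transcribing" (not from the abstract)

def candidate_pairs(genes):
    """Yield (five_prime_gene, three_prime_gene) neighbor pairs satisfying rules 1-3."""
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        for left, right in zip(chrom_genes, chrom_genes[1:]):
            if left.strand != right.strand:       # rule 1: same strand
                continue
            if right.start - left.end > MAX_GAP:  # rule 2: within 30 kb
                continue
            # On the minus strand the right-hand gene is transcribed first (5' gene).
            upstream, downstream = (left, right) if left.strand == "+" else (right, left)
            if upstream.expression >= MIN_EXPRESSION:  # rule 3: 5' gene is active
                yield upstream, downstream
```

    The pairs returned are only candidates; confirming a fusion would still require read-level evidence of the exon junction.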

  8. Discovery of agents that eradicate leukemia stem cells using an in silico screen of public gene expression data

    PubMed Central

    Hassane, Duane C.; Guzman, Monica L.; Corbett, Cheryl; Li, Xiaojie; Abboud, Ramzi; Young, Fay; Liesveld, Jane L.; Carroll, Martin

    2008-01-01

    Increasing evidence indicates that malignant stem cells are important for the pathogenesis of acute myelogenous leukemia (AML) and represent a reservoir of cells that drive the development of AML and relapse. Therefore, new treatment regimens are necessary to prevent relapse and improve therapeutic outcomes. Previous studies have shown that the sesquiterpene lactone, parthenolide (PTL), ablates bulk, progenitor, and stem AML cells while causing no appreciable toxicity to normal hematopoietic cells. Thus, PTL must evoke cellular responses capable of mediating AML selective cell death. Given recent advances in chemical genomics such as gene expression-based high-throughput screening (GE-HTS) and the Connectivity Map, we hypothesized that the gene expression signature resulting from treatment of primary AML with PTL could be used to search for similar signatures in publicly available gene expression profiles deposited into the Gene Expression Omnibus (GEO). We therefore devised a broad in silico screen of the GEO database using the PTL gene expression signature as a template and discovered 2 new agents, celastrol and 4-hydroxy-2-nonenal, that effectively eradicate AML at the bulk, progenitor, and stem cell level. These findings suggest the use of multicenter collections of high-throughput data to facilitate discovery of leukemia drugs and drug targets. PMID:18305216

  9. The BDGP gene disruption project: Single transposon insertions associated with 40 percent of Drosophila genes

    SciTech Connect

    Bellen, Hugo J.; Levis, Robert W.; Liao, Guochun; He, Yuchun; Carlson, Joseph W.; Tsang, Garson; Evans-Holm, Martha; Hiesinger, P. Robin; Schulze, Karen L.; Rubin, Gerald M.; Hoskins, Roger A.; Spradling, Allan C.

    2004-01-13

    The Berkeley Drosophila Genome Project (BDGP) strives to disrupt each Drosophila gene by the insertion of a single transposable element. As part of this effort, transposons in more than 30,000 fly strains were localized and analyzed relative to predicted Drosophila gene structures. Approximately 6,300 lines that maximize genomic coverage were selected to be sent to the Bloomington Stock Center for public distribution, bringing the size of the BDGP gene disruption collection to 7,140 lines. It now includes individual lines predicted to disrupt 5,362 of the 13,666 currently annotated Drosophila genes (39 percent). Other lines contain an insertion at least 2 kb from others in the collection and likely mutate additional incompletely annotated or uncharacterized genes and chromosomal regulatory elements. The remaining strains contain insertions likely to disrupt alternative gene promoters or to allow gene mis-expression. The expanded BDGP gene disruption collection provides a public resource that will facilitate the application of Drosophila genetics to diverse biological problems. Finally, the project reveals new insight into how transposons interact with a eukaryotic genome and helps define optimal strategies for using insertional mutagenesis as a genomic tool.

  10. The BDGP gene disruption project: single transposon insertions associated with 40% of Drosophila genes.

    PubMed Central

    Bellen, Hugo J; Levis, Robert W; Liao, Guochun; He, Yuchun; Carlson, Joseph W; Tsang, Garson; Evans-Holm, Martha; Hiesinger, P Robin; Schulze, Karen L; Rubin, Gerald M; Hoskins, Roger A; Spradling, Allan C

    2004-01-01

    The Berkeley Drosophila Genome Project (BDGP) strives to disrupt each Drosophila gene by the insertion of a single transposable element. As part of this effort, transposons in >30,000 fly strains were localized and analyzed relative to predicted Drosophila gene structures. Approximately 6300 lines that maximize genomic coverage were selected to be sent to the Bloomington Stock Center for public distribution, bringing the size of the BDGP gene disruption collection to 7140 lines. It now includes individual lines predicted to disrupt 5362 of the 13,666 currently annotated Drosophila genes (39%). Other lines contain an insertion at least 2 kb from others in the collection and likely mutate additional incompletely annotated or uncharacterized genes and chromosomal regulatory elements. The remaining strains contain insertions likely to disrupt alternative gene promoters or to allow gene misexpression. The expanded BDGP gene disruption collection provides a public resource that will facilitate the application of Drosophila genetics to diverse biological problems. Finally, the project reveals new insight into how transposons interact with a eukaryotic genome and helps define optimal strategies for using insertional mutagenesis as a genomic tool. PMID:15238527

  11. A Hybrid Computational Method for the Discovery of Novel Reproduction-Related Genes

    PubMed Central

    Chen, Lei; Chu, Chen; Kong, Xiangyin; Huang, Guohua; Huang, Tao; Cai, Yu-Dong

    2015-01-01

    Uncovering the molecular mechanisms underlying reproduction is of great importance to infertility treatment and to the generation of healthy offspring. In this study, we discovered novel reproduction-related genes with a hybrid computational method that integrates three different types of methods, offering new clues for further reproduction research. This method was first executed on a weighted graph, constructed based on known protein-protein interactions, to search the shortest paths connecting any two known reproduction-related genes. Genes occurring in these paths were deemed to have a special relationship with reproduction. These newly discovered genes were filtered with a randomization test. Then, the remaining genes were further selected according to their associations with known reproduction-related genes measured by protein-protein interaction score and alignment score obtained by BLAST. The in-depth analysis of the high confidence novel reproduction genes revealed hidden mechanisms of reproduction and provided guidelines for further experimental validations. PMID:25768094
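
    The shortest-path step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the gene names, and the edge weights are invented, and the weights are assumed to be costs (lower means a stronger interaction). The functions `shortest_path` and `candidate_genes` are hypothetical helpers that count how often a non-seed gene lies on a shortest path between two known genes.

```python
import heapq
from itertools import combinations


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list of one shortest path, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target to the source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidate_genes(graph, known_genes):
    """Count how often each non-seed gene is an inner node of a seed-to-seed shortest path."""
    counts = {}
    for a, b in combinations(sorted(known_genes), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        for gene in path[1:-1]:
            if gene not in known_genes:
                counts[gene] = counts.get(gene, 0) + 1
    return counts


# Toy undirected graph (assumption): A, B, C are "known" genes, X and Y are candidates.
toy_graph = {
    "A": {"X": 1.0, "Y": 5.0},
    "B": {"X": 1.0, "Y": 5.0},
    "C": {"Y": 1.0, "X": 4.0},
    "X": {"A": 1.0, "B": 1.0, "C": 4.0},
    "Y": {"A": 5.0, "B": 5.0, "C": 1.0},
}
counts = candidate_genes(toy_graph, {"A", "B", "C"})
```

    In this toy graph only X lies on the shortest paths between the known genes, so it would be the single candidate passed on to the later filtering steps (randomization test, interaction score, BLAST alignment score), which are not sketched here.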

  12. Co-clustering phenome–genome for phenotype classification and disease gene discovery

    PubMed Central

    Hwang, TaeHyun; Atluri, Gowtham; Xie, MaoQiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui

    2012-01-01

    Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways. PMID:22735708

  13. Targetfinder.org: a resource for systematic discovery of transcription factor target genes

    PubMed Central

    Kiełbasa, Szymon M.; Blüthgen, Nils; Fähling, Michael

    2010-01-01

    Targetfinder.org (http://targetfinder.org/) provides a web-based resource for finding genes that show a similar expression pattern to a group of user-selected genes. It is based on a large-scale gene expression compendium (>1200 experiments, >13 000 genes). The primary application of Targetfinder.org is to expand a list of known transcription factor targets by new candidate target genes. The user submits a group of genes (the ‘seed’), and as a result the web site provides a list of other genes ranked by similarity of their expression to the expression of the seed genes. Additionally, the web site provides information on a recovery/cross-validation test to check for consistency of the provided seed and the quality of the ranking. Furthermore, the web site allows users to analyse the affinities of a selected transcription factor to the promoter regions of the top-ranked genes in order to select the best new candidate target genes for further experimental analysis. PMID:20460454
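
    The idea of ranking genes by similarity to a seed can be sketched as follows. The abstract does not state which similarity measure the site uses, so Pearson correlation against the mean seed profile is an assumption made here for illustration; the function names and the toy expression values are likewise invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_by_seed_similarity(expression, seed):
    """Rank all non-seed genes by correlation with the mean expression of the seed genes."""
    n_experiments = len(next(iter(expression.values())))
    seed_profile = [
        sum(expression[g][i] for g in seed) / len(seed) for i in range(n_experiments)
    ]
    scores = {
        gene: pearson(profile, seed_profile)
        for gene, profile in expression.items()
        if gene not in seed
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy compendium (assumption): gene -> expression across four experiments.
toy_expression = {
    "seed1": [1.0, 2.0, 3.0, 4.0],
    "seed2": [2.0, 3.0, 4.0, 5.0],
    "geneA": [10.0, 20.0, 30.0, 40.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_by_seed_similarity(toy_expression, ["seed1", "seed2"])
```

    Here geneA follows the seed profile exactly and ranks first, while geneB is anti-correlated and ranks last. A recovery test of the kind the abstract mentions could be approximated by leaving one seed gene out and checking where it lands in the ranking.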

  14. Discovery of core biotic stress responsive genes in Arabidopsis by weighted gene co-expression network analysis.

    PubMed

    Amrine, Katherine C H; Blanco-Ulate, Barbara; Cantu, Dario

    2015-01-01

    Intricate signal networks and transcriptional regulators translate the recognition of pathogens into defense responses. In this study, we carried out a gene co-expression analysis of all currently publicly available microarray data, which were generated in experiments that studied the interaction of the model plant Arabidopsis thaliana with microbial pathogens. This work was conducted to identify (i) modules of functionally related co-expressed genes that are differentially expressed in response to multiple biotic stresses, and (ii) hub genes that may function as core regulators of disease responses. Using Weighted Gene Co-expression Network Analysis (WGCNA) we constructed an undirected network leveraging a rich curated expression dataset comprising 272 microarrays that involved microbial infections of Arabidopsis plants with a wide array of fungal and bacterial pathogens with biotrophic, hemibiotrophic, and necrotrophic lifestyles. WGCNA produced a network with scale-free and small-world properties composed of 205 distinct clusters of co-expressed genes. Modules of functionally related co-expressed genes that are differentially regulated in response to multiple pathogens were identified by integrating differential gene expression testing with functional enrichment analyses of gene ontology terms, known disease associated genes, transcriptional regulators, and cis-regulatory elements. The significance of functional enrichments was validated by comparisons with randomly generated networks. Network topology was then analyzed to identify intra- and inter-modular gene hubs. Based on high connectivity, and centrality in meta-modules that are clearly enriched in defense responses, we propose a list of 66 target genes for reverse genetic experiments to further dissect the Arabidopsis immune system. Our results show that statistical-based data trimming prior to network analysis allows the integration of expression datasets generated by different groups, under different

  15. Discovery of Core Biotic Stress Responsive Genes in Arabidopsis by Weighted Gene Co-Expression Network Analysis

    PubMed Central

    Amrine, Katherine C. H.; Blanco-Ulate, Barbara; Cantu, Dario

    2015-01-01

    Intricate signal networks and transcriptional regulators translate the recognition of pathogens into defense responses. In this study, we carried out a gene co-expression analysis of all currently publicly available microarray data, which were generated in experiments that studied the interaction of the model plant Arabidopsis thaliana with microbial pathogens. This work was conducted to identify (i) modules of functionally related co-expressed genes that are differentially expressed in response to multiple biotic stresses, and (ii) hub genes that may function as core regulators of disease responses. Using Weighted Gene Co-expression Network Analysis (WGCNA) we constructed an undirected network leveraging a rich curated expression dataset comprising 272 microarrays that involved microbial infections of Arabidopsis plants with a wide array of fungal and bacterial pathogens with biotrophic, hemibiotrophic, and necrotrophic lifestyles. WGCNA produced a network with scale-free and small-world properties composed of 205 distinct clusters of co-expressed genes. Modules of functionally related co-expressed genes that are differentially regulated in response to multiple pathogens were identified by integrating differential gene expression testing with functional enrichment analyses of gene ontology terms, known disease associated genes, transcriptional regulators, and cis-regulatory elements. The significance of functional enrichments was validated by comparisons with randomly generated networks. Network topology was then analyzed to identify intra- and inter-modular gene hubs. Based on high connectivity, and centrality in meta-modules that are clearly enriched in defense responses, we propose a list of 66 target genes for reverse genetic experiments to further dissect the Arabidopsis immune system. Our results show that statistical-based data trimming prior to network analysis allows the integration of expression datasets generated by different groups, under different

  16. Discovery of germline-related genes in Cephalochordate amphioxus: A genome wide survey using genome annotation and transcriptome data.

    PubMed

    Yue, Jia-Xing; Li, Kun-Lung; Yu, Jr-Kai

    2015-12-01

    The generation of germline cells is a critical process in the reproduction of multicellular organisms. Studies in animal models have identified a common repertoire of genes that play essential roles in primordial germ cell (PGC) formation. However, comparative studies also indicate that the timing and regulation of this core genetic program vary considerably in different animals, raising the intriguing questions regarding the evolution of PGC developmental mechanisms in metazoans. Cephalochordates (commonly called amphioxus or lancelets) represent one of the invertebrate chordate groups and can provide important information about the evolution of developmental mechanisms in the chordate lineage. In this study, we used genome and transcriptome data to identify germline-related genes in two distantly related cephalochordate species, Branchiostoma floridae and Asymmetron lucayanum. Branchiostoma and Asymmetron diverged more than 120 MYA, and the most conspicuous difference between them is their gonadal morphology. We used important germline developmental genes in several model animals to search the amphioxus genome and transcriptome dataset for conserved homologs. We also annotated the assembled transcriptome data using Gene Ontology (GO) terms to facilitate the discovery of putative genes associated with germ cell development and reproductive functions in amphioxus. We further confirmed the expression of 14 genes in developing oocytes or mature eggs using whole mount in situ hybridization, suggesting their potential functions in amphioxus germ cell development. The results of this global survey provide a useful resource for testing potential functions of candidate germline-related genes in cephalochordates and for investigating differences in gonad developmental mechanisms between Branchiostoma and Asymmetron species. PMID:25847029

  17. Discovery of estrogen-responsive genes using an improved method which combines subtractive hybridization and PCR.

    PubMed Central

    Liu, W; Su, W; Roberts, T M

    1998-01-01

    Here we describe a reliable method for isolating genes that are differentially expressed in two cell populations. The method is a combination of subtractive hybridization and PCR. Among many improvements to previously described methods is the incorporation of a new technology into the procedure which sterilizes (inactivates) PCR amplicons, and thereby overcomes the limitation of similar procedures. To test this improved method, we conducted a search for estrogen-responsive genes. Estrogen-regulated genes dominated the subtracted libraries after four rounds of subtractive hybridizations. Four estrogen-regulated genes were identified from the initial screening. PMID:9671829

  18. IMG-ABC. A knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites

    DOE PAGESBeta

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T. B. K.; Cimermančič, Peter; Fischbach, Michael A.; et al

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG’s extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG

  19. IMG-ABC. A knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites

    SciTech Connect

    Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T. B. K.; Cimermančič, Peter; Fischbach, Michael A.; Ivanova, Natalia N.; Markowitz, Victor M.; Kyrpides, Nikos C.; Pati, Amrita

    2015-07-14

    In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG’s extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG

  20. Discovery and Replication of Gene Influences on Brain Structure Using LASSO Regression

    PubMed Central

    Kohannim, Omid; Hibar, Derrek P.; Stein, Jason L.; Jahanshad, Neda; Hua, Xue; Rajagopalan, Priya; Toga, Arthur W.; Jack, Clifford R.; Weiner, Michael W.; de Zubicaray, Greig I.; McMahon, Katie L.; Hansell, Narelle K.; Martin, Nicholas G.; Wright, Margaret J.; Thompson, Paul M.

    2012-01-01

    We implemented least absolute shrinkage and selection operator (LASSO) regression to evaluate gene effects in genome-wide association studies (GWAS) of brain images, using an MRI-derived temporal lobe volume measure from 729 subjects scanned as part of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Sparse groups of SNPs in individual genes were selected by LASSO, which identifies efficient sets of variants influencing the data. These SNPs were considered jointly when assessing their association with neuroimaging measures. We discovered 22 genes that passed genome-wide significance for influencing temporal lobe volume. This was a substantially greater number of significant genes compared to those found with standard, univariate GWAS. These top genes are all expressed in the brain and include genes previously related to brain function or neuropsychiatric disorders such as MACROD2, SORCS2, GRIN2B, MAGI2, NPAS3, CLSTN2, GABRG3, NRXN3, PRKAG2, GAS7, RBFOX1, ADARB2, CHD4, and CDH13. The top genes we identified with this method also displayed significant and widespread post hoc effects on voxelwise, tensor-based morphometry (TBM) maps of the temporal lobes. The most significantly associated gene was an autism susceptibility gene known as MACROD2. We were able to successfully replicate the effect of the MACROD2 gene in an independent cohort of 564 young, Australian healthy adult twins and siblings scanned with MRI (mean age: 23.8 ± 2.2 SD years). Our approach powerfully complements univariate techniques in detecting influences of genes on the living brain. PMID:22888310
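
    How LASSO selects a sparse set of variants can be shown with a small sketch. It assumes centered predictors and response (no intercept), minimizes (1/2n)·||y − Xβ||² + λ·||β||₁ by cyclic coordinate descent, and uses invented toy data; coordinate descent is one standard solver and is not necessarily what the authors used. The names `soft_threshold` and `lasso_coordinate_descent` are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within [-penalty, penalty] become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100, tol=1e-10):
    """Cyclic coordinate descent for the LASSO objective stated in the lead-in."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(v * v for v in column) / n
            if scale == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of column j with the partial residual that excludes beta[j].
            rho = sum(v * (r + v * beta[j]) for v, r in zip(column, residual)) / n
            new_beta = soft_threshold(rho, penalty) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(column, residual)]
                beta[j] = new_beta
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy data (assumption): two orthogonal, centered "variant" columns;
# the response is 2.0 * column0 + 0.1 * column1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
beta = lasso_coordinate_descent(X, y, penalty=0.5)
```

    With a penalty of 0.5 the weak second coefficient is set exactly to zero and the strong first coefficient is shrunk from 2.0 to about 1.5, which is the selection-plus-shrinkage behaviour that lets LASSO pick a sparse group of variants per gene.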

  1. The Utility of Next-Generation Sequencing in Gene Discovery for Mutation-Negative Patients with Rett Syndrome

    PubMed Central

    Gold, Wendy Anne; Christodoulou, John

    2015-01-01

    Rett syndrome (RTT) is a rare, severe disorder of neuronal plasticity that predominantly affects girls. Girls with RTT usually appear asymptomatic in the first 6–18 months of life, but gradually develop severe motor, cognitive, and behavioral abnormalities that persist for life. A predominance of neuronal and synaptic dysfunction, with altered excitatory–inhibitory neuronal synaptic transmission and synaptic plasticity, are overarching features of RTT in children and in mouse models. Over 90% of patients with classical RTT have mutations in the X-linked methyl-CpG-binding (MECP2) gene, while other genes, including cyclin-dependent kinase-like 5 (CDKL5), Forkhead box protein G1 (FOXG1), myocyte-specific enhancer factor 2C (MEF2C), and transcription factor 4 (TCF4), have been associated with phenotypes overlapping with RTT. However, there remain a proportion of patients who carry a clinical diagnosis of RTT, but who are mutation negative. In recent years, next-generation sequencing technologies have revolutionized approaches to genetic studies, making whole-exome and even whole-genome sequencing possible strategies for the detection of rare and de novo mutations, aiding the discovery of novel disease genes. Here, we review the recent progress that is emerging in identifying pathogenic variations, specifically from exome sequencing in RTT patients, and emphasize the need for the use of this technology to identify known and new disease genes in RTT patients. PMID:26236194

  2. Systematic discovery of novel ciliary genes through functional genomics in the zebrafish

    PubMed Central

    Choksi, Semil P.; Babu, Deepak; Lau, Doreen; Yu, Xianwen; Roy, Sudipto

    2014-01-01

    Cilia are microtubule-based hair-like organelles that play many important roles in development and physiology, and are implicated in a rapidly expanding spectrum of human diseases, collectively termed ciliopathies. Primary ciliary dyskinesia (PCD), one of the most prevalent of ciliopathies, arises from abnormalities in the differentiation or motility of the motile cilia. Despite their biomedical importance, a methodical functional screen for ciliary genes has not been carried out in any vertebrate at the organismal level. We sought to systematically discover novel motile cilia genes by identifying the genes induced by Foxj1, a winged-helix transcription factor that has an evolutionarily conserved role as the master regulator of motile cilia biogenesis. Unexpectedly, we find that the majority of the Foxj1-induced genes have not been associated with cilia before. To characterize these novel putative ciliary genes, we subjected 50 randomly selected candidates to a systematic functional phenotypic screen in zebrafish embryos. Remarkably, we find that over 60% are required for ciliary differentiation or function, whereas 30% of the proteins encoded by these genes localize to motile cilia. We also show that these genes regulate the proper differentiation and beating of motile cilia. This collection of Foxj1-induced genes will be invaluable for furthering our understanding of ciliary biology, and in the identification of new mutations underlying ciliary disorders in humans. PMID:25139857

  3. SNP discovery and marker development for disease resistance candidate genes in common carp (Cyprinus carpio)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Single nucleotide polymorphisms (SNPs) in immune response genes have been reported as markers of susceptibility to infectious diseases in human and livestock. A disease caused by cyprinid herpes virus 3 (CyHV-3) is highly contagious and virulent in common carp. With the aim to investigate the gene...

  4. A multi-gene transcriptional profiling approach to the discovery of cell signature markers

    PubMed Central

    Wada, Youichiro; Li, Dan; Merley, Anne; Zukauskas, Andrew; Aird, William C.; Dvorak, Harold F.

    2010-01-01

    A profile of transcript abundances from multiple genes constitutes a molecular signature if the expression pattern is unique to one cell type. Here we measure mRNA copy numbers per cell by normalizing per million copies of 18S rRNA and identify 6 genes (TIE1, KDR, CDH5, TIE2, EFNA1 and MYO5C) out of 79 genes tested as excellent molecular signature markers for endothelial cells (ECs) in vitro. The selected genes are uniformly expressed in ECs of 4 different origins but weakly or not expressed in 4 non-EC cell lines. A multi-gene transcriptional profile of these 6 genes clearly distinguishes ECs from non-ECs in vitro. We conclude that (i) a profile of mRNA copy numbers per cell from a well-chosen multi-gene panel can act as a sensitive and accurate cell type signature marker, and (ii) the method described here can be applied to in vivo cell fingerprinting and molecular diagnosis. PMID:20972619

  5. A multi-gene transcriptional profiling approach to the discovery of cell signature markers.

    PubMed

    Wada, Youichiro; Li, Dan; Merley, Anne; Zukauskas, Andrew; Aird, William C; Dvorak, Harold F; Shih, Shou-Ching

    2011-01-01

    A profile of transcript abundances from multiple genes constitutes a molecular signature if the expression pattern is unique to one cell type. Here we measure mRNA copy numbers per cell by normalizing per million copies of 18S rRNA and identify 6 genes (TIE1, KDR, CDH5, TIE2, EFNA1 and MYO5C) out of 79 genes tested as excellent molecular signature markers for endothelial cells (ECs) in vitro. The selected genes are uniformly expressed in ECs of 4 different origins but weakly or not expressed in 4 non-EC cell lines. A multi-gene transcriptional profile of these 6 genes clearly distinguishes ECs from non-ECs in vitro. We conclude that (i) a profile of mRNA copy numbers per cell from a well-chosen multi-gene panel can act as a sensitive and accurate cell type signature marker, and (ii) the method described here can be applied to in vivo cell fingerprinting and molecular diagnosis. PMID:20972619

  6. Discovery of clubroot-resistant genes in Brassica napus by transcriptome sequencing.

    PubMed

    Chen, S W; Liu, T; Gao, Y; Zhang, C; Peng, S D; Bai, M B; Li, S J; Xu, L; Zhou, X Y; Lin, L B

    2016-01-01

    Clubroot significantly affects plants of the Brassicaceae family and is one of the main diseases causing serious losses in B. napus yield. Few studies have investigated the clubroot-resistance mechanism in B. napus. Identification of clubroot-resistant genes may be used in clubroot-resistant breeding, as well as to elucidate the molecular mechanism behind B. napus clubroot-resistance. We used three B. napus transcriptome samples to construct a transcriptome sequencing library by using Illumina HiSeq™ 2000 sequencing and bioinformatic analysis. In total, 171 million high-quality reads were obtained, containing 96,149 unigenes of N50-value. We aligned the obtained unigenes with the Nr, Swiss-Prot, clusters of orthologous groups, and gene ontology databases and annotated their functions. In the Kyoto encyclopedia of genes and genomes database, 25,033 unigenes (26.04%) were assigned to 124 pathways. Many genes, including broad-spectrum disease-resistance genes, specific clubroot-resistant genes, and genes related to indole-3-acetic acid (IAA) signal transduction, cytokinin synthesis, and myrosinase synthesis in the Huashuang 3 variety of B. napus were found to be related to clubroot-resistance. The effective clubroot-resistance observed in this variety may be due to the induced increased expression of these disease-resistant genes and strong inhibition of the IAA signal transduction, cytokinin synthesis, and myrosinase synthesis. Unigenes 0048482 and 0061770 shared 94% nucleotide similarity with the Crr1 gene. Furthermore, unigene 0061770 could have originated from an inversion of the Crr1 5'-end sequence. PMID:27525940

  7. Seed-based systematic discovery of specific transcription factor target genes.

    PubMed

    Mrowka, Ralf; Blüthgen, Nils; Fähling, Michael

    2008-06-01

    Reliable prediction of specific transcription factor target genes is a major challenge in systems biology and functional genomics. Current sequence-based methods yield many false predictions, due to the short and degenerated DNA-binding motifs. Here, we describe a new systematic genome-wide approach, the seed-distribution-distance method, that searches large-scale genome-wide expression data for genes that are similarly expressed as known targets. This method is used to identify genes that are likely targets, allowing sequence-based methods to focus on a subset of genes, giving rise to fewer false-positive predictions. We show by cross-validation that this method is robust in recovering specific target genes. Furthermore, this method identifies genes with typical functions and binding motifs of the seed. The method is illustrated by predicting novel targets of the transcription factor nuclear factor kappaB (NF-kappaB). Among the new targets is optineurin, which plays a key role in the pathogenesis of acquired blindness caused by adult-onset primary open-angle glaucoma. We show experimentally that the optineurin gene and other predicted genes are targets of NF-kappaB. Thus, our data provide a missing link in the signalling of NF-kappaB and the damping function of optineurin in signalling feedback of NF-kappaB. We present a robust and reliable method to enhance the genome-wide prediction of specific transcription factor target genes that exploits the vast amount of expression information available in public databases today. PMID:18485006

  8. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    PubMed Central

    2010-01-01

    Background: Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Results: We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is
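
    The adjusted Rand index used in this record to score clusterings against known classes can be computed from the contingency table of the two labelings. The following is a minimal sketch of the standard pair-counting formula, not code from the paper; the function name and the toy labels are illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index of two labelings of the same items (1.0 means identical partitions)."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Pairs placed together in both partitions, in the true one, and in the predicted one.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions put every item in its own cluster
    return (together_both - expected) / (maximum - expected)


# Toy labelings (assumption): the same partition with swapped cluster names,
# and a partition that splits every true class in half.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

    The index is invariant to how clusters are named, so the relabeled partition scores 1.0, while the crossed partition scores below zero, meaning it agrees with the known classes less than a random assignment would on average.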

  9. Multiplex gene expression analysis for high-throughput drug discovery: screening and analysis of compounds affecting genes overexpressed in cancer cells.

    PubMed

    Johnson, Paul H; Walker, Roger P; Jones, Steven W; Stephens, Kathy; Meurer, Janet; Zajchowski, Deborah A; Luke, May M; Eeckman, Frank; Tan, Yuping; Wong, Linda; Parry, Gordon; Morgan, Thomas K; McCarrick, Meg A; Monforte, Joseph

    2002-12-01

    Drug discovery strategies are needed that can rapidly exploit multiple therapeutic targets associated with the complex gene expression changes that characterize a polygenic disease such as cancer. We report a new cell-based high-throughput technology for screening chemical libraries against several potential cancer target genes in parallel. Multiplex gene expression (MGE) analysis provides direct and quantitative measurement of multiple endogenous mRNAs using a multiplexed detection system coupled to reverse transcription-PCR. A multiplex assay for six genes overexpressed in cancer cells was used to screen 9000 chemicals and known drugs in the human prostate cancer cell line PC-3. Active compounds that modulated gene expression levels were identified, and IC50 values were determined for compounds that bind DNA, cell surface receptors, and components of intracellular signaling pathways. A class of steroids related to the cardiac glycosides was identified that potently inhibited the plasma membrane Na+/K+-ATPase, resulting in the inhibition of four of the prostate target genes including transcription factors Hoxb-13, hPSE/PDEF, hepatocyte nuclear factor-3alpha, and the inhibitor of apoptosis, survivin. Representative compounds selectively induced apoptosis in PC-3 cells compared with the nonmetastatic cell line BPH-1. The multiplex assay distinguished potencies among structural variants, enabling structure-activity analysis suitable for chemical optimization studies. A second multiplex assay for five toxicological markers, Hsp70, Gadd153, Gadd45, O6-methylguanine-DNA methyltransferase, and cyclophilin, detected compounds that caused DNA damage and cellular stress and was a more sensitive and specific indicator of potential toxicity than measurement of cell viability. MGE analysis facilitates rapid drug screening and compound optimization, the simultaneous measurement of toxicological end points, and gene function analysis. PMID:12516962

  10. Biochemical genomics for gene discovery in benzylisoquinoline alkaloid biosynthesis in opium poppy and related species.

    PubMed

    Dang, Thu Thuy T; Onoyovwi, Akpevwe; Farrow, Scott C; Facchini, Peter J

    2012-01-01

    Benzylisoquinoline alkaloids (BIAs) are a large, diverse group of ∼2500 specialized plant metabolites. Many BIAs display potent pharmacological activities, including the narcotic analgesics codeine and morphine, the vasodilator papaverine, the cough suppressant and potential anticancer drug noscapine, the antimicrobial agents sanguinarine and berberine, and the muscle relaxant (+)-tubocurarine. Opium poppy remains the sole commercial source for codeine, morphine, and a variety of semisynthetic drugs, including oxycodone and buprenorphine, derived primarily from the biosynthetic pathway intermediate thebaine. Recent advances in transcriptomics, proteomics, and metabolomics have created unprecedented opportunities for isolating and characterizing novel BIA biosynthetic genes. Here, we describe the application of next-generation sequencing and cDNA microarrays for selecting gene candidates based on comparative transcriptome analysis. We outline the basic mass spectrometric techniques to perform deep proteome and targeted metabolite analyses on BIA-producing plant tissues and provide methodologies for functionally characterizing biosynthetic gene candidates through in vitro enzyme assays and transient gene silencing in planta. PMID:22999177

  11. Discovery and validation of gene classifiers for endocrine-disrupting chemicals in zebrafish (Danio rerio)

    EPA Science Inventory

    Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of human biomedical science. Many such classifiers discovered thus far lack vigorous statistical and experimental validations, with their stability and rel...

  12. Discovery of single-gene inborn errors of immunity by next generation sequencing

    PubMed Central

    Conley, Mary Ellen; Casanova, Jean-Laurent

    2014-01-01

    Many patients with clinical and laboratory evidence of primary immunodeficiency do not have a gene specific diagnosis. The use of next generation sequencing, particularly whole exome sequencing, has given us an extraordinarily powerful tool to identify the disease-causing genes in some of these patients. At least 34 new gene defects have been identified in the last 4 years. These findings document the striking heterogeneity of the phenotype in patients with mutations in the same gene. In some cases this can be attributed to loss-of-function mutations in some patients, but gain-of-function mutations in others. In addition, the surprisingly high frequency of autosomal dominant immunodeficiencies with variable penetrance, and de novo mutations in disorders with a severe phenotype has been unmasked. PMID:24886697

  13. Discovery of diversity in xylan biosynthetic genes by transcriptional profiling of a heteroxylan containing mucilaginous tissue

    PubMed Central

    Jensen, Jacob K.; Johnson, Nathan; Wilkerson, Curtis G.

    2013-01-01

    The exact biochemical steps of xylan backbone synthesis remain elusive. In Arabidopsis, three non-redundant genes from two glycosyltransferase (GT) families, IRX9 and IRX14 from GT43 and IRX10 from GT47, are candidates for forming the xylan backbone. In other plants, evidence exists that different tissues express these three genes at widely different levels, which suggests that diversity in the makeup of the xylan synthase complex exists. Recently we have profiled the transcripts present in the developing mucilaginous tissue of psyllium (Plantago ovata Forsk). This tissue was found to have high expression levels of an IRX10 homolog, but very low levels of the two GT43 family members. This contrasts with recent wheat endosperm tissue profiling that found a relatively high abundance of the GT43 family members. We have performed an in-depth analysis of all GT genes expressed in four developmental stages of the psyllium mucilaginous layer and in a single stage of the psyllium stem using RNA-Seq. This analysis revealed several IRX10 homologs, an expansion in GT61 (homologs of At3g18170/At3g18180), and several GTs from other GT families that are highly abundant and specifically expressed in the mucilaginous tissue. Our current hypothesis is that the four IRX10 genes present in the mucilaginous tissues have evolved to function without the GT43 genes. These four genes represent some of the most divergent IRX10 genes identified to date. Conversely, those present in the psyllium stem are very similar to those in other eudicots. This suggests these genes are under selective pressure, likely due to the synthesis of the various xylan structures present in mucilage that has a different biochemical role than that present in secondary walls. The numerous GT61 family members also show a wide sequence diversity and may be responsible for the larger number of side chain structures present in the psyllium mucilage. PMID:23761806

  14. Discovery of diversity in xylan biosynthetic genes by transcriptional profiling of a heteroxylan containing mucilaginous tissue.

    PubMed

    Jensen, Jacob K; Johnson, Nathan; Wilkerson, Curtis G

    2013-01-01

    The exact biochemical steps of xylan backbone synthesis remain elusive. In Arabidopsis, three non-redundant genes from two glycosyltransferase (GT) families, IRX9 and IRX14 from GT43 and IRX10 from GT47, are candidates for forming the xylan backbone. In other plants, evidence exists that different tissues express these three genes at widely different levels, which suggests that diversity in the makeup of the xylan synthase complex exists. Recently we have profiled the transcripts present in the developing mucilaginous tissue of psyllium (Plantago ovata Forsk). This tissue was found to have high expression levels of an IRX10 homolog, but very low levels of the two GT43 family members. This contrasts with recent wheat endosperm tissue profiling that found a relatively high abundance of the GT43 family members. We have performed an in-depth analysis of all GT genes expressed in four developmental stages of the psyllium mucilaginous layer and in a single stage of the psyllium stem using RNA-Seq. This analysis revealed several IRX10 homologs, an expansion in GT61 (homologs of At3g18170/At3g18180), and several GTs from other GT families that are highly abundant and specifically expressed in the mucilaginous tissue. Our current hypothesis is that the four IRX10 genes present in the mucilaginous tissues have evolved to function without the GT43 genes. These four genes represent some of the most divergent IRX10 genes identified to date. Conversely, those present in the psyllium stem are very similar to those in other eudicots. This suggests these genes are under selective pressure, likely due to the synthesis of the various xylan structures present in mucilage that has a different biochemical role than that present in secondary walls. The numerous GT61 family members also show a wide sequence diversity and may be responsible for the larger number of side chain structures present in the psyllium mucilage. PMID:23761806

  15. G-SESAME: web tools for GO-term-based gene similarity analysis and knowledge discovery

    PubMed Central

    Du, Zhidian; Li, Lin; Chen, Chin-Fu; Yu, Philip S.; Wang, James Z.

    2009-01-01

    We have developed a set of online tools for measuring the semantic similarities of Gene Ontology (GO) terms and the functional similarities of gene products, and for further discovering biomedical knowledge from the GO database. The tools have been used about 6.9 million times by 417 institutions from 43 countries since October 2006. The online tools are available at: http://bioinformatics.clemson.edu/G-SESAME. PMID:19491312

  16. Genome-wide discovery of Pax7 target genes during development.

    PubMed

    White, Robert B; Ziman, Melanie R

    2008-03-14

    Pax7 plays critical roles in development of brain, spinal cord, neural crest, and skeletal muscle. As a sequence-specific DNA-binding transcription factor, any direct functional role played by Pax7 during development is mediated through target gene selection. Thus, we have sought to identify genes targeted by Pax7 during embryonic development using an unbiased chromatin immunoprecipitation (ChIP) cloning assay to isolate cis-regulatory regions bound by Pax7 in vivo. Sequencing and genomic localization of a library of chromatin-DNA fragments bound by Pax7 has identified 34 candidate Pax7 target genes, with occupancy of a selection confirmed with independent chromatin enrichment tests (ChIP-PCR). To assess the capacity of Pax7 to regulate transcription from these loci, we have cloned alternate transcripts of Pax7 (differing significantly in their DNA binding domain) into expression vectors and transfected cultured cells with these constructs, then analyzed target gene expression levels using RT-PCR. We show that Pax7 directly occupies sites within genes encoding transcription factors Gbx1 and Eya4, the neurogenic cytokine receptor ciliary neurotrophic factor receptor, the neuronal potassium channel Kcnk2, and the signal transduction kinase Camk1d in vivo and regulates the transcriptional state of these genes in cultured cells. This analysis gives us greater insight into the direct functional role played by Pax7 during embryonic development. PMID:18198279

  17. Discovery and characterization of two novel salt-tolerance genes in Puccinellia tenuiflora.

    PubMed

    Li, Ying; Takano, Tetsuo; Liu, Shenkui

    2014-01-01

    Puccinellia tenuiflora is a monocotyledonous halophyte that is able to survive in extreme saline soil environments at an alkaline pH range of 9-10. In this study, we transformed full-length cDNAs of P. tenuiflora into Saccharomyces cerevisiae by using the full-length cDNA over-expressing gene-hunting system to identify novel salt-tolerance genes. In all, 32 yeast clones overexpressing P. tenuiflora cDNA were obtained by screening under NaCl stress conditions; of these, 31 clones showed stronger tolerance to NaCl and were amplified using polymerase chain reaction (PCR) and sequenced. Four novel genes encoding proteins with unknown function were identified; these genes had no homology with genes from higher plants. Of the four isolated genes, two that encoded proteins with two transmembrane domains showed the strongest resistance to 1.3 M NaCl. RT-PCR and northern blot analysis of P. tenuiflora cultured cells confirmed the endogenous NaCl-induced expression of the two proteins. Both of the proteins conferred better tolerance in yeasts to high salt, alkaline and osmotic conditions, some heavy metals and H2O2 stress. Thus, we inferred that the two novel proteins might alleviate oxidative and other stresses in P. tenuiflora. PMID:25238412

  18. Novel cell lines promote the discovery of genes involved in early heart development.

    PubMed

    Brunskill, E W; Witte, D P; Yutzey, K E; Potter, S S

    2001-07-15

    Clonal cell lines representing early cardiomyocytes would provide valuable reagents for the dissection of the genetic program of early cardiogenesis. Here we describe the establishment and characterization of cell lines from the hearts of transgenic mice and embryos with SV40 large T antigen expressed in the heart-forming region. Ultrastructure analysis by transmission electron microscopy showed the primitive, precontractile nature of the resulting cells, with the absence of myofilaments, Z lines, and intercalated disks. Immunohistochemistry, RT-PCR, Northern blots, and oligonucleotide microarrays were used to determine the expression levels of thousands of genes in the 1H and ECL-2 cell lines. The resulting gene-expression profiles showed the transcription of early cardiomyocyte genes such as Nkx2.5, GATA4, Tbx5, dHAND, cardiac troponin C, and SM22-alpha. Furthermore, many genes not previously implicated in early cardiac development were expressed. Two of these genes, Hic-5, a possible negative regulator of muscle differentiation, and the transcription enhancing factor TEF-5 were selected and shown by in situ hybridizations to be expressed in the early developing heart. The results show that the 1H and ECL-2 cell lines can be used to discover novel genes expressed in the early cardiomyocyte. PMID:11437454

  19. De Novo Assembly of Auricularia polytricha Transcriptome Using Illumina Sequencing for Gene Discovery and SSR Marker Identification

    PubMed Central

    Zhou, Yan; Chen, Lianfu; Fan, Xiuzhi; Bian, Yinbing

    2014-01-01

    Auricularia polytricha (Mont.) Sacc., a type of edible black-brown mushroom with a gelatinous and modality-specific fruiting body, is in high demand in Asia due to its nutritional and medicinal properties. Illumina Solexa sequencing technology was used to generate very large transcript sequences from the mycelium and the mature fruiting body of A. polytricha for gene discovery and molecular marker development. De novo assembly generated 36,483 ESTs with an N50 length of 636 bp. A total of 28,108 ESTs demonstrated significant hits with known proteins in the nr database, and 94.03% of the annotated ESTs showed the greatest similarity to A. delicata, a related species of A. polytricha. Functional categorization of the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways revealed the conservation of genes involved in various biological processes in A. polytricha. Gene expression profile analysis indicated that a total of 2,057 ESTs were differentially expressed, including 1,020 ESTs that were up-regulated in the mycelium and 1,037 up-regulated in the fruiting body. Functional enrichment showed that the ESTs associated with biosynthesis, metabolism and assembly of proteins were more active in fruiting body development. The expression patterns of homologous transcription factors indicated that the molecular mechanisms of fruiting body formation and development were not exactly the same as for other agarics. Interestingly, an EST encoding tyrosinase was significantly up-regulated in the fruiting body, indicating that melanins accumulated during the processes of the formation of the black-brown color of the fruiting body in A. polytricha development. In addition, a total of 1,715 potential SSRs were detected in this transcriptome. The transcriptome analysis of A. polytricha provides valuable sequence resources and numerous molecular markers to facilitate further functional genomics studies and
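
    The record above summarizes its assembly with an N50 length. As an illustration of how that statistic is commonly defined, the following minimal sketch returns the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembled bases. The function name and the toy lengths are assumptions for this example and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Hypothetical helper for illustration: sort lengths from longest to
    shortest and report the length at which the running total first
    reaches half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length


# Toy usage with made-up contig lengths (not data from the study).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

    In the toy input the lengths sum to 54, and the three longest sequences (10, 9 and 8) already cover 27 bases, so the sketch reports 8.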

  20. Discovery of Pituitary Adenylate Cyclase-Activating Polypeptide-Regulated Genes through Microarray Analyses in Cell Culture and In Vivo

    PubMed Central

    Eiden, Lee E.; Samal, Babru; Gerdin, Matthew J.; Mustafa, Tomris; Vaudry, David; Stroth, Nikolas

    2010-01-01

    Pituitary adenylate cyclase-activating polypeptide (PACAP) is an evolutionarily well conserved neuropeptide with multiple functions in the nervous, endocrine, and immune systems. PACAP provides neuroprotection from ischemia and toxin exposure, is anti-inflammatory in gastric inflammatory disease and sepsis, controls proliferative signaling pathways involved in neural cell transformation, and modulates glucohomeostasis. PACAP-based, disease-targeted therapeutics might thus be both effective and benign, enhancing homeostatic responses to behavioral, metabolic, oncogenic, and inflammatory stressors. PACAP signal transduction employs synergistic regulation of calcium and cyclic adenosine monophosphate (cAMP), and noncanonical activation of both calcium- and cAMP-dependent processes. Pharmacological activation of PACAP signaling should consequently have highly specific effects even in vivo. Here, a combined cellular biochemical, pharmacologic, transcriptomic, and bioinformatic approach to understanding PACAP signal transduction by identifying PACAP target genes with oligonucleotide- and cDNA-based microarray is described. Calcium- and cAMP-dependent PACAP signaling pathways for regulation of genes encoding proteins required for neuritogenesis, changes in cell morphology, and cell survival have been traced in PC12 cells. Pharmacological experiments have linked gene expression to cell physiological responses in this system, in which gene silencing can also be employed to confirm the functional significance of induction of specific transcripts. Differential transcriptional responses to metabolic, ischemic, and other stressors in wild type compared to PACAP-deficient mice establish in principle which PACAP-responsive transcripts in culture are PACAP-dependent in vivo. Bioinformatic approaches aid in creating a pipeline for identifying neuropeptide-regulated genes, validating their cellular functions, and defining their expression in the context of neuropeptide signaling

  1. Transcriptome Analysis and Discovery of Genes Involved in Immune Pathways from Hepatopancreas of Microbial Challenged Mitten Crab Eriocheir sinensis

    PubMed Central

    Li, Xihong; Cui, Zhaoxia; Liu, Yuan; Song, Chengwen; Shi, Guohui

    2013-01-01

    Background The Chinese mitten crab Eriocheir sinensis is an important economic crustacean and has been seriously attacked by various diseases, which requires more and more information for immune relevant genes on genome background. Recently, high-throughput RNA sequencing (RNA-seq) technology provides a powerful and efficient method for transcript analysis and immune gene discovery. Methods/Principal Findings A cDNA library from hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris; 10^8 cfu·mL^-1) was constructed and randomly sequenced using the Illumina technique. In total, 39.76 million clean reads were assembled into 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized into three Gene Ontology (GO) categories, 12,243 (23.51%) were classified into 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized to be involved in Toll, IMD, JAK-STAT and MAPK pathways, respectively. Conclusions/Significance This is the first systematic transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to a better understanding of the immune system and to the prevention of various diseases in crab. PMID:23874555

  2. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes.

    PubMed

    Piñero, Janet; Queralt-Rosinach, Núria; Bravo, Àlex; Deu-Pons, Jordi; Bauer-Mehren, Anna; Baron, Martin; Sanz, Ferran; Furlong, Laura I

    2015-01-01

    DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380,000 associations between >16,000 genes and 13,000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/ PMID:25877637

  3. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes

    PubMed Central

    Piñero, Janet; Queralt-Rosinach, Núria; Bravo, Àlex; Deu-Pons, Jordi; Bauer-Mehren, Anna; Baron, Martin; Sanz, Ferran; Furlong, Laura I.

    2015-01-01

    DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380 000 associations between >16 000 genes and 13 000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/ PMID:25877637

  4. Discovery and characterization of nutritionally regulated genes associated with muscle growth in Atlantic salmon.

    PubMed

    Bower, Neil I; Johnston, Ian A

    2010-10-01

    A genomics approach was used to identify nutritionally regulated genes involved in growth of fast skeletal muscle in Atlantic salmon (Salmo salar L.). Forward and reverse subtractive cDNA libraries were prepared comparing fish with zero growth rates to fish growing rapidly. We produced 7,420 ESTs and assembled them into nonredundant clusters prior to annotation. Contigs representing 40 potentially unrecognized nutritionally responsive candidate genes were identified. Twenty-three of the subtractive library candidates were also differentially regulated by nutritional state in an independent fasting-refeeding experiment and their expression placed in the context of 26 genes with established roles in muscle growth regulation. The expression of these genes was also determined during the maturation of a primary myocyte culture, identifying 13 candidates from the subtractive cDNA libraries with putative roles in the myogenic program. During early stages of refeeding DNAJA4, HSPA1B, HSP90A, and CHAC1 expression increased, indicating activation of unfolded protein response pathways. Four genes were considered inhibitory to myogenesis based on their in vivo and in vitro expression profiles (CEBPD, ASB2, HSP30, novel transcript GE623928). Other genes showed increased expression with feeding and highest in vitro expression during the proliferative phase of the culture (FOXD1, DRG1) or as cells differentiated (SMYD1, RTN1, MID1IP1, HSP90A, novel transcript GE617747). The genes identified were associated with chromatin modification (SMYD1, RTN1), microtubule stabilization (MID1IP1), cell cycle regulation (FOXD1, CEBPD, DRG1), and negative regulation of signaling (ASB2) and may play a role in the stimulation of myogenesis during the transition from a catabolic to anabolic state in skeletal muscle. PMID:20663983

  5. Discovery of functional non-coding conserved regions in the α-synuclein gene locus

    PubMed Central

    Sterling, Lori; Walter, Michael; Ting, Dennis; Schüle, Birgitt

    2014-01-01

    Several single nucleotide polymorphisms (SNPs) and the Rep-1 microsatellite marker of the α-synuclein (SNCA) gene have consistently been shown to be associated with Parkinson’s disease, but the functional relevance is unclear. Based on these findings we hypothesized that conserved cis-regulatory elements in the SNCA genomic region regulate expression of SNCA, and that SNPs in these regions could be functionally modulating the expression of SNCA, thus contributing to neuronal demise and predisposing to Parkinson’s disease. In a pair-wise comparison of a 206kb genomic region encompassing the SNCA gene, we revealed 34 evolutionarily conserved DNA sequences between human and mouse. All elements were cloned into reporter vectors and assessed for expression modulation in dual luciferase reporter assays. We found that 12 out of 34 elements exhibited either an enhancement or reduction of the expression of the reporter gene. Three elements upstream of the SNCA gene displayed an approximately 1.5 fold (p<0.009) increase in expression. Of the intronic regions, three showed a 1.5 fold increase and two others indicated a 2 and 2.5 fold increase in expression (p<0.002). Three elements downstream of the SNCA gene showed 1.5 fold and 2.5 fold increase (p<0.0009). One element downstream of SNCA had a reduced expression of the reporter gene of 0.35 fold (p<0.0009) of normal activity. Our results demonstrate that the SNCA gene contains cis-regulatory regions that might regulate the transcription and expression of SNCA. Further studies in disease-relevant tissue types will be important to understand the functional impact of regulatory regions and specific Parkinson’s disease-associated SNPs and their function in the disease process. PMID:25566351

  6. Drosophila and Caenorhabditis elegans as Discovery Platforms for Genes Involved in Human Alcohol Use Disorder

    PubMed Central

    Grotewiel, Mike; Bettinger, Jill C.

    2015-01-01

    Background Despite the profound clinical significance and strong heritability of alcohol use disorder (AUD), we do not yet have a comprehensive understanding of the naturally occurring genetic variance within the human genome that drives its development. This lack of understanding is likely to be due in part to the large phenotypic and genetic heterogeneities that underlie human AUD. As a complement to genetic studies in humans, many laboratories are using the invertebrate model organisms (iMOs) Drosophila melanogaster (fruit fly) and Caenorhabditis elegans (nematode worm) to identify genetic mechanisms that influence the effects of alcohol (ethanol) on behavior. While these extremely powerful models have identified many genes that influence the behavioral responses to alcohol, in most cases it has remained unclear whether results from behavioral–genetic studies in iMOs are directly applicable to understanding the genetic basis of human AUD. Methods In this review, we critically evaluate the utility of the fly and worm models for identifying genes that influence AUD in humans. Results Based on results published through early 2015, studies in flies and worms have identified 91 and 50 genes, respectively, that influence 1 or more aspects of behavioral responses to alcohol. Collectively, these fly and worm genes correspond to 293 orthologous genes in humans. Intriguingly, 51 of these 293 human genes have been implicated in AUD by at least 1 study in human populations. Conclusions Our analyses strongly suggest that the Drosophila and C. elegans models have considerable utility for identifying orthologs of genes that influence human AUD. PMID:26173477

  7. Target genes discovery through copy number alteration analysis in human hepatocellular carcinoma.

    PubMed

    Gu, De-Leung; Chen, Yen-Hsieh; Shih, Jou-Ho; Lin, Chi-Hung; Jou, Yuh-Shan; Chen, Chian-Feng

    2013-12-21

    High-throughput short-read sequencing of exomes and whole cancer genomes in multiple human hepatocellular carcinoma (HCC) cohorts confirmed previously identified frequently mutated somatic genes, such as TP53, CTNNB1 and AXIN1, and identified several novel genes with moderate mutation frequencies, including ARID1A, ARID2, MLL, MLL2, MLL3, MLL4, IRF2, ATM, CDKN2A, FGF19, PIK3CA, RPS6KA3, JAK1, KEAP1, NFE2L2, C16orf62, LEPR, RAC2, and IL6ST. Functional classification of these mutated genes suggested that alterations in pathways participating in chromatin remodeling, Wnt/β-catenin signaling, JAK/STAT signaling, and oxidative stress play critical roles in HCC tumorigenesis. Nevertheless, because there are few druggable genes used in HCC therapy, the identification of new therapeutic targets through integrated genomic approaches remains an important task. Because a large amount of HCC genomic data genotyped by high density single nucleotide polymorphism arrays is deposited in the public domain, copy number alteration (CNA) analysis of these arrays is a cost-effective way to reveal target genes through profiling of recurrent and overlapping amplicons, homozygous deletions and potentially unbalanced chromosomal translocations accumulated during HCC progression. Moreover, integration of CNAs with other high-throughput genomic data, such as aberrantly coding transcriptomes and non-coding gene expression in human HCC tissues and rodent HCC models, provides lines of evidence that can be used to facilitate the identification of novel HCC target genes with the potential of improving the survival of HCC patients. PMID:24379610

  8. Essential Gene Discovery in the Basidiomycete Cryptococcus neoformans for Antifungal Drug Target Prioritization

    PubMed Central

    Ianiri, Giuseppe

    2015-01-01

    ABSTRACT Fungal diseases represent a major burden to health care globally. As with other pathogenic microbes, there is a limited number of agents suitable for use in treating fungal diseases, and resistance to these agents can develop rapidly. Cryptococcus neoformans is a basidiomycete fungus that causes cryptococcosis worldwide in both immunocompromised and healthy individuals. As a basidiomycete, it diverged from other common pathogenic or model ascomycete fungi more than 500 million years ago. Here, we report C. neoformans genes that are essential for viability as identified through forward and reverse genetic approaches, using an engineered diploid strain and genetic segregation after meiosis. The forward genetic approach generated random insertional mutants in the diploid strain, the induction of meiosis and sporulation, and selection for haploid cells with counterselection of the insertion event. More than 2,500 mutants were analyzed, and transfer DNA (T-DNA) insertions in several genes required for viability were identified. The genes include those encoding the thioredoxin reductase (Trr1), a ribosome assembly factor (Rsa4), an mRNA-capping component (Cet1), and others. For targeted gene replacement, the C. neoformans homologs of 35 genes required for viability in ascomycete fungi were disrupted, meiosis and sporulation were induced, and haploid progeny were evaluated for their ability to grow on selective media. Twenty-one (60%) were found to be required for viability in C. neoformans. These genes are involved in mitochondrial translation, ergosterol biosynthesis, and RNA-related functions. The heterozygous diploid mutants were evaluated for haploinsufficiency on a number of perturbing agents and drugs, revealing phenotypes due to the loss of one copy of an essential gene in C. neoformans. This study expands the knowledge of the essential genes in fungi using a basidiomycete as a model organism. Genes that have no mammalian homologs and are essential

  9. Use of model organism and disease databases to support matchmaking for human disease gene discovery.

    PubMed

    Mungall, Christopher J; Washington, Nicole L; Nguyen-Xuan, Jeremy; Condit, Christopher; Smedley, Damian; Köhler, Sebastian; Groza, Tudor; Shefchek, Kent; Hochheiser, Harry; Robinson, Peter N; Lewis, Suzanna E; Haendel, Melissa A

    2015-10-01

    The Matchmaker Exchange application programming interface (API) allows searching a patient's genotypic or phenotypic profiles across clinical sites, for the purposes of cohort discovery and variant disease causal validation. This API can be used not only to search for matching patients, but also to match against public disease and model organism data. This public disease data enable matching known diseases and variant-phenotype associations using phenotype semantic similarity algorithms developed by the Monarch Initiative. The model data can provide additional evidence to aid diagnosis, suggest relevant models for disease mechanism and treatment exploration, and identify collaborators across the translational divide. The Monarch Initiative provides an implementation of this API for searching multiple integrated sources of data that contextualize the knowledge about any given patient or patient family into the greater biomedical knowledge landscape. While this corpus of data can aid diagnosis, it is also the beginning of research to improve understanding of rare human diseases. PMID:26269093

  10. Discovery of Antibiotics-derived Polymers for Gene Delivery using Combinatorial Synthesis and Cheminformatics Modeling

    PubMed Central

    Potta, Thrimoorthy; Zhen, Zhuo; Grandhi, Taraka Sai Pavan; Christensen, Matthew D.; Ramos, James; Breneman, Curt M.; Rege, Kaushal

    2014-01-01

    We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening resulted in identification of several lead polymers that resulted in high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure-Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology. PMID:24331709

  11. Genomic Approaches For the Discovery of Genes Mutated in Inherited Retinal Degeneration

    PubMed Central

    Siemiatkowska, Anna M.; Collin, Rob W.J.; den Hollander, Anneke I.; Cremers, Frans P.M.

    2014-01-01

    In view of their high degree of genetic heterogeneity, inherited retinal diseases (IRDs) pose a significant challenge for identifying novel genetic causes. Thus far, more than 200 genes have been found to be mutated in IRDs, which together contain causal variants in >80% of the cases. Accurate genetic diagnostics is particularly important for isolated cases, in which X-linked and de novo autosomal dominant variants are not uncommon. In addition, new gene- or mutation-specific therapies are emerging, underlining the importance of identifying causative mutations in each individual. Sanger sequencing of selected genes followed by cost-effective targeted next-generation sequencing (NGS) can identify defects in known IRD-associated genes in the majority of the cases. Exome NGS in combination with genetic linkage or homozygosity mapping studies can aid the identification of the remaining causal genes. As these are thought to be mutated in <1% of the cases, validation through functional modeling in, for example, zebrafish and/or replication through the genotyping of large patient cohorts is required. In the near future, whole genome NGS in combination with transcriptome NGS may reveal mutations that are currently hidden in the noncoding regions of the human genome. PMID:24939053

  12. Discovery and identification of candidate genes from the chitinase gene family for Verticillium dahliae resistance in cotton

    PubMed Central

    Xu, Jun; Xu, Xiaoyang; Tian, Liangliang; Wang, Guilin; Zhang, Xueying; Wang, Xinyu; Guo, Wangzhen

    2016-01-01

    Verticillium dahliae, a destructive and soil-borne fungal pathogen, causes massive losses in cotton yields. However, the resistance mechanism to V. dahliae in cotton is still poorly understood. Accumulating evidence indicates that chitinases are crucial hydrolytic enzymes, which attack fungal pathogens by catalyzing the fungal cell wall degradation. As a large gene family, to date, the chitinase genes (Chis) have not been systematically analyzed and effectively utilized in cotton. Here, we identified 47, 49, 92, and 116 Chis from four sequenced cotton species, diploid Gossypium raimondii (D5), G. arboreum (A2), tetraploid G. hirsutum acc. TM-1 (AD1), and G. barbadense acc. 3–79 (AD2), respectively. The orthologous genes were not in one-to-one correspondence in the diploid and tetraploid cotton species, implying changes in the number of Chis in different cotton species during the evolution of Gossypium. Phylogenetic classification indicated that these Chis could be classified into six groups, with distinguishable structural characteristics. The expression patterns of Chis indicated their various expressions in different organs and tissues, and in the V. dahliae response. Silencing of Chi23, Chi32, or Chi47 in cotton significantly impaired the resistance to V. dahliae, suggesting these genes might act as positive regulators in disease resistance to V. dahliae. PMID:27354165

  13. Discovery of Molecular Mechanisms of Traditional Chinese Medicinal Formula Si-Wu-Tang Using Gene Expression Microarray and Connectivity Map

    PubMed Central

    Wen, Zhining; Wang, Zhijun; Wang, Steven; Ravula, Ranadheer; Yang, Lun; Xu, Jun; Wang, Charles; Zuo, Zhong; Chow, Moses S. S.; Shi, Leming; Huang, Ying

    2011-01-01

    To pursue a systematic approach to discovery of mechanisms of action of traditional Chinese medicine (TCM), we used microarrays, bioinformatics and the “Connectivity Map” (CMAP) to examine TCM-induced changes in gene expression. We demonstrated that this approach can be used to elucidate new molecular targets using a model TCM herbal formula Si-Wu-Tang (SWT) which is widely used for women's health. The human breast cancer MCF-7 cells treated with 0.1 µM estradiol or 2.56 mg/ml of SWT showed dramatic gene expression changes, while no significant change was detected for ferulic acid, a known bioactive compound of SWT. Pathway analysis using differentially expressed genes related to the treatment effect identified that expression of genes in the nuclear factor erythroid 2-related factor 2 (Nrf2) cytoprotective pathway was most significantly affected by SWT, but not by estradiol or ferulic acid. The Nrf2-regulated genes HMOX1, GCLC, GCLM, SLC7A11 and NQO1 were upregulated by SWT in a dose-dependent manner, which was validated by real-time RT-PCR. Consistently, treatment with SWT and its four herbal ingredients resulted in an increased antioxidant response element (ARE)-luciferase reporter activity in MCF-7 and HEK293 cells. Furthermore, the gene expression profile of differentially expressed genes related to SWT treatment was used to compare with those of 1,309 compounds in the CMAP database. The CMAP profiles of estradiol-treated MCF-7 cells showed an excellent match with SWT treatment, consistent with SWT's widely claimed use for women's diseases and indicating a phytoestrogenic effect. The CMAP profiles of chemopreventive agents withaferin A and resveratrol also showed high similarity to the profiles of SWT. This study identified SWT as an Nrf2 activator and phytoestrogen, suggesting its use as a nontoxic chemopreventive agent, and demonstrated the feasibility of combining microarray gene expression profiling with CMAP mining to discover mechanisms of actions

  14. Diversity of ribulose-1,5-bisphosphate carboxylase/oxygenase large-subunit genes in the MgCl2-dominated deep hypersaline anoxic basin discovery.

    PubMed

    van der Wielen, Paul W J J

    2006-06-01

    Partial sequences of the form I (cbbL) and form II (cbbM) of the ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) large subunit genes were obtained from the brine and interface of the MgCl2-dominated deep hypersaline anoxic basin Discovery. CbbL and cbbM genes were found in both brine and interface of the Discovery Basin but were absent in the overlying seawater. The diversity of both genes in the brine and interface was low, which might be caused by the extreme saline conditions in Discovery of approximately 5 M MgCl2. None of the retrieved sequences were closely related to sequences deposited in the GenBank database. A phylogenetic analysis demonstrated that the cbbL sequences were affiliated with a Thiobacillus sp. or with one of the RuBisCO genes from Hydrogenovibrio marinus. The cbbM sequences clustered with thiobacilli or formed a new group with no close relatives. The results imply that bacteria with the potential for carbon dioxide fixation and chemoautotrophy are present in the Discovery Basin. This is the first report demonstrating that RuBisCO genes are present under hypersaline conditions of 5 M MgCl2. PMID:16734797

  15. Phylogenomic Analysis of Natural Products Biosynthetic Gene Clusters Allows Discovery of Arseno-Organic Metabolites in Model Streptomycetes.

    PubMed

    Cruz-Morales, Pablo; Kopp, Johannes Florian; Martínez-Guerrero, Christian; Yáñez-Guerra, Luis Alfonso; Selem-Mojica, Nelly; Ramos-Aboites, Hilda; Feldmann, Jörg; Barona-Gómez, Francisco

    2016-01-01

    Natural products from microbes have provided humans with beneficial antibiotics for millennia. However, a decline in the pace of antibiotic discovery exerts pressure on human health as antibiotic resistance spreads, a challenge that may be better faced by unveiling chemical diversity produced by microbes. Current microbial genome mining approaches have revitalized research into antibiotics, but the empirical nature of these methods limits the chemical space that is explored. Here, we address the problem of finding novel pathways by incorporating evolutionary principles into genome mining. We recapitulated the evolutionary history of twenty-three enzyme families previously uninvestigated in the context of natural product biosynthesis in Actinobacteria, the most proficient producers of natural products. Our genome evolutionary analyses were based on the assumption that expanded, repurposed enzyme families from central metabolism occur frequently and thus have the potential to catalyze new conversions in the context of natural products biosynthesis. Our analyses led to the discovery of biosynthetic gene clusters coding for hidden chemical diversity, as validated by comparing our predictions with those from state-of-the-art genome mining tools, as well as by experimentally demonstrating the existence of a biosynthetic pathway for arseno-organic metabolites in Streptomyces coelicolor and Streptomyces lividans, using a gene knockout and metabolite profile combined strategy. As our approach does not rely solely on sequence similarity searches of previously identified biosynthetic enzymes, these results establish the basis for the development of an evolutionary-driven genome mining tool termed EvoMining that complements current platforms. We anticipate that by doing so real 'chemical dark matter' will be unveiled. PMID:27289100

  16. Phylogenomic Analysis of Natural Products Biosynthetic Gene Clusters Allows Discovery of Arseno-Organic Metabolites in Model Streptomycetes

    PubMed Central

    Cruz-Morales, Pablo; Kopp, Johannes Florian; Martínez-Guerrero, Christian; Yáñez-Guerra, Luis Alfonso; Selem-Mojica, Nelly; Ramos-Aboites, Hilda; Feldmann, Jörg; Barona-Gómez, Francisco

    2016-01-01

    Natural products from microbes have provided humans with beneficial antibiotics for millennia. However, a decline in the pace of antibiotic discovery exerts pressure on human health as antibiotic resistance spreads, a challenge that may be better faced by unveiling chemical diversity produced by microbes. Current microbial genome mining approaches have revitalized research into antibiotics, but the empirical nature of these methods limits the chemical space that is explored. Here, we address the problem of finding novel pathways by incorporating evolutionary principles into genome mining. We recapitulated the evolutionary history of twenty-three enzyme families previously uninvestigated in the context of natural product biosynthesis in Actinobacteria, the most proficient producers of natural products. Our genome evolutionary analyses were based on the assumption that expanded, repurposed enzyme families from central metabolism occur frequently and thus have the potential to catalyze new conversions in the context of natural products biosynthesis. Our analyses led to the discovery of biosynthetic gene clusters coding for hidden chemical diversity, as validated by comparing our predictions with those from state-of-the-art genome mining tools, as well as by experimentally demonstrating the existence of a biosynthetic pathway for arseno-organic metabolites in Streptomyces coelicolor and Streptomyces lividans, using a gene knockout and metabolite profile combined strategy. As our approach does not rely solely on sequence similarity searches of previously identified biosynthetic enzymes, these results establish the basis for the development of an evolutionary-driven genome mining tool termed EvoMining that complements current platforms. We anticipate that by doing so real ‘chemical dark matter’ will be unveiled. PMID:27289100

  17. Prior knowledge driven Granger causality analysis on gene regulatory network discovery

    DOE PAGESBeta

    Yao, Shun; Yoo, Shinjae; Yu, Dantong

    2015-08-28

    Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, the number of available time points (T) usually is much smaller than the number of target genes (n) in biological datasets. The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n>>T. In this study, we proposed a new method, viz., CGC-2SPR (CGC using two-step prior Ridge regularization) to resolve the problem by incorporating prior biological knowledge about a target gene data set. In our simulation experiments, the proposed new methodology CGC-2SPR showed significant performance improvement in terms of accuracy over other widely used GC modeling (PGC, Ridge and Lasso) and MI-based (MRNET and ARACNE) methods. In addition, we applied CGC-2SPR to a real biological dataset, i.e., the yeast metabolic cycle, and discovered more true positive edges with CGC-2SPR than with the other existing methods. In our research, we noticed a “1+1>2” effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on causality networks, we made a functional prediction that the Abm1 gene (its functions previously were unknown) might be related to the yeast’s responses to different levels of glucose. In conclusion, our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction in systems biology. Furthermore, we proposed a method of Monte Carlo significance estimation (MCSE) to calculate the edge significances which provide statistical meanings to the discovered causality networks. All of our data and source codes will be available under the link https://bitbucket.org/dtyu/granger-causality/wiki/Home.
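
    The pairwise Granger causality idea that this abstract starts from can be illustrated with a minimal sketch. It is not the authors' CGC-2SPR method: it assumes a single lag, a plain ridge penalty with no prior knowledge, and no intercept, and the names ridge_rss and granger_gain are hypothetical. Series x is taken to "Granger-cause" y when adding the lagged x to a model of y clearly reduces the residual error.

```python
def solve(a, b):
    """Solve the small linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / m[i][i]
    return w


def ridge_rss(predictors, target, lam):
    """Fit ridge regression without intercept; return the residual sum of squares."""
    k = len(predictors)
    # Normal equations with the ridge penalty on the diagonal: (X^T X + lam * I) w = X^T y
    xtx = [[sum(p * q for p, q in zip(predictors[i], predictors[j])) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(p * t for p, t in zip(predictors[i], target)) for i in range(k)]
    w = solve(xtx, xty)
    rss = 0.0
    for idx, t in enumerate(target):
        pred = sum(w[i] * predictors[i][idx] for i in range(k))
        rss += (t - pred) ** 2
    return rss


def granger_gain(x, y, lam=0.1):
    """Relative drop in residual error for y when x at lag 1 is added to y at lag 1."""
    y_now, y_lag, x_lag = y[1:], y[:-1], x[:-1]
    restricted = ridge_rss([y_lag], y_now, lam)      # y explained by its own past only
    full = ridge_rss([y_lag, x_lag], y_now, lam)     # y explained by its own past and the past of x
    return (restricted - full) / restricted
```

    A gain near 1 means the past of x explains almost all of what the past of y alone could not; a gain near 0 means x adds little. A real analysis would use a significance test rather than a raw gain, and with far fewer time points than genes (the n>>T setting the abstract describes) a pairwise score like this one is exactly where false identifications are expected.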

  18. RNA-Seq Based De Novo Transcriptome Assembly and Gene Discovery of Cistanche deserticola Fleshy Stem

    PubMed Central

    Yao, Fuwen; Li, Cuiping; Tang, Qingli; Sun, Min; Sun, Gaoyuan; Hu, Songnian; Yu, Jun; Song, Shuhui

    2015-01-01

    Background Cistanche deserticola is a completely non-photosynthetic parasitic plant with great medicinal value and mainly distributed in desert of Northwest China. Its dried fleshy stem is a crucial tonic in traditional Chinese medicine with roles of mainly improving male sexual function and strengthening immunity, but few mechanistic studies have been conducted partly due to the lack of genomic and transcriptomic resources. Results In this study, we performed deep transcriptome sequencing in fleshy stem of C. deserticola, and about 80 million reads were generated using Illumina pair-end sequencing on HiSeq2000 platform. Using the Trinity assembler, we obtained 95,787 transcript sequences with transcript lengths ranging from 200bp to 15,698bp, having an average length of 950 bases and the N50 length of 1,519 bases. 63,957 transcripts were identified as actively expressed with FPKM ≥ 0.5, of which 30,098 transcripts were annotated with gene descriptions or gene ontology terms by sequence similarity analyses against several public databases (Uniprot, NR and Nt at NCBI, and KEGG). Furthermore, we identified key enzyme genes involved in biosynthesis of lignin and phenylethanoid glycosides (PhGs) which are known to be the primary active ingredients. Four phenylalanine ammonia-lyase (PAL) genes, the first key enzyme in lignin and PhG biosynthesis, were identified based on sequences comparison and phylogenetic analysis. Two biosynthesis pathways of PhGs were also proposed for the first time. Conclusions In all, we completed a global analysis of the C. deserticola fleshy stem transcriptome using RNA-seq technology. A collection of enzyme genes related to biosynthesis of lignin and phenylethanoid glycosides were identified from the assembled and annotated transcripts, and the gene family of PAL was also predicted. The sequence data from this study will provide a valuable resource for conducting future phenylethanoid glycosides biosynthesis researches and functional genomic studies
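
    The N50 figure quoted in this record is a standard assembly statistic: the length L such that transcripts of length L or longer together contain at least half of all assembled bases. A minimal sketch of that definition follows; the function name n50 and the toy lengths in the usage note are illustrative only and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

    For the toy input [2, 2, 2, 3, 3, 4, 8, 8] the total is 32 bases, and the two longest sequences already cover 16 of them, so the N50 is 8. Because a few long transcripts dominate the sum, N50 is usually larger than the mean length, as in the abstract (N50 of 1,519 bases against a mean of 950).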

  19. Rare copy number variation discovery and cross-disorder comparisons identify risk genes for ADHD.

    PubMed

    Lionel, Anath C; Crosbie, Jennifer; Barbosa, Nicole; Goodale, Tara; Thiruvahindrapuram, Bhooma; Rickaby, Jessica; Gazzellone, Matthew; Carson, Andrew R; Howe, Jennifer L; Wang, Zhuozhi; Wei, John; Stewart, Alexandre F R; Roberts, Robert; McPherson, Ruth; Fiebig, Andreas; Franke, Andre; Schreiber, Stefan; Zwaigenbaum, Lonnie; Fernandez, Bridget A; Roberts, Wendy; Arnold, Paul D; Szatmari, Peter; Marshall, Christian R; Schachar, Russell; Scherer, Stephen W

    2011-08-10

    Attention deficit hyperactivity disorder (ADHD) is a common and persistent condition characterized by developmentally atypical and impairing inattention, hyperactivity, and impulsiveness. We identified de novo and rare copy number variations (CNVs) in 248 unrelated ADHD patients using million-feature genotyping arrays. We found de novo CNVs in 3 of 173 (1.7%) ADHD patients for whom we had DNA from both parents. These CNVs affected brain-expressed genes: DCLK2, SORCS1, SORCS3, and MACROD2. We also detected rare inherited CNVs in 19 of 248 (7.7%) ADHD probands, which were absent in 2357 controls and which either overlapped previously implicated ADHD loci (for example, DRD5 and 15q13 microduplication) or identified new candidate susceptibility genes (ASTN2, CPLX2, ZBBX, and PTPRN2). Among these de novo and rare inherited CNVs, there were also examples of genes (ASTN2, GABRG1, and CNTN5) previously implicated by rare CNVs in other neurodevelopmental conditions including autism spectrum disorder (ASD). To further explore the overlap of risks in ADHD and ASD, we used the same microarrays to test for rare CNVs in an independent, newly collected cohort of 349 unrelated individuals with a primary diagnosis of ASD. Deletions of the neuronal ASTN2 and the ASTN2-intronic TRIM32 genes yielded the strongest association with ADHD and ASD, but numerous other shared candidate genes (such as CHCHD3, MACROD2, and the 16p11.2 region) were also revealed. Our results provide support for a role for rare CNVs in ADHD risk and reinforce evidence for the existence of common underlying susceptibility genes for ADHD, ASD, and other neuropsychiatric disorders. PMID:21832240

  20. Prior knowledge driven Granger causality analysis on gene regulatory network discovery

    SciTech Connect

    Yao, Shun; Yoo, Shinjae; Yu, Dantong

    2015-08-28

    Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, the number of available time points (T) usually is much smaller than the number of target genes (n) in biological datasets. The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n>>T. In this study, we proposed a new method, viz., CGC-2SPR (CGC using two-step prior Ridge regularization) to resolve the problem by incorporating prior biological knowledge about a target gene data set. In our simulation experiments, the proposed new methodology CGC-2SPR showed significant performance improvement in terms of accuracy over other widely used GC modeling (PGC, Ridge and Lasso) and MI-based (MRNET and ARACNE) methods. In addition, we applied CGC-2SPR to a real biological dataset, i.e., the yeast metabolic cycle, and discovered more true positive edges with CGC-2SPR than with the other existing methods. In our research, we noticed a “1+1>2” effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on causality networks, we made a functional prediction that the Abm1 gene (its functions previously were unknown) might be related to the yeast’s responses to different levels of glucose. In conclusion, our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction in systems biology. Furthermore, we proposed a method of Monte Carlo significance estimation (MCSE) to calculate the edge significances which provide statistical meanings to the discovered causality networks. All of our data and source codes will be available under the link https://bitbucket.org/dtyu/granger-causality/wiki/Home.

  1. Discovery of candidate genes for muscle traits based on GWAS supported by eQTL-analysis.

    PubMed

    Ponsuksili, Siriluck; Murani, Eduard; Trakooljul, Nares; Schwerin, Manfred; Wimmers, Klaus

    2014-01-01

    Biochemical and biophysical processes that take place in muscle under relaxed and stressed conditions depend on the abundance and activity of gene products of metabolic and structural pathways. In livestock at post-mortem, these muscle properties determine aspects of meat quality and are measurable. The conversion of muscle to meat mimics pathological processes associated with muscle ischemia, injury or damage in humans and it is an economic factor in pork production. Linkage, association, and expression analyses independently contributed to the identification of trait-associated molecular pathways and genes. We aim at providing multiple lines of evidence for the role of specific genes in meat quality by integrating a genome-wide association study (GWAS) for meat quality traits and the detection of eQTL based on trait-correlated expressed genes and trait-associated markers. The GWAS revealed 51 and 200 SNPs significantly associated with meat quality in a crossbred Pietrain×(German Landrace×Large White) (Pi×(GL×LW)) and a purebred German Landrace (GL) population, respectively. Most significant SNPs in Pi×(GL×LW) were located on chromosomes (SSC) 4 and 6. The data of 47,836 eQTLs at a significance level of p<10^-5 were used to scale down the number of candidate genes located in these regions. These SNPs on SSC4 showed association with expression levels of ZNF704, IMPA1, and OXSR1; SSC6 SNPs were associated with expression of SIGLEC10 and PIH1D1. Most significant SNPs in GL were located on SSC6 and associated with expression levels of PIH1D1, SIGLEC10, TBCB, LOC100518735, KIF1B, LOC100514845, and two unknown genes. The abundance of transcripts of these genes in muscle, in turn, is significantly correlated with meat quality traits. We identified several genes with evidence for their candidacy for meat quality arising from the integrative approach of a genome-wide association study and eQTL analysis. PMID:24643240

  2. Plant gravitropic signal transduction: A network analysis leads to gene discovery

    NASA Astrophysics Data System (ADS)

    Wyatt, Sarah

    Gravity plays a fundamental role in plant growth and development. Although a significant body of research has helped define the events of gravity perception, the role of the plant growth regulator auxin, and the mechanisms resulting in the gravity response, the events of signal transduction, those that link the biophysical action of perception to a biochemical signal that results in auxin redistribution, those that regulate the gravitropic effects on plant growth, remain, for the most part, a “black box.” Using a cold effect, dubbed the gravity persistent signal (GPS) response, we developed a mutant screen to specifically identify components of the signal transduction pathway. Cloning of the GPS genes has identified new proteins involved in gravitropic signaling. We have further exploited the GPS response using a multi-faceted approach including gene expression microarrays, proteomics analysis, and bioinformatics analysis and continued mutant analysis to identify additional genes, physiological and biochemical processes. Gene expression data provided the foundation of a regulatory network for gravitropic signaling. Based on these gene expression data and related data sets/information from the literature/repositories, we constructed a gravitropic signaling network for Arabidopsis inflorescence stems. To generate the network, both a dynamic Bayesian network approach and a time-lagged correlation coefficient approach were used. The dynamic Bayesian network added existing information of protein-protein interaction while the time-lagged correlation coefficient allowed incorporation of temporal regulation and thus could incorporate the time-course metric from the data set. Thus the methods complemented each other and provided us with a more comprehensive evaluation of connections. Each method generated a list of possible interactions associated with a statistical significance value. The two networks were then overlaid to generate a more rigorous, intersected
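
    The time-lagged correlation coefficient mentioned in this abstract can be sketched in a few lines. This is a generic illustration, assuming a Pearson correlation between one expression profile and a second profile shifted by a fixed number of time points; the abstract does not give the authors' exact formulation, and the names pearson, lagged_correlation and best_lag are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def lagged_correlation(x, y, lag):
    """Correlate x at time t with y at time t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest absolute lagged correlation."""
    return max(range(max_lag + 1), key=lambda k: abs(lagged_correlation(x, y, k)))
```

    If the expression of one gene follows another with a delay of two time points, lagged_correlation peaks at lag 2 even when the unshifted correlation is weak, which is how a time-course design can suggest the direction of regulation.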

  3. The AEROPATH project targeting Pseudomonas aeruginosa: crystallographic studies for assessment of potential targets in early-stage drug discovery

    PubMed Central

    Moynie, Lucille; Schnell, Robert; McMahon, Stephen A.; Sandalova, Tatyana; Boulkerou, Wassila Abdelli; Schmidberger, Jason W.; Alphey, Magnus; Cukier, Cyprian; Duthie, Fraser; Kopec, Jolanta; Liu, Huanting; Jacewicz, Agata; Hunter, William N.; Naismith, James H.; Schneider, Gunter

    2013-01-01

    Bacterial infections are increasingly difficult to treat owing to the spread of antibiotic resistance. A major concern is Gram-negative bacteria, for which the discovery of new antimicrobial drugs has been particularly scarce. In an effort to accelerate early steps in drug discovery, the EU-funded AEROPATH project aims to identify novel targets in the opportunistic pathogen Pseudomonas aeruginosa by applying a multidisciplinary approach encompassing target validation, structural characterization, assay development and hit identification from small-molecule libraries. Here, the strategies used for target selection are described and progress in protein production and structure analysis is reported. Of the 102 selected targets, 84 could be produced in soluble form and the de novo structures of 39 proteins have been determined. The crystal structures of eight of these targets, ranging from hypothetical unknown proteins to metabolic enzymes from different functional classes (PA1645, PA1648, PA2169, PA3770, PA4098, PA4485, PA4992 and PA5259), are reported here. The structural information is expected to provide a firm basis for the improvement of hit compounds identified from fragment-based and high-throughput screening campaigns. PMID:23295481

  4. Phenotype discovery by gene expression profiling: mapping of biological processes linked to BMP-2-mediated osteoblast differentiation.

    PubMed

    Balint, Eva; Lapointe, David; Drissi, Hicham; van der Meijden, Caroline; Young, Daniel W; van Wijnen, Andre J; Stein, Janet L; Stein, Gary S; Lian, Jane B

    2003-05-15

    osteogenic phenotype is recognized by 8 h, reflected by downregulation of most myogenic-related genes and induction of a spectrum of signaling proteins and enzymes facilitating synthesis and assembly of an extracellular skeletal environment. These genes included collagens Type I and VI and the small leucine rich repeat family of proteoglycans (e.g., decorin, biglycan, osteomodulin, fibromodulin, and osteoadherin/osteoglycin) that reached peak expression at 24 h. With extracellular matrix development, the bone phenotype was further established from 16 to 24 h by induction of genes for cell adhesion and communication and enzymes that organize the bone ECM. Our microarray analysis resulted in the discovery of a class of genes, initially described in relation to differentiation of astrocytes and oligodendrocytes, that are functionally coupled to signals for cellular extensions. They include nexin, neuropilin, latexin, neuroglian, neuron specific gene 1, and Ulip, suggesting novel roles for these genes in the bone microenvironment. This global analysis identified a multistage molecular and cellular cascade that supports BMP-2-mediated osteoblast differentiation. PMID:12704803

  5. Discovery of new soybean and soybean rust genes using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Soybean is one of the top five agricultural products in the United States and is highly susceptible to soybean rust (SR), an exotic obligate fungus that arrived in the USA in 2004. We used mRNA-Seq by Illumina/Solexa to analyze gene expression patterns of the host and pathogen at different time poin...

  6. Transcriptome analysis of Catharanthus roseus for gene discovery and expression profiling.

    PubMed

    Verma, Mohit; Ghangal, Rajesh; Sharma, Raghvendra; Sinha, Alok K; Jain, Mukesh

    2014-01-01

    The medicinal plant Catharanthus roseus accumulates a wide range of terpenoid indole alkaloids, which are well-documented therapeutic agents. In this study, deep transcriptome sequencing of C. roseus was carried out to identify the pathways and enzymes (genes) involved in biosynthesis of these compounds. About 343 million reads were generated from different tissues (leaf, flower and root) of C. roseus using the Illumina platform. Optimization of de novo assembly involving a two-step process resulted in a total of 59,220 unique transcripts with an average length of 1284 bp. Comprehensive functional annotation and gene ontology (GO) analysis revealed the representation of many genes involved in different biological processes and molecular functions. In total, 65% of C. roseus transcripts showed homology with sequences available in various public repositories, while the remaining 35% of unigenes may be considered C. roseus specific. In silico analysis revealed the presence of 11,620 genic simple sequence repeats (excluding mono-nucleotide repeats) and 1820 transcription factor encoding genes in the C. roseus transcriptome. Expression analysis showed roots and leaves to be actively participating in bisindole alkaloid production, with clear indication that enzymes involved in the pathway of vindoline and vinblastine biosynthesis are restricted to aerial tissues. Such a large-scale transcriptome study provides a rich source for understanding plant-specialized metabolism, and is expected to promote research towards production of plant-derived pharmaceuticals. PMID:25072156

  7. Transcriptome Analysis of Catharanthus roseus for Gene Discovery and Expression Profiling

    PubMed Central

    Sharma, Raghvendra; Sinha, Alok K.; Jain, Mukesh

    2014-01-01

    The medicinal plant Catharanthus roseus accumulates a wide range of terpenoid indole alkaloids, which are well-documented therapeutic agents. In this study, deep transcriptome sequencing of C. roseus was carried out to identify the pathways and enzymes (genes) involved in biosynthesis of these compounds. About 343 million reads were generated from different tissues (leaf, flower and root) of C. roseus using the Illumina platform. Optimization of de novo assembly involving a two-step process resulted in a total of 59,220 unique transcripts with an average length of 1284 bp. Comprehensive functional annotation and gene ontology (GO) analysis revealed the representation of many genes involved in different biological processes and molecular functions. In total, 65% of C. roseus transcripts showed homology with sequences available in various public repositories, while the remaining 35% of unigenes may be considered C. roseus specific. In silico analysis revealed the presence of 11,620 genic simple sequence repeats (excluding mono-nucleotide repeats) and 1820 transcription factor encoding genes in the C. roseus transcriptome. Expression analysis showed roots and leaves to be actively participating in bisindole alkaloid production, with clear indication that enzymes involved in the pathway of vindoline and vinblastine biosynthesis are restricted to aerial tissues. Such a large-scale transcriptome study provides a rich source for understanding plant-specialized metabolism, and is expected to promote research towards production of plant-derived pharmaceuticals. PMID:25072156

  8. Large-Scale Discovery of Disease-Disease and Disease-Gene Associations

    PubMed Central

    Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J.; Obradovic, Zoran

    2016-01-01

    Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements on disease phenotyping over current computational approaches. In addition, the use of the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set where, again, it revealed very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Thus, our approach provides biomedical researchers with new tools to filter genes of interest, thus, reducing costly lab studies. PMID:27578529

  9. A Sorghum Mutant Resource as an Efficient Platform for Gene Discovery in Grasses.

    PubMed

    Jiao, Yinping; Burke, John; Chopra, Ratan; Burow, Gloria; Chen, Junping; Wang, Bo; Hayes, Chad; Emendack, Yves; Ware, Doreen; Xin, Zhanguo

    2016-07-01

    Sorghum (Sorghum bicolor) is a versatile C4 crop and a model for research in family Poaceae. High-quality genome sequence is available for the elite inbred line BTx623, but functional validation of genes remains challenging due to the limited genomic and germplasm resources available for comprehensive analysis of induced mutations. In this study, we generated 6400 pedigreed M4 mutant pools from EMS-mutagenized BTx623 seeds through single-seed descent. Whole-genome sequencing of 256 phenotyped mutant lines revealed >1.8 million canonical EMS-induced mutations, affecting >95% of genes in the sorghum genome. The vast majority (97.5%) of the induced mutations were distinct from natural variations. To demonstrate the utility of the sequenced sorghum mutant resource, we performed reverse genetics to identify eight genes potentially affecting drought tolerance, three of which had allelic mutations and two of which exhibited exact cosegregation with the phenotype of interest. Our results establish that a large-scale resource of sequenced pedigreed mutants provides an efficient platform for functional validation of genes in sorghum, thereby accelerating sorghum breeding. Moreover, findings made in sorghum could be readily translated to other members of the Poaceae via integrated genomics approaches. PMID:27354556

  10. USING NATURAL VARIATION FOR GENE DISCOVERY TO IMPROVE SEED IRON NUTRITIONAL VALUE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We and others are interested in developing crops biofortified with iron to improve their nutritional value for human consumption. One of the crucial tasks, therefore, is to identify relevant genes that can be targeted for transgenic or conventional breeding approaches to improve the Fe concentratio...

  11. Discovery and assessment of conserved Pax6 target genes and enhancers

    PubMed Central

    Coutinho, Pedro; Pavlou, Sofia; Bhatia, Shipra; Chalmers, Kevin J.; Kleinjan, Dirk A.; van Heyningen, Veronica

    2011-01-01

    The characterization of transcriptional networks (TNs) is essential for understanding complex biological phenomena such as development, disease, and evolution. In this study, we have designed and implemented a procedure that combines in silico target screens with zebrafish and mouse validation, in order to identify cis-elements and genes directly regulated by Pax6. We chose Pax6 as the paradigm because of its crucial roles in organogenesis and human disease. We identified over 600 putative Pax6 binding sites and more than 200 predicted direct target genes, conserved in evolution from zebrafish to human and to mouse. This was accomplished using hidden Markov models (HMMs) generated from experimentally validated Pax6 binding sites. A small sample of genes, expressed in the neural lineage, was chosen from the predictions for RNA in situ validation using zebrafish and mouse models. Validation of DNA binding to some predicted cis-elements was also carried out using chromatin immunoprecipitation (ChIP) and zebrafish reporter transgenic studies. The results show that this combined procedure is a highly efficient tool to investigate the architecture of TNs and constitutes a useful complementary resource to ChIP and expression data sets because of its inherent spatiotemporal independence. We have identified several novel direct targets, including some putative disease genes, among them Foxp2; these will allow further dissection of Pax6 function in development and disease. PMID:21617155

  12. Biomarker discovery and gene expression responses in Lycopersicon esculentum root exposed to lead.

    PubMed

    Hou, Jing; Bai, Lili; Xie, Yujia; Liu, Xinhui; Cui, Baoshan

    2015-12-15

    Gene expression analysis has shown particular promise for the identification of molecular biomarkers that can be used for further evaluation of potential toxicity of chemicals present in agricultural soil. In the study, we focused on the development of molecular markers to detect Pb toxicity in agricultural soil. Using the results obtained from microarray analysis, twelve Pb-responsive genes were selected and tested in different Pb concentrations to examine their concentration-response characteristics using real-time quantitative polymerase chain reaction (RT-qPCR). All the Pb treatments set in our study could generally induce the differential expression of the 12 genes, while the lowest observable adverse effect concentration (LOAEC) of Pb for seed germination, root elongation, biomass and structural modification derived from 1,297, 177, 177, and 1,297 mg Pb/kg soil, respectively, suggesting that the transcriptional approach was more sensitive than the traditional end points of death, growth, and morphology for the evaluation of Pb toxicity. The relative expression of glycoalkaloid metabolism 1 (P=-0.790), ethylene-responsive transcription factor ERF017 (P=-0.686) and CASP-like protein 4C2 (P=-0.652) demonstrates a dose-dependent response with Pb content in roots, implying that the three genes can be used as sensitive bioindicators of Pb stress in Lycopersicon esculentum. PMID:26252993

  13. A Sorghum Mutant Resource as an Efficient Platform for Gene Discovery in Grasses

    PubMed Central

    Burke, John; Chen, Junping; Wang, Bo; Hayes, Chad; Emendack, Yves

    2016-01-01

    Sorghum (Sorghum bicolor) is a versatile C4 crop and a model for research in family Poaceae. High-quality genome sequence is available for the elite inbred line BTx623, but functional validation of genes remains challenging due to the limited genomic and germplasm resources available for comprehensive analysis of induced mutations. In this study, we generated 6400 pedigreed M4 mutant pools from EMS-mutagenized BTx623 seeds through single-seed descent. Whole-genome sequencing of 256 phenotyped mutant lines revealed >1.8 million canonical EMS-induced mutations, affecting >95% of genes in the sorghum genome. The vast majority (97.5%) of the induced mutations were distinct from natural variations. To demonstrate the utility of the sequenced sorghum mutant resource, we performed reverse genetics to identify eight genes potentially affecting drought tolerance, three of which had allelic mutations and two of which exhibited exact cosegregation with the phenotype of interest. Our results establish that a large-scale resource of sequenced pedigreed mutants provides an efficient platform for functional validation of genes in sorghum, thereby accelerating sorghum breeding. Moreover, findings made in sorghum could be readily translated to other members of the Poaceae via integrated genomics approaches. PMID:27354556

  14. Large-Scale Discovery of Disease-Disease and Disease-Gene Associations.

    PubMed

    Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J; Obradovic, Zoran

    2016-01-01

    Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements on disease phenotyping over current computational approaches. In addition, the use of the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set where, again, it revealed very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Thus, our approach provides biomedical researchers with new tools to filter genes of interest, thus, reducing costly lab studies. PMID:27578529

  15. Human Transporter Database: Comprehensive Knowledge and Discovery Tools in the Human Transporter Genes

    PubMed Central

    Ye, Adam Y.; Liu, Qing-Rong; Li, Chuan-Yun; Zhao, Min; Qu, Hong

    2014-01-01

    Transporters are essential in homeostatic exchange of endogenous and exogenous substances at the systematic, organic, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetic traits. Recent developments in high-throughput technologies in genomics, transcriptomics and proteomics allow in-depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high-throughput data has resulted in an urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible form. Using a pipeline combining automated keyword queries, sequence similarity search and manual curation on transporters, we collected 1,555 human non-redundant transporter genes to develop the Human Transporter Database (HTD) (http://htd.cbi.pku.edu.cn). Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relationships with their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface that helps research communities search detailed molecular and genetic information on transporters for the development of personalized medicine. PMID:24558441

  16. Transcriptome Analysis and Discovery of Genes Relevant to Development in Bradysia odoriphaga at Three Developmental Stages.

    PubMed

    Gao, Huanhuan; Zhai, Yifan; Wang, Wenbo; Chen, Hao; Zhou, Xianhong; Zhuang, Qianying; Yu, Yi; Li, Rumei

    2016-01-01

    Bradysia odoriphaga (Diptera: Sciaridae) is the most important pest of Chinese chive (Allium tuberosum) in Asia; however, the molecular genetics are poorly understood. To explore the molecular biological mechanism of development, Illumina sequencing and de novo assembly were performed in the third-instar, fourth-instar, and pupal B. odoriphaga. The study resulted in 16.2 Gb of clean data and 47,578 unigenes (≥125 bp) contained in 7,632,430 contigs, 46.21% of which were annotated from non-redundant protein (NR), Gene Ontology (GO), Clusters of Orthologous Groups (COG), Eukaryotic Orthologous Groups (KOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. It was found that 19.67% of unigenes matched the homologous species mainly, including Aedes aegypti, Culex quinquefasciatus, Ceratitis capitata, and Anopheles gambiae. According to differentially expressed gene (DEG) analysis, 143, 490, and 309 DEGs were annotated as involved in the developmental process in the GO database respectively, in the comparisons of third-instar and fourth-instar larvae, third-instar larvae and pupae, and fourth-instar larvae and pupae. Twenty-five genes were closely related to these processes, including developmental process, reproduction process, and reproductive organs development and programmed cell death (PCD). The information of unigenes assembled in B. odoriphaga through transcriptome and DEG analyses could provide a detailed genetic basis and regulated information for elaborating the developmental mechanism from the larval, pre-pupal to pupal stages of B. odoriphaga. PMID:26891450

  17. The discovery of integrated gene networks for autism and related disorders

    PubMed Central

    Hormozdiari, Fereydoun; Penn, Osnat; Borenstein, Elhanan; Eichler, Evan E.

    2015-01-01

    Despite considerable genetic heterogeneity underlying neurodevelopmental diseases, there is compelling evidence that many disease genes will map to a much smaller number of biological subnetworks. We developed a computational method, termed MAGI (merging affected genes into integrated networks), that simultaneously integrates protein–protein interactions and RNA-seq expression profiles during brain development to discover “modules” enriched for de novo mutations in probands. We applied this method to recent exome sequencing of 1116 patients with autism and intellectual disability, discovering two distinct modules that differ in their properties and associated phenotypes. The first module consists of 80 genes associated with Wnt, Notch, SWI/SNF, and NCOR complexes and shows the highest expression early during embryonic development (8–16 post-conception weeks [pcw]). The second module consists of 24 genes associated with synaptic function, including long-term potentiation and calcium signaling with higher levels of postnatal expression. Patients with de novo mutations in these modules are more significantly intellectually impaired and carry more severe missense mutations when compared to probands with de novo mutations outside of these modules. We used our approach to define subsets of the network associated with higher functioning autism as well as greater severity with respect to IQ. Finally, we applied MAGI independently to epilepsy and schizophrenia exome sequencing cohorts and found significant overlap as well as expansion of these modules, suggesting a core set of integrated neurodevelopmental networks common to seemingly diverse human diseases. PMID:25378250

  18. Discovery of Chemosensory Genes in the Oriental Fruit Fly, Bactrocera dorsalis.

    PubMed

    Wu, Zhongzhen; Zhang, He; Wang, Zhengbing; Bin, Shuying; He, Hualiang; Lin, Jintian

    2015-01-01

    The oriental fruit fly, Bactrocera dorsalis, is a devastating fruit fly pest in tropical and sub-tropical countries. Like other insects, this fly uses its chemosensory system to efficiently interact with its environment. However, our understanding of the molecular components comprising B. dorsalis chemosensory system is limited. Using next generation sequencing technologies, we sequenced the transcriptome of four B. dorsalis developmental stages: egg, larva, pupa and adult chemosensory tissues. A total of 31 candidate odorant binding proteins (OBPs), 4 candidate chemosensory proteins (CSPs), 23 candidate odorant receptors (ORs), 11 candidate ionotropic receptors (IRs), 6 candidate gustatory receptors (GRs) and 3 candidate sensory neuron membrane proteins (SNMPs) were identified. The tissue distributions of the OBP and CSP transcripts were determined by RT-PCR and a subset of nine genes were further characterized. The predicted proteins from these genes shared high sequence similarity to Drosophila melanogaster pheromone binding protein related proteins (PBPRPs). Interestingly, one OBP (BdorOBP19c) was exclusively expressed in the sex pheromone glands of mature females. RT-PCR was also used to compare the expression of the candidate genes in the antennae of male and female B. dorsalis adults. These antennae-enriched OBPs, CSPs, ORs, IRs and SNMPs could play a role in the detection of pheromones and general odorants and thus could be useful target genes for the integrated pest management of B. dorsalis and other agricultural pests. PMID:26070069

  19. Discovery of Chemosensory Genes in the Oriental Fruit Fly, Bactrocera dorsalis

    PubMed Central

    Wu, Zhongzhen; Zhang, He; Wang, Zhengbing; Bin, Shuying; He, Hualiang; Lin, Jintian

    2015-01-01

    The oriental fruit fly, Bactrocera dorsalis, is a devastating fruit fly pest in tropical and sub-tropical countries. Like other insects, this fly uses its chemosensory system to efficiently interact with its environment. However, our understanding of the molecular components comprising B. dorsalis chemosensory system is limited. Using next generation sequencing technologies, we sequenced the transcriptome of four B. dorsalis developmental stages: egg, larva, pupa and adult chemosensory tissues. A total of 31 candidate odorant binding proteins (OBPs), 4 candidate chemosensory proteins (CSPs), 23 candidate odorant receptors (ORs), 11 candidate ionotropic receptors (IRs), 6 candidate gustatory receptors (GRs) and 3 candidate sensory neuron membrane proteins (SNMPs) were identified. The tissue distributions of the OBP and CSP transcripts were determined by RT-PCR and a subset of nine genes were further characterized. The predicted proteins from these genes shared high sequence similarity to Drosophila melanogaster pheromone binding protein related proteins (PBPRPs). Interestingly, one OBP (BdorOBP19c) was exclusively expressed in the sex pheromone glands of mature females. RT-PCR was also used to compare the expression of the candidate genes in the antennae of male and female B. dorsalis adults. These antennae-enriched OBPs, CSPs, ORs, IRs and SNMPs could play a role in the detection of pheromones and general odorants and thus could be useful target genes for the integrated pest management of B. dorsalis and other agricultural pests. PMID:26070069

  20. Transcriptome Analysis and Discovery of Genes Relevant to Development in Bradysia odoriphaga at Three Developmental Stages

    PubMed Central

    Wang, Wenbo; Chen, Hao; Zhou, Xianhong; Zhuang, Qianying; Yu, Yi; Li, Rumei

    2016-01-01

    Bradysia odoriphaga (Diptera: Sciaridae) is the most important pest of Chinese chive (Allium tuberosum) in Asia; however, the molecular genetics are poorly understood. To explore the molecular biological mechanism of development, Illumina sequencing and de novo assembly were performed in the third-instar, fourth-instar, and pupal B. odoriphaga. The study resulted in 16.2 Gb of clean data and 47,578 unigenes (≥125 bp) contained in 7,632,430 contigs, 46.21% of which were annotated from non-redundant protein (NR), Gene Ontology (GO), Clusters of Orthologous Groups (COG), Eukaryotic Orthologous Groups (KOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. It was found that 19.67% of unigenes matched the homologous species mainly, including Aedes aegypti, Culex quinquefasciatus, Ceratitis capitata, and Anopheles gambiae. According to differentially expressed gene (DEG) analysis, 143, 490, and 309 DEGs were annotated as involved in the developmental process in the GO database respectively, in the comparisons of third-instar and fourth-instar larvae, third-instar larvae and pupae, and fourth-instar larvae and pupae. Twenty-five genes were closely related to these processes, including developmental process, reproduction process, and reproductive organs development and programmed cell death (PCD). The information of unigenes assembled in B. odoriphaga through transcriptome and DEG analyses could provide a detailed genetic basis and regulated information for elaborating the developmental mechanism from the larval, pre-pupal to pupal stages of B. odoriphaga. PMID:26891450

  1. Gene Discovery in the Threatened Elkhorn Coral: 454 Sequencing of the Acropora palmata Transcriptome

    PubMed Central

    Polato, Nicholas R.; Vera, J. Cristobal; Baums, Iliana B.

    2011-01-01

    Background: Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. Results: A cDNA library from a sample enriched for symbiont-free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data was acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83–100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number of genes expressed in our samples (∼18,000–20,000). The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, have experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome-wide screens of paralog groups and transition/transversion ratios highlighted genes including green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins, and functional groups involved in protein and nucleic acid metabolism and the formation of structural molecules. These results provide a starting point for study of adaptive evolution in corals. Conclusions: Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite considerable

  2. A comprehensive resource of drought- and salinity- responsive ESTs for gene discovery and marker development in chickpea (Cicer arietinum L.)

    PubMed Central

    2009-01-01

    and their expression profile showed predominance in specific stress-challenged libraries. Conclusion: The generated set of chickpea ESTs serves as a resource of high-quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will help facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species. PMID:19912666

  3. Mapping our genes: The genome projects: How big, how fast

    SciTech Connect

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  4. Mapping Our Genes: The Genome Projects: How Big, How Fast

    DOE R&D Accomplishments Database

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  5. Scientific Discovery through Advanced Computing (SciDAC-3) Partnership Project Annual Report

    SciTech Connect

    Hoffman, Forest M.; Bochev, Pavel B.; Cameron-Smith, Philip J.; Easter, Richard C; Elliott, Scott M.; Ghan, Steven J.; Liu, Xiaohong; Lowrie, Robert B.; Lucas, Donald D.; Ma, Po-lun; Sacks, William J.; Shrivastava, Manish; Singh, Balwinder; Tautges, Timothy J.; Taylor, Mark A.; Vertenstein, Mariana; Worley, Patrick H.

    2014-01-15

    The Applying Computationally Efficient Schemes for BioGeochemical Cycles (ACES4BGC) Project is advancing the predictive capabilities of Earth System Models (ESMs) by reducing two of the largest sources of uncertainty, aerosols and biospheric feedbacks, with a highly efficient computational approach. In particular, this project is implementing and optimizing new computationally efficient tracer advection algorithms for large numbers of tracer species; adding important biogeochemical interactions between the atmosphere, land, and ocean models; and applying uncertainty quantification (UQ) techniques to constrain process parameters and evaluate uncertainties in feedbacks between biogeochemical cycles and the climate system.

  6. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery.

    PubMed

    Scott, Eric M; Halees, Anason; Itan, Yuval; Spencer, Emily G; He, Yupeng; Azab, Mostafa Abdellateef; Gabriel, Stacey B; Belkadi, Aziz; Boisson, Bertrand; Abel, Laurent; Clark, Andrew G; Alkuraya, Fowzan S; Casanova, Jean-Laurent; Gleeson, Joseph G

    2016-09-01

    The Greater Middle East (GME) has been a central hub of human migration and population admixture. The tradition of consanguinity, variably practiced in the Persian Gulf region, North Africa, and Central Asia, has resulted in an elevated burden of recessive disease. Here we generated a whole-exome GME variome from 1,111 unrelated subjects. We detected substantial diversity and admixture in continental and subregional populations, corresponding to several ancient founder populations with little evidence of bottlenecks. Measured consanguinity rates were an order of magnitude above those in other sampled populations, and the GME population exhibited an increased burden of runs of homozygosity (ROHs) but showed no evidence for reduced burden of deleterious variation due to classically theorized 'genetic purging'. Applying this database to unsolved recessive conditions in the GME population reduced the number of potential disease-causing variants by four- to sevenfold. These results show variegated genetic architecture in GME populations and support future human genetic discoveries in Mendelian and population genetics. PMID:27428751

  7. Leveraging a Sturge-Weber Gene Discovery: An Agenda for Future Research.

    PubMed

    Comi, Anne M; Sahin, Mustafa; Hammill, Adrienne; Kaplan, Emma H; Juhász, Csaba; North, Paula; Ball, Karen L; Levin, Alex V; Cohen, Bernard; Morris, Jill; Lo, Warren; Roach, E Steve

    2016-05-01

    Sturge-Weber syndrome (SWS) is a vascular neurocutaneous disorder that results from a somatic mosaic mutation in GNAQ, which is also responsible for isolated port-wine birthmarks. Infants with SWS are born with a cutaneous capillary malformation (port-wine birthmark) of the forehead or upper eyelid which can signal an increased risk of brain and/or eye involvement prior to the onset of specific symptoms. This symptom-free interval represents a time when a targeted intervention could help to minimize the neurological and ophthalmologic manifestations of the disorder. This paper summarizes a 2015 SWS workshop in Bethesda, Maryland that was sponsored by the National Institutes of Health. Meeting attendees included a diverse group of clinical and translational researchers with a goal of establishing research priorities for the next few years. The initial portion of the meeting included a thorough review of the recent genetic discovery and what is known of the pathogenesis of SWS. Breakout sessions related to neurology, dermatology, and ophthalmology aimed to establish SWS research priorities in each field. Key priorities for future development include the need for clinical consensus guidelines, further work to develop a clinical trial network, improvement of tissue banking for research purposes, and the need for multiple animal and cell culture models of SWS. PMID:27268758

  8. Beyond gene discovery in inflammatory bowel disease: the emerging role of epigenetics.

    PubMed

    Ventham, Nicholas T; Kennedy, Nicholas A; Nimmo, Elaine R; Satsangi, Jack

    2013-08-01

    In the past decade, there have been fundamental advances in our understanding of genetic factors that contribute to the inflammatory bowel diseases (IBDs) Crohn's disease and ulcerative colitis. The latest international collaborative studies have brought the number of IBD susceptibility gene loci to 163. However, genetic factors account for only a portion of overall disease variance, indicating a need to better explore gene-environment interactions in the development of IBD. Epigenetic factors can mediate interactions between the environment and the genome; their study could provide new insight into the pathogenesis of IBD. We review recent progress in identification of genetic factors associated with IBD and discuss epigenetic mechanisms that could affect development and progression of IBD. PMID:23751777

  9. Discovery of Nuclear-Encoded Genes for the Neurotoxin Saxitoxin in Dinoflagellates

    PubMed Central

    Stüken, Anke; Orr, Russell J. S.; Kellmann, Ralf; Murray, Shauna A.; Neilan, Brett A.; Jakobsen, Kjetill S.

    2011-01-01

    Saxitoxin is a potent neurotoxin that occurs in aquatic environments worldwide. Ingestion of vector species can lead to paralytic shellfish poisoning, a severe human illness that may lead to paralysis and death. In freshwaters, the toxin is produced by prokaryotic cyanobacteria; in marine waters, it is associated with eukaryotic dinoflagellates. However, several studies suggest that saxitoxin is not produced by dinoflagellates themselves, but by co-cultured bacteria. Here, we show that genes required for saxitoxin synthesis are encoded in the nuclear genomes of dinoflagellates. We sequenced >1.2×10^6 mRNA transcripts from the two saxitoxin-producing dinoflagellate strains Alexandrium fundyense CCMP1719 and A. minutum CCMP113 using high-throughput sequencing technology. In addition, we used in silico transcriptome analyses, RACE, qPCR and conventional PCR coupled with Sanger sequencing. These approaches successfully identified genes required for saxitoxin synthesis in the two transcriptomes. We focused on sxtA, the unique starting gene of saxitoxin synthesis, and show that the dinoflagellate transcripts of sxtA have the same domain structure as the cyanobacterial sxtA genes. But, in contrast to the bacterial homologs, the dinoflagellate transcripts are monocistronic, have a higher GC content, occur in multiple copies, and contain typical dinoflagellate spliced-leader sequences and eukaryotic polyA-tails. Further, we investigated 28 saxitoxin-producing and non-producing dinoflagellate strains from six different genera for the presence of genomic sxtA homologs. Our results show very good agreement between the presence of sxtA and saxitoxin synthesis, except in three strains of A. tamarense, for which we amplified sxtA, but did not detect the toxin. Our work opens possibilities for developing molecular tools to detect saxitoxin-producing dinoflagellates in the environment. PMID:21625593

  10. Gene Discovery through Transcriptome Sequencing for the Invasive Mussel Limnoperna fortunei

    PubMed Central

    Uliano-Silva, Marcela; Americo, Juliana Alves; Brindeiro, Rodrigo; Dondero, Francesco; Prosdocimi, Francisco; de Freitas Rebelo, Mauro

    2014-01-01

    The success of the Asian bivalve Limnoperna fortunei as an invader in South America is related to its high acclimation capability. It can inhabit waters with a wide range of temperatures and salinity and handle long-term periods of air exposure. We describe the transcriptome of L. fortunei, aiming to give a first insight into the phenotypic plasticity that allows non-native taxa to become established and widespread. We sequenced 95,219 reads from five main tissues of the mussel L. fortunei using Roche’s 454 and assembled them to form a set of 84,063 unigenes (contigs and singletons) representing partial or complete gene sequences. We annotated 24,816 unigenes using a BLAST sequence similarity search against the NCBI nr database. Unigenes were divided into 20 eggNOG functional categories and 292 KEGG metabolic pathways. From the total unigenes, 1,351 represented putative full-length genes, of which 73.2% were functionally annotated. We described the first partial and complete gene sequences in order to start understanding bivalve invasiveness. An expansion of the hsp70 gene family, seen also in other bivalves, is present in L. fortunei and could be involved in its adaptation to extreme environments, e.g. during intertidal periods. The presence of toll-like receptors gives a first insight into an immune system that could be more complex than previously assumed and may be involved in the prevention of disease and extinction when population densities are high. Finally, the apparent lack of special adaptations to extremely low O2 levels is a target worth pursuing for the development of a molecular control approach. PMID:25047650